Machine learning: the k-nearest neighbors (kNN) algorithm
Is it possible, and if so how, to add a condition so that knn.predict(x_test) predicts a label only when the 2 nearest neighbors (n_neighbors=2) have the same label?
For example, we predict the label 1 or 0. If both of the 2 nearest neighbors have the label 1, then knn.predict(x_test) should return 1; if both have the label 0, it should return 0. But if one neighbor is 0 and the other is 1, no label should be predicted.
Code used:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
knn = KNeighborsClassifier(n_neighbors=2, n_jobs=-1, weights='distance').fit(X, Y)
y_knn = knn.predict(x_test)
AA = accuracy_score(y_test, y_knn)
print(y_knn)
print(AA)
I also use leave-one-out cross-validation:
from sklearn.model_selection import LeaveOneOut, cross_val_score
knn = KNeighborsClassifier(n_neighbors=2, n_jobs=-1, weights='distance').fit(X, Y)
scores = cross_val_score(knn, X, Y, cv=LeaveOneOut())
print(scores.mean())
Or is there another method more suitable for this purpose? Thanks!
1 answer
If you use KNeighborsClassifier from the sklearn package, you can call predict_proba and keep a prediction only when the probability of one of the classes is exactly 1; if every class has a probability below 1, discard the prediction. With n_neighbors=2, one class gets a probability of 1 when both neighbors have the same label.
Here is sample code on generated data:
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import numpy as np
X, y = make_classification(n_features=20, n_redundant=0, n_informative=10,
random_state=1, n_clusters_per_class=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.4, random_state=42)
knn = KNeighborsClassifier(n_neighbors=2, n_jobs=-1, weights='distance').fit(X_train, y_train)
y_knn = knn.predict(X_test)
print('all predictions', y_knn)
y_knn_filt = np.max(knn.predict_proba(X_test), axis=1) == 1
print('confident-prediction filter', y_knn_filt)
print('confident predictions only', np.array(y_knn)[y_knn_filt])
AA = accuracy_score(y_test, y_knn)
print('score on all predictions', AA)
AA_filt = accuracy_score(np.array(y_test)[y_knn_filt], np.array(y_knn)[y_knn_filt])
print('score on confident predictions', AA_filt)
Result:
all predictions [0 1 1 0 0 1 1 0 0 1 1 0 1 0 1 0 0 0 1 1 1 0 0 0 0 0 0 1 1 1 0 0 0 1 0 1 1
1 0 1]
confident-prediction filter [ True True True False False True True True True False True True
True True True False True True True True True True False False
True True True True False False True True True True True True
True True True True]
confident predictions only [0 1 1 1 1 0 0 1 0 1 0 1 0 0 1 1 1 0 0 0 0 1 0 0 0 1 0 1 1 1 0 1]
score on all predictions 0.925
score on confident predictions 1.0
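If you would rather check the neighbors' labels directly, as the question describes, one possible sketch is to ask the fitted model for the neighbor indices with kneighbors and compare the training labels at those indices. The helper name predict_if_unanimous and the -1 "no prediction" marker are my own assumptions for illustration; they are not part of scikit-learn.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

ABSTAIN = -1  # hypothetical marker for "no prediction"; not a scikit-learn constant


def predict_if_unanimous(knn, y_train, X_query, abstain=ABSTAIN):
    """Return the neighbors' shared label, or `abstain` where they disagree."""
    y_train = np.asarray(y_train)
    # Indices into the training set, shape (n_queries, n_neighbors)
    neighbor_idx = knn.kneighbors(X_query, return_distance=False)
    neighbor_labels = y_train[neighbor_idx]
    # A row is unanimous when every neighbor label equals the first one
    unanimous = (neighbor_labels == neighbor_labels[:, [0]]).all(axis=1)
    return np.where(unanimous, neighbor_labels[:, 0], abstain)


# Same generated data as in the sample above
X, y = make_classification(n_features=20, n_redundant=0, n_informative=10,
                           random_state=1, n_clusters_per_class=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.4, random_state=42)
knn = KNeighborsClassifier(n_neighbors=2, weights='distance').fit(X_train, y_train)

y_pred = predict_if_unanimous(knn, y_train, X_test)
answered = y_pred != ABSTAIN  # mask of points that received a prediction
print('predictions (-1 = no prediction)', y_pred)
print('share of points with a prediction', answered.mean())
print('score on points with a prediction', (y_pred[answered] == y_test[answered]).mean())
```

This check looks only at the labels and ignores the weights setting, so it may differ from the predict_proba filter in edge cases, for instance when a query point coincides exactly with a training point. For leave-one-out evaluation, cross_val_predict with method='predict_proba' and cv=LeaveOneOut() should give per-sample probabilities that can be filtered the same way as in the sample above.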