ML: the k-nearest neighbors (kNN) algorithm

Is it possible to add a condition so that knn.predict(x_test) makes a prediction only when the 2 nearest neighbors (n_neighbors=2) have the same label? If so, how?

For example, we predict the label 1 or 0. When the 2 nearest neighbors are found, if both have label 1, then knn.predict(x_test) should return 1; if both have label 0, it should return 0. But if one neighbor is 0 and the other is 1, no label should be predicted.
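Once the labels of the nearest neighbors are known, the rule above takes only a few lines. This is a minimal NumPy sketch; the function name predict_if_agree and the sentinel -1 meaning "no prediction" are hypothetical choices for illustration, not part of scikit-learn.

```python
import numpy as np

ABSTAIN = -1  # hypothetical sentinel meaning "no prediction"

def predict_if_agree(neighbor_labels):
    """neighbor_labels: shape (n_samples, k), the labels of the k nearest
    neighbors of each sample. Returns the shared label where all k
    neighbors agree, otherwise ABSTAIN."""
    neighbor_labels = np.asarray(neighbor_labels)
    # Compare every column with the first one; all equal means agreement.
    agree = (neighbor_labels == neighbor_labels[:, [0]]).all(axis=1)
    return np.where(agree, neighbor_labels[:, 0], ABSTAIN)

# Each row holds the labels of the 2 nearest neighbors of one test sample.
print(predict_if_agree([[1, 1], [0, 0], [0, 1]]))  # [ 1  0 -1]
```

Using a sentinel keeps the output the same length as the input, so predictions stay aligned with y_test.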

Code used:

from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

knn = KNeighborsClassifier(n_neighbors=2, n_jobs=-1, weights='distance').fit(X, Y)
y_knn = knn.predict(x_test)
AA = accuracy_score(y_test, y_knn)
print(y_knn)
print(AA)
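For context, this is what the setup above does when the two neighbors disagree: with weights='distance', predict never abstains; the closer neighbor's label wins, and predict_proba shows the split between the classes. The tiny 1-D dataset below is hypothetical toy data, not the asker's data.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical 1-D training set: 0.0 and 1.0 have different labels,
# 3.0 and 4.0 share label 1.
X_train = np.array([[0.0], [1.0], [3.0], [4.0]])
y_train = np.array([0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=2, weights='distance').fit(X_train, y_train)

X_query = np.array([[0.4],   # neighbors 0.0 (label 0) and 1.0 (label 1): disagree
                    [3.4]])  # neighbors 3.0 and 4.0 (both label 1): agree
print(knn.predict(X_query))        # [0 1]: the closer neighbor decides
print(knn.predict_proba(X_query))  # first row about [0.6, 0.4], second row [0, 1]
```

The weights are 1/distance (2.5 vs about 1.67 for the first query), so class 0 gets about 0.6 of the total weight; only when both neighbors share a label does one class reach probability 1.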

I also use leave-one-out cross-validation:

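from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=2, n_jobs=-1, weights='distance').fit(X, Y)
scores = cross_val_score(knn, X, Y, cv=LeaveOneOut())
print(scores.mean())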
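cross_val_score scores every fold and has no way to skip a sample, so it cannot apply the condition directly. One option, sketched here under the assumption that X and Y are NumPy arrays, is a manual loop over LeaveOneOut().split() that predicts a held-out sample only when its 2 nearest neighbors agree. The make_classification data is only a stand-in. Because only neighbor labels are compared, weights has no effect on the outcome and is omitted.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import LeaveOneOut
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data; replace with your own X and Y.
X, Y = make_classification(n_samples=100, n_features=20, n_informative=10,
                           n_redundant=0, random_state=1)

predicted, correct = 0, 0
for train_idx, test_idx in LeaveOneOut().split(X):
    knn = KNeighborsClassifier(n_neighbors=2).fit(X[train_idx], Y[train_idx])
    # Indices (into the training fold) of the 2 nearest neighbors of the held-out sample.
    neigh = knn.kneighbors(X[test_idx], return_distance=False)[0]
    labels = Y[train_idx][neigh]
    if labels[0] == labels[1]:  # both neighbors agree, so make a prediction
        predicted += 1
        correct += int(labels[0] == Y[test_idx][0])

coverage = predicted / len(X)  # fraction of samples that got a prediction
accuracy = correct / predicted if predicted else float('nan')
print(f'coverage: {coverage:.3f}, accuracy on predicted samples: {accuracy:.3f}')
```

Reporting coverage alongside accuracy matters here: skipping uncertain samples can only be judged fairly if you also know how many were skipped.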

Or is there another method more suitable for this purpose? Thanks!

Author: insolor, 2020-03-16

1 answer

If you use KNeighborsClassifier from sklearn, you can call predict_proba instead of predict and keep a prediction only when the probability of one of the classes is exactly 1; if every class has a probability below 1 (the neighbors disagree), discard that prediction. Here is sample code on generated data:

from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import numpy as np

X, y = make_classification(n_features=20, n_redundant=0, n_informative=10,
                           random_state=1, n_clusters_per_class=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.4, random_state=42)

knn = KNeighborsClassifier(n_neighbors=2, n_jobs=-1, weights='distance').fit(X_train, y_train)
y_knn = knn.predict(X_test)
print('all predictions', y_knn)
# True where a single class receives probability 1, i.e. both neighbors have the same label
y_knn_filt = np.max(knn.predict_proba(X_test), axis=1) == 1
print('confident-prediction filter', y_knn_filt)
print('confident predictions only', np.array(y_knn)[y_knn_filt])
AA = accuracy_score(y_test, y_knn)
print('score on all predictions', AA)
AA_filt = accuracy_score(np.array(y_test)[y_knn_filt], np.array(y_knn)[y_knn_filt])
print('score on confident predictions', AA_filt)

Result:

all predictions [0 1 1 0 0 1 1 0 0 1 1 0 1 0 1 0 0 0 1 1 1 0 0 0 0 0 0 1 1 1 0 0 0 1 0 1 1
 1 0 1]
confident-prediction filter [ True  True  True False False  True  True  True  True False  True  True
  True  True  True False  True  True  True  True  True  True False False
  True  True  True  True False False  True  True  True  True  True  True
  True  True  True  True]
confident predictions only [0 1 1 1 1 0 0 1 0 1 0 1 0 0 1 1 1 0 0 0 0 1 0 0 0 1 0 1 1 1 0 1]
score on all predictions 0.925
score on confident predictions 1.0
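A more literal way to express the condition from the question is to look up the 2 nearest neighbors with kneighbors() and compare their labels directly. This sketch reuses the answer's generated data. It does not depend on weights, and it sidesteps a corner case of weights='distance': as far as I can tell from the scikit-learn source, a training point at zero distance from the query receives all of the weight, so predict_proba may reach 1 there even if the other neighbor has a different label. Coverage is worth printing too: in the output above, 32 of the 40 filter values are True, so the jump from 0.925 to 1.0 comes at a coverage of 32/40 = 0.8, and the 3 misclassified samples (0.925 × 40 = 37 correct) were all among the 8 discarded ones.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Same generated data and split as in the answer's code.
X, y = make_classification(n_features=20, n_redundant=0, n_informative=10,
                           random_state=1, n_clusters_per_class=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.4, random_state=42)

knn = KNeighborsClassifier(n_neighbors=2).fit(X_train, y_train)

# Row i holds the indices (into X_train) of the 2 nearest neighbors of X_test[i].
neigh_idx = knn.kneighbors(X_test, return_distance=False)
neigh_labels = y_train[neigh_idx]  # shape (n_test, 2)

# Predict only where both neighbors share a label; the shared label is the prediction.
agree = neigh_labels[:, 0] == neigh_labels[:, 1]
y_agree = neigh_labels[agree, 0]

print('coverage', agree.mean())
print('score on agreeing predictions', accuracy_score(y_test[agree], y_agree))
```

On ordinary continuous data without duplicate points, the agree mask should match the answer's predict_proba filter.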
Author: CrazyElf, 2020-03-16 09:55:09