文章目录
数据准备阶段
import matplotlib.pyplot as plt
import numpy as np
# 样本特征
data_X = [
[0.5, 2],
[1.8, 3],
[3.9, 1],
[4.7, 4],
[6.2, 6],
[7.5, 5],
[8.3, 3.5],
[9.1, 7],
[9.8, 4.5]
]
# 样本标记
data_y = [0, 0, 0, 1, 1, 1, 1, 1, 1]
X_train = np.array(data_X)
y_train = np.array(data_y)
X_train
y_train
选出样本标记为0的样本特征
y_train == 0
X_train[y_train==0]
X_train[y_train==0, 0]
X_train[y_train==0, 1]
X_train[y_train==1, 0].shape
X_train[y_train==1, 1].shape
plt.scatter(X_train[y_train==0, 0], X_train[y_train==0, 1], color='red', marker='x')
plt.scatter(X_train[y_train==1, 0], X_train[y_train==1, 1], color='black', marker='o')
plt.show()
增加新的样本点
data_new = np.array([4, 5])
plt.scatter(X_train[y_train==0, 0], X_train[y_train==0, 1], color='red', marker='x')
plt.scatter(X_train[y_train==1, 0], X_train[y_train==1, 1],color='black', marker='o')
plt.scatter(data_new[0], data_new[1], color='b', marker='^')
plt.show()
KNN预测的过程
1.计算新样本与已知样本点的距离
for data in X_train:
print(np.sqrt(np.sum((data - data_new) ** 2)))
distances = [np.sqrt(np.sum((data - data_new) ** 2)) for data in X_train]
distances
2.按照举例排序
np.sort(distances)
sort_index = np.argsort(distances)
sort_index
3.确定k值
k = 5
4.距离最近的k个点投票
first_k = [y_train[i] for i in sort_index[:k]]
first_k
from collections import Counter
Counter(first_k)
Counter(first_k).most_common()
Counter(first_k).most_common(1)
predict_y = Counter(first_k).most_common(1)[0][0]
predict_y
得到结果为1,KNN判断新加入的点data_y的标记应该为1,从图中也可以看到,新加入的点更靠近标记为1的点群。
scikit-learn中的KNN算法
from sklearn.neighbors import KNeighborsClassifier
kNN_classifier = KNeighborsClassifier(n_neighbors=5)
kNN_classifier.fit(X_train, y_train)
data_new.reshape(1, -1)
predict_y = kNN_classifier.predict(data_new.reshape(1, -1))
predict_y
与手写KNN得到的结果相同,皆判断为1。