Introduction
Imagine you’ve just moved to a new city and are looking for a good restaurant. You don’t know much about the area, so you ask three locals for recommendations.
• Two suggest Restaurant A.
• One suggests Restaurant B.
Since the majority vote favors Restaurant A, you decide to eat there.
This simple decision-making process mirrors how the k-Nearest Neighbors (k-NN) algorithm works in machine learning! In this post, we’ll dive into k-NN, understand how it works, and implement it in Python with a practical example.
What is k-Nearest Neighbors (k-NN)?
k-NN is a supervised machine learning algorithm used for both classification and regression. For classification, it assigns a new data point the majority class among its k nearest neighbors; for regression, it predicts the average of their values.
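This post focuses on the classification case, but for completeness, here is a tiny sketch of the regression variant using scikit-learn’s KNeighborsRegressor (toy one-feature data, invented purely for illustration):

from sklearn.neighbors import KNeighborsRegressor

# The prediction for a new point is the average of its k neighbors' targets
reg = KNeighborsRegressor(n_neighbors=2)
reg.fit([[1], [2], [3], [10]], [1.0, 2.0, 3.0, 10.0])
print(reg.predict([[2.5]]))  # neighbors are 2 and 3, so the prediction is (2.0 + 3.0) / 2 = 2.5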
How k-NN Works:
1. Choose the number of neighbors (k).
2. Compute the distance between the new data point and every point in the dataset.
3. Select the k nearest points.
4. Take a majority vote among those k points to decide the class of the new data point.
In other words, k-NN finds the most similar cases in the dataset and predicts based on those similarities (a minimal sketch of these steps follows below).
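Here is a minimal from-scratch sketch of the four steps in plain NumPy (the helper name knn_predict is our own, not a library function; Euclidean distance is assumed):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Step 2: Euclidean distance from the new point to every training point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Step 3: indices of the k nearest points
    nearest = np.argsort(distances)[:k]
    # Step 4: majority vote among the neighbors' labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

In practice you rarely write this by hand; the rest of the post uses scikit-learn’s implementation.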
Implementing k-NN in Python
Let’s walk through a step-by-step implementation using a dataset where we predict whether a person will purchase a product based on Age and Estimated Salary.
Step 1: Import Necessary Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
Step 2: Create a Sample Dataset
data = {
    'Age': [22, 25, 47, 52, 46, 56, 55, 60, 62, 61, 18, 24, 33, 40, 35],
    'EstimatedSalary': [15000, 29000, 43000, 76000, 50000, 83000, 78000, 97000, 104000, 98000, 12000, 27000, 37000, 58000, 41000],
    'Purchased': [0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0]  # 1: Purchased, 0: Not Purchased
}
df = pd.DataFrame(data)
print(df.head())
Step 3: Data Preprocessing
X = df[['Age', 'EstimatedSalary']]
y = df['Purchased']
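Before splitting, it can help to check the class balance with a standard pandas call:

print(y.value_counts())  # 9 purchasers (1) vs. 6 non-purchasers (0)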
# Splitting into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
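With a dataset this small, a random split can easily end up class-imbalanced. One option is the optional stratify parameter of train_test_split, which keeps the class proportions in both sets:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)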
# Feature Scaling (Standardization)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # fit the scaler on the training set only...
X_test = scaler.transform(X_test)        # ...then apply the same transformation to the test set
Step 4: Train the k-NN Model
k = 3
knn = KNeighborsClassifier(n_neighbors=k)
knn.fit(X_train, y_train)
Step 5: Make Predictions and Evaluate the Model
y_pred = knn.predict(X_test)
# Evaluating Performance
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)  # rows are actual classes, columns are predicted classes
report = classification_report(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print("Confusion Matrix:\n", conf_matrix)
print("Classification Report:\n", report)
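Beyond the test set, predicting for a brand-new person is a single call. A small sketch with invented values (Age 30, salary 87,000), scaled with the same scaler fitted above:

new_customer = pd.DataFrame([[30, 87000]], columns=['Age', 'EstimatedSalary'])  # hypothetical values
print(knn.predict(scaler.transform(new_customer)))  # 1 = will purchase, 0 = will not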
Key Insights
1. Choosing the Right k Value:
- A small k (e.g., 1 or 3) makes the model sensitive to noise.
- A large k (e.g., 10 or 15) smooths out noise but may miss local patterns.
- Use cross-validation to determine the best k (see the sketch after this list).
2. Importance of Feature Scaling: k-NN relies on distance calculations, so standardizing the features ensures they contribute equally.
3. Best for Small Datasets: k-NN works well on small, low-dimensional datasets, but prediction is computationally expensive on large ones, since every query is compared against all training points.
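As a sketch of point 1, here is one way to pick k with cross-validation on the scaled training data from the example (cv=3 because the toy dataset is tiny):

from sklearn.model_selection import cross_val_score

best_k, best_score = 1, 0.0
for k in range(1, 8):  # keep k below the training-fold size (8 rows per fold here)
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X_train, y_train, cv=3)
    if scores.mean() > best_score:
        best_k, best_score = k, scores.mean()
print(f"Best k: {best_k} (mean CV accuracy: {best_score:.2f})")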
Final Thoughts
k-Nearest Neighbors (k-NN) is a powerful yet simple algorithm that can be applied to various classification problems. While it performs well on smaller datasets, it’s important to consider computational costs when scaling up.
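One practical lever when scaling up: scikit-learn’s KNeighborsClassifier accepts an algorithm parameter ('auto', 'brute', 'kd_tree', or 'ball_tree'), so you can trade index-building time for faster queries, for example:

knn = KNeighborsClassifier(n_neighbors=3, algorithm='kd_tree')  # tree-based neighbor search instead of brute force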
Would you like to explore how k-NN works on image classification or time-series forecasting? Let me know in the comments!