What is K-Nearest Neighbors (KNN)?
K-Nearest Neighbors (KNN) is one of the simplest yet most effective machine learning algorithms. It is a supervised learning algorithm used for both classification and regression problems.
Real-Life Analogy to Understand KNN
Imagine you're new to a city and want to decide which restaurant to try. You ask your 5 nearest neighbors (K=5) for recommendations. If 3 out of 5 suggest an Italian place, you'll probably go there. That's KNN!
Question:
Why do we consider "nearest" neighbors?
Answer:
Because similar data points are likely to belong to the same class. "Nearest" means "most similar" in terms of feature distance (usually Euclidean distance).
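As a quick illustration, here is how Euclidean distance can be computed in Python using the standard library's math.dist (the two points shown are made up for this example):

```python
import math

# Two fruits described by (weight, size) features
apple = (150, 7)
new_fruit = (160, 7)

# Euclidean distance: sqrt((150 - 160)**2 + (7 - 7)**2)
print(math.dist(apple, new_fruit))  # 10.0
```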
How KNN Works (Step-by-Step)
- Choose the number of neighbors, K.
- Calculate the distance between the new data point and all existing points.
- Select the K closest neighbors.
- Take a majority vote (for classification) or the average (for regression) — see the sketch below.
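To make these steps concrete, here is a minimal from-scratch sketch of a KNN classifier (the function name and data layout are illustrative choices, not part of any library):

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 2: distance from the query to every training point
    distances = [(math.dist(point, query), label)
                 for point, label in zip(train_points, train_labels)]
    # Step 3: keep the k closest neighbors
    neighbors = sorted(distances)[:k]
    # Step 4: majority vote over the neighbor labels
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Example using the fruit data from the next section
points = [(150, 7), (130, 6), (180, 8), (170, 7)]
labels = ['Apple', 'Apple', 'Orange', 'Orange']
print(knn_predict(points, labels, (160, 7)))  # Orange
```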
Example: Predicting Fruit Type Based on Weight and Size
Let’s say you have a dataset of fruits with their weight, size, and a label: Apple or Orange.
| Weight (g) | Size | Fruit |
|---|---|---|
| 150 | 7 | Apple |
| 130 | 6 | Apple |
| 180 | 8 | Orange |
| 170 | 7 | Orange |
| 160 | 7 | ? |
Now we want to classify the new fruit (160 g, size 7). We’ll use K=3. The algorithm will find the 3 closest neighbors using distance and count how many are Apples vs. Oranges.
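Working the distances out by hand with the Euclidean formula:

- To (150, 7), Apple: √(10² + 0²) = 10
- To (170, 7), Orange: √(10² + 0²) = 10
- To (180, 8), Orange: √(20² + 1²) ≈ 20.02
- To (130, 6), Apple: √(30² + 1²) ≈ 30.02

The 3 nearest neighbors are one Apple and two Oranges, so the majority vote says Orange.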
Question:
What if there's a tie? (e.g., 2 neighbors are Apple, 2 are Orange with K=4)
Answer:
In the case of a tie, reduce K (an odd K avoids ties in binary classification) or use distance-weighted voting, where closer neighbors get more weight.
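scikit-learn supports distance-weighted voting directly through the weights parameter; a minimal sketch:

```python
from sklearn.neighbors import KNeighborsClassifier

# weights='distance' makes closer neighbors count more in the vote,
# which also acts as a natural tie-breaker for even values of K
knn = KNeighborsClassifier(n_neighbors=4, weights='distance')
```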
Python Code: KNN Using Scikit-Learn
```python
from sklearn.neighbors import KNeighborsClassifier
import pandas as pd

# Step 1: Sample dataset
data = {
    'Weight': [150, 130, 180, 170, 160],
    'Size': [7, 6, 8, 7, 7],
    'Fruit': ['Apple', 'Apple', 'Orange', 'Orange', 'Unknown']
}
df = pd.DataFrame(data)

# Separate known and unknown fruits
known_df = df[df['Fruit'] != 'Unknown']
unknown_df = df[df['Fruit'] == 'Unknown']

# Step 2: Features and target
X = known_df[['Weight', 'Size']]
y = known_df['Fruit']

# Step 3: Train the model
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

# Step 4: Predict the unknown fruit
prediction = knn.predict(unknown_df[['Weight', 'Size']])
print("Predicted Fruit:", prediction[0])
```
Output:

```
Predicted Fruit: Orange
```
Code Explanation:
- Step 1: We create a simple DataFrame of known fruits and one unknown.
- Step 2: We separate the features (`Weight`, `Size`) and the labels.
- Step 3: We train the KNN model with `n_neighbors=3`.
- Step 4: We predict the label for the unknown fruit using the trained model.
Question:
Should you always pick K=3?
Answer:
No. You should try different K values and use cross-validation to choose the best one based on accuracy.
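A minimal sketch of that search, using scikit-learn's cross_val_score on the built-in Iris dataset (our 4-row fruit table is too small for cross-validation):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Try several K values and compare mean 5-fold cross-validation accuracy
for k in [1, 3, 5, 7, 9]:
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(f"K={k}: mean accuracy = {scores.mean():.3f}")
```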
Key Points to Remember
- KNN is simple and works well with small datasets.
- It doesn’t learn an internal model; it memorizes the training data (a "lazy learner").
- Prediction is slow for large datasets, since distances to every stored point are computed at query time.
- Feature scaling (normalization or standardization) is important in KNN, because features with large ranges dominate the distance (see the sketch below).
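For example, in our fruit data Weight (130 to 180) has a much larger range than Size (6 to 8), so unscaled distances are dominated by weight. A minimal sketch of scaling inside a pipeline, reusing X, y, and unknown_df from the fruit example above:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# StandardScaler standardizes each feature before distances are computed,
# so Weight no longer dominates Size
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
model.fit(X, y)  # X, y from the fruit example above
print(model.predict(unknown_df[['Weight', 'Size']])[0])
```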
Next Steps
In the next topic, we will learn about Decision Trees, another popular supervised learning technique that builds internal decision rules instead of relying on neighbors.