Machine Learning for Beginners

Machine Learning - K-Nearest Neighbors (KNN)



What is K-Nearest Neighbors (KNN)?

K-Nearest Neighbors (KNN) is one of the simplest yet most effective machine learning algorithms. It is a supervised learning algorithm used for both classification and regression problems.

Real-Life Analogy to Understand KNN

Imagine you're new in a city and want to decide which restaurant to try. You ask your 5 nearest neighbors (K=5) for recommendations. If 3 out of 5 suggest an Italian place, you're likely to go there. That's KNN!

Question:

Why do we consider "nearest" neighbors?

Answer:

Because similar data points are likely to belong to the same class. "Nearest" means "most similar" in terms of feature distance (usually Euclidean distance).
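As a quick sketch, the Euclidean distance between two feature vectors can be computed like this (a minimal illustration using the fruit features from the example below):

```python
import math

def euclidean_distance(a, b):
    # Square the difference of each feature, sum the squares, take the square root
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Distance between two fruits described by (weight, size)
print(euclidean_distance((150, 7), (160, 7)))  # 10.0
```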

How KNN Works (Step-by-Step)

  1. Choose the number of neighbors, K.
  2. Calculate the distance between the new data point and every training point.
  3. Select the K closest neighbors.
  4. Majority voting (for classification) or average (for regression).
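The four steps above can be sketched in a few lines of plain Python (a minimal classifier for illustration, assuming Euclidean distance and simple majority voting):

```python
import math
from collections import Counter

def knn_classify(train_points, train_labels, query, k=3):
    # Step 2: distance from the query to every training point
    dists = [math.dist(query, p) for p in train_points]
    # Step 3: indices of the K closest neighbors
    nearest = sorted(range(len(dists)), key=lambda i: dists[i])[:k]
    # Step 4: majority vote among the neighbors' labels
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

points = [(150, 7), (130, 6), (180, 8), (170, 7)]
labels = ['Apple', 'Apple', 'Orange', 'Orange']
print(knn_classify(points, labels, (160, 7), k=3))  # Orange
```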

Example: Predicting Fruit Type Based on Weight and Size

Let’s say you have a dataset of fruits with their weight and size and labels: Apple or Orange.

Weight | Size | Fruit
------ | ---- | ------
150    | 7    | Apple
130    | 6    | Apple
180    | 8    | Orange
170    | 7    | Orange
160    | 7    | ?

Now we want to classify the new fruit (160g, size 7). We’ll use K=3. The algorithm will find the 3 closest neighbors using distance and count how many are Apples vs Oranges.
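Assuming Euclidean distance, the neighbor distances for the new fruit (160g, size 7) work out as follows:

```python
import math

known = {(150, 7): 'Apple', (130, 6): 'Apple', (180, 8): 'Orange', (170, 7): 'Orange'}
new_fruit = (160, 7)

for point, label in known.items():
    d = math.dist(new_fruit, point)
    print(f"{point} ({label}): distance = {d:.2f}")

# The three nearest are (150, 7) Apple, (170, 7) Orange, and (180, 8) Orange,
# so the majority vote (2 of 3) classifies the new fruit as Orange.
```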

Question:

What if there's a tie? (e.g., 2 neighbors are Apple, 2 are Orange with K=4)

Answer:

In case of a tie, reduce K (an odd K avoids ties in binary classification) or use distance-weighted voting, where closer neighbors get more weight.
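Scikit-learn supports distance-weighted voting through the `weights` parameter; a brief sketch on the fruit data (with K=4, plain majority voting could tie, but inverse-distance weights break the tie):

```python
from sklearn.neighbors import KNeighborsClassifier

X = [[150, 7], [130, 6], [180, 8], [170, 7]]
y = ['Apple', 'Apple', 'Orange', 'Orange']

# weights='distance' weighs each neighbor's vote by 1/distance,
# so closer neighbors count more and exact ties become unlikely
knn = KNeighborsClassifier(n_neighbors=4, weights='distance')
knn.fit(X, y)
print(knn.predict([[160, 7]])[0])  # Orange
```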

Python Code: KNN Using Scikit-Learn


from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import pandas as pd

# Step 1: Sample dataset
data = {
    'Weight': [150, 130, 180, 170, 160],
    'Size': [7, 6, 8, 7, 7],
    'Fruit': ['Apple', 'Apple', 'Orange', 'Orange', 'Unknown']
}

df = pd.DataFrame(data)

# Separate known and unknown
known_df = df[df['Fruit'] != 'Unknown']
unknown_df = df[df['Fruit'] == 'Unknown']

# Step 2: Features and target
X = known_df[['Weight', 'Size']]
y = known_df['Fruit']

# Step 3: Train the model
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

# Step 4: Predict unknown fruit
prediction = knn.predict(unknown_df[['Weight', 'Size']])
print("Predicted Fruit:", prediction[0])

Output:

Predicted Fruit: Orange

Code Explanation:

The code builds a small DataFrame, separates the four labeled fruits from the unlabeled one, trains a KNeighborsClassifier with K=3 on the labeled rows, and predicts the label of the unknown fruit by majority vote among its three nearest neighbors.

Question:

Should you always pick K=3?

Answer:

No. You should try different K values and use cross-validation to choose the best one based on accuracy.
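One way to do this is with scikit-learn's `cross_val_score`; a sketch using the built-in Iris dataset (chosen here only because the fruit example is too small for cross-validation):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Try several K values and keep the one with the best cross-validated accuracy
scores = {}
for k in range(1, 11):
    knn = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(knn, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(f"Best K: {best_k} (accuracy {scores[best_k]:.3f})")
```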

Key Points to Remember

  - KNN is a supervised algorithm used for both classification and regression.
  - "Nearest" means most similar in feature distance, usually Euclidean distance.
  - Classification uses majority voting; regression uses the average of the neighbors.
  - Ties can be handled by reducing K or by distance-weighted voting.
  - Choose K with cross-validation rather than fixing it in advance.

Next Steps

In the next topic, we will learn about Decision Trees, another popular supervised learning technique that builds explicit decision rules instead of relying on neighbors.


