A question that comes up in a lot of Machine Learning interviews is: what is the KNN algorithm in Machine Learning? Today we are going to discuss it. Read the full post to get a basic understanding of the KNN (K-nearest neighbours) algorithm.
Suppose we have a test input X and we have to assign it a label based on our training dataset. How can we do this? The KNN algorithm classifies X based on its K nearest neighbours in the training data. Let’s say K=3: to label X, we look at the 3 training points nearest to X, and if 2 of those 3 neighbours are labelled Yes, we classify X as Yes. See the image below.
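To make the K=3 example concrete, here is a minimal Python sketch of the voting step only. The function name majority_label and the neighbour labels are made up for illustration, and finding the 3 nearest neighbours is assumed to have been done already.

```python
from collections import Counter

def majority_label(neighbour_labels):
    """Return the label that occurs most often among the neighbours."""
    return Counter(neighbour_labels).most_common(1)[0][0]

# Hypothetical labels of the 3 nearest neighbours of X
nearest_labels = ["Yes", "No", "Yes"]
predicted = majority_label(nearest_labels)
print(predicted)  # Yes
```

Two of the three neighbours say Yes, so X is labelled Yes.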
Formal Definition of KNN algorithm in Machine Learning
Assuming x to be our test point, let’s denote the set of the k nearest neighbours of x as S. Formally, S is defined as
- S is a subset of D (the set of all data points)
- |S| = K
- Every point that is in D but not in S is at least as far away from x as the furthest point in S
The prediction for x is h(x), where h() is a classifier that returns the most common label in S.
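The definition above can be sketched in a few lines of Python. This is only an illustration under some assumptions: the post does not fix a distance measure, so Euclidean distance is used here, and the function name knn_classify and the toy dataset D are invented for the example.

```python
import math
from collections import Counter

def knn_classify(x, data, k):
    """Classify x by the most common label among its k nearest points.

    data is a list of (point, label) pairs.
    Euclidean distance (math.dist) is an assumption of this sketch.
    """
    # Sort all of D by distance to x and keep the k closest pairs as S.
    # Every pair left out is at least as far from x as the furthest one in S.
    by_distance = sorted(data, key=lambda pair: math.dist(x, pair[0]))
    S = by_distance[:k]
    # h(x): the most common label in S
    labels = [label for _, label in S]
    return Counter(labels).most_common(1)[0][0]

# Hypothetical toy training set D
D = [
    ((1.0, 1.0), "Yes"),
    ((1.5, 2.0), "Yes"),
    ((3.0, 4.0), "No"),
    ((5.0, 7.0), "No"),
    ((3.5, 5.0), "No"),
]

print(knn_classify((1.2, 1.5), D, k=3))  # Yes
```

Sorting the whole dataset keeps the sketch short; for large datasets a spatial index or a partial sort would usually be preferred.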
The Wikipedia Definition of the KNN algorithm is HERE.
What if set S contains an equal number of each label? For example, what if it contains two Yes and two No?
For a two-label problem we usually take K as an odd number, so that the vote can never be tied: there will always be more Yes labels or more No labels.
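The effect of K on ties can be checked with a small Python sketch. The helper vote_with_tie_check and the label lists are hypothetical. Note that an odd K only rules out ties when there are two labels; with three or more labels a tie is still possible.

```python
from collections import Counter

def vote_with_tie_check(neighbour_labels):
    """Return (label, tied); tied is True when the top count is shared."""
    counts = Counter(neighbour_labels).most_common()
    top_label, top_count = counts[0]
    tied = len(counts) > 1 and counts[1][1] == top_count
    return top_label, tied

even_k = ["Yes", "No", "Yes", "No"]        # K = 4, two labels
odd_k = ["Yes", "No", "Yes", "No", "Yes"]  # K = 5, two labels
three_labels = ["A", "B", "C"]             # K = 3, three labels

print(vote_with_tie_check(even_k)[1])        # True: two Yes, two No
print(vote_with_tie_check(odd_k))            # ('Yes', False)
print(vote_with_tie_check(three_labels)[1])  # True: odd K, still tied
```

When a tie does occur, one possible way to resolve it is to fall back to a smaller K or to prefer the label of the single nearest neighbour.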