Machine Learning - Supervised Learning - K Nearest Neighbor (KNN) Tutorial
The K nearest neighbor (KNN) algorithm can be used for both classification and regression, but it is mostly used for classification problems. A new data point is assigned to the neighboring group it is most similar to.
In K nearest neighbors, K is an integer greater than 1 (commonly 5). For every new data point, we find its K nearest existing points and classify it according to the group they belong to.
Let us classify an object using the following example. Consider there are three clusters:
- Football
- Basketball
- Tennis ball
Let the new data point to be classified be a black ball. We use KNN to classify it, with K = 5 initially.
Next, we find the K (five) nearest data points, as shown.
Observe that the five selected points do not all belong to the same cluster: there are three tennis balls and one each of basketball and football.
When multiple classes appear among the neighbors, we take a majority vote. Here the majority is tennis balls, so the new data point is assigned to the tennis ball cluster.
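The distance-then-majority-vote procedure above can be sketched in plain Python. The feature values here (roughly diameter in cm and weight in g) are invented for illustration, and `knn_classify` is a hypothetical helper, not a standard library function:

```python
from collections import Counter
import math

def knn_classify(train, new_point, k=5):
    """Classify new_point by majority vote among its k nearest training points.

    train: list of (features, label) pairs, where features is a tuple of numbers.
    """
    # Sort training pairs by Euclidean distance to the new point.
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], new_point))
    # Take the labels of the k closest points and return the most common one.
    k_labels = [label for _, label in by_distance[:k]]
    return Counter(k_labels).most_common(1)[0][0]

# Hypothetical 2-D features: (diameter in cm, weight in g).
train = [
    ((22, 430), "football"), ((23, 445), "football"),
    ((24, 600), "basketball"), ((25, 620), "basketball"),
    ((6.7, 58), "tennis ball"), ((6.5, 57), "tennis ball"),
    ((6.6, 59), "tennis ball"), ((6.8, 56), "tennis ball"),
]

# A small, light ball lands among the tennis balls.
print(knn_classify(train, (6.9, 60), k=5))  # → tennis ball
```

Even though one of the five neighbors may come from another cluster, the majority vote still assigns the point to the tennis ball class, mirroring the example above.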
Advantages of KNN Algorithm:
- It is simple to implement.
- It is robust to noisy training data.
- It can be effective when the training data is large.
Disadvantages of KNN Algorithm:
- The value of K must always be chosen, which can be non-trivial.
- The computation cost is high, because the distance to every training sample must be computed for each prediction.
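One common way to address the first disadvantage is to try several candidate values of K and keep the one with the best validation accuracy. Below is a minimal sketch using leave-one-out validation; the dataset and the helper names (`knn_classify`, `best_k`) are invented for illustration:

```python
from collections import Counter
import math

def knn_classify(train, point, k):
    # Majority label among the k training points closest to `point`.
    by_dist = sorted(train, key=lambda pair: math.dist(pair[0], point))
    return Counter(label for _, label in by_dist[:k]).most_common(1)[0][0]

def best_k(data, candidates):
    # Leave-one-out validation: classify each point using all the others,
    # then pick the K with the highest accuracy (ties go to the first candidate).
    def accuracy(k):
        hits = sum(knn_classify(data[:i] + data[i + 1:], x, k) == y
                   for i, (x, y) in enumerate(data))
        return hits / len(data)
    return max(candidates, key=accuracy)

# Hypothetical 1-D toy data: two classes separated around x = 5.
data = [((x,), "small") for x in (1, 2, 2.5, 3, 4)] + \
       [((x,), "large") for x in (6, 7, 7.5, 8, 9)]
print(best_k(data, [1, 3, 5]))  # → 1
```

On this cleanly separable toy data every candidate K is equally accurate, so the smallest is returned; on noisier real data the accuracies would differ and a larger K would often win.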