Machine Learning - Machine Learning Development Life Cycle - Binning and Binarization | Discretization | Quantile Binning | KMeans Binning Tutorial
1] Encoding Numerical Features
- Discretization (Binning)
Discretization is the process of transforming continuous variables into discrete variables by creating a set of contiguous intervals that span the range of the variable's values. Discretization is also called binning, where bin is an alternative name for interval.
E.g. suppose the ages are 12, 13, 13, 14, 24, 28, 29, 30, 30, 32, 32, 26, 45.
Discretization then assigns each age to a bin such as 10-20, 20-30, 30-40, 40-50, etc.
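Why use discretization:
1] To handle outliers: extreme values simply fall into the first or last bin, which limits their influence
2] To improve the value spread: skewed values can be distributed more evenly across bins (quantile binning in particular)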
Class - scikit-learn - KBinsDiscretizer
Types Of Discretization-
1] Unsupervised
- Equal Width (uniform)
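E.g. suppose the ages are 12, 13, 13, 14, 24, 28, 29, 30, 30, 32, 32, 26, 45.
Suppose you need to create 5 bins and take the range as 10 to 50 (rounded out from the observed minimum 12 and maximum 45).
Then width = (max - min) / bins = (50 - 10) / 5 = 8
So each interval has a uniform width of 8,
i.e. 10-18, 18-26, 26-34, 34-42, 42-50 (5 equal-width bins in total). A value on a boundary, such as 26, is counted in the upper bin.
The number of ages in each bin is therefore: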
10-18 – 4
18-26 – 1
26-34 – 7
34-42 – 0
42-50 – 1
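The equal-width arithmetic above can be reproduced with a short sketch in plain Python. The variable names and the bin_index helper are illustrative only, and the code assumes left-closed bins (a boundary value such as 26 goes to the upper bin), with the rounded range 10 to 50 from the example.

```python
ages = [12, 13, 13, 14, 24, 28, 29, 30, 30, 32, 32, 26, 45]

low, high, n_bins = 10, 50, 5          # rounded range and bin count from the example
width = (high - low) / n_bins          # (50 - 10) / 5 = 8.0
edges = [low + i * width for i in range(n_bins + 1)]


def bin_index(value):
    """Return the 0-based bin for value, using left-closed bins [edge_i, edge_i+1)."""
    # The top edge (50) would otherwise overflow, so clamp it into the last bin.
    return min(int((value - low) // width), n_bins - 1)


counts = [0] * n_bins
for age in ages:
    counts[bin_index(age)] += 1

for i, count in enumerate(counts):
    print(f"{edges[i]:.0f}-{edges[i + 1]:.0f}: {count}")
```

The counts it prints should match the five lines listed above.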
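- Equal Frequency (quantile)
Bin edges are placed at quantiles, so each bin holds roughly the same number of values.
- K means
Values are grouped into clusters with k-means, and each cluster becomes a bin.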
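The three unsupervised strategies can be compared side by side with KBinsDiscretizer. This is a minimal sketch, not a definitive recipe: n_bins=3 is an arbitrary choice for such a small sample, and encode="ordinal" is chosen so the output is simply the bin index. Note that the "uniform" strategy here takes its range from the observed minimum and maximum (12 and 45) rather than a rounded range such as 10 to 50.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

ages = np.array([12, 13, 13, 14, 24, 28, 29, 30, 30, 32, 32, 26, 45], dtype=float)
X = ages.reshape(-1, 1)  # scikit-learn expects a 2-D array (samples x features)

binned = {}
for strategy in ("uniform", "quantile", "kmeans"):
    # encode="ordinal" returns the bin index (0, 1, 2, ...) for each value
    disc = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy=strategy)
    binned[strategy] = disc.fit_transform(X).ravel().astype(int)
    print(strategy, "edges:", np.round(disc.bin_edges_[0], 2))
    print(strategy, "bins: ", binned[strategy])
```

Printing bin_edges_ makes the difference visible: the same ages get different edges depending on the strategy.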
2] Supervised
- Decision Tree
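One common way to do decision-tree binning is to fit a shallow tree on the single feature and reuse its split points as bin edges. The sketch below assumes that approach. The target column is invented purely for illustration (it is not part of the age example), and max_depth=2 is an arbitrary choice that limits the tree to at most 4 bins.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

ages = np.array([12, 13, 13, 14, 24, 28, 29, 30, 30, 32, 32, 26, 45], dtype=float)
# Hypothetical target invented for this sketch: 1 for ages 24 to 32, else 0.
target = ((ages >= 24) & (ages <= 32)).astype(int)

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(ages.reshape(-1, 1), target)

# Internal nodes store a feature index >= 0; leaves are marked with a negative value.
is_split = tree.tree_.feature >= 0
thresholds = np.sort(tree.tree_.threshold[is_split])

# The learned split points act as bin edges. right=True mirrors the tree's
# "value <= threshold goes left" rule.
bins = np.digitize(ages, thresholds, right=True)
print("split points:", thresholds)
print("bin per age: ", bins)
```

Because the edges are learned from the target, the bins follow where the target changes rather than the spread of the ages alone.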
3] Custom Binning
Bins are defined by custom ranges chosen from domain knowledge, e.g. for age:
[0 – 18] – Kids
[18 – 60] – Adult
[60 – 80] – Senior
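A custom mapping like this can be sketched with the standard library alone. The ranges as written share their boundaries (18 and 60), so the code has to pick a side: it assumes a boundary age belongs to the upper group (18 is Adult, 60 is Senior). The age_group function name is illustrative.

```python
from bisect import bisect_right

EDGES = [18, 60]                       # inner boundaries between the groups
LABELS = ["Kids", "Adult", "Senior"]   # one label per range


def age_group(age):
    """Map an age in 0..80 to its custom bin label."""
    if not 0 <= age <= 80:
        raise ValueError(f"age {age} is outside the 0-80 range")
    # bisect_right counts how many boundaries are <= age, which is the bin index.
    return LABELS[bisect_right(EDGES, age)]


for age in (5, 18, 45, 60, 80):
    print(age, "->", age_group(age))
```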
- Binarization
Converting a continuous value into a binary value (0 or 1) by comparing it with a threshold
Class – scikit-learn - Binarizer
E.g. with a threshold of 6L (6 lakh):
salary < 6L – 1
salary > 6L – 0
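A minimal sketch of the salary example with Binarizer follows. The salary values are invented for illustration and expressed in lakhs. One detail to watch: Binarizer assigns 1 to values strictly greater than the threshold and 0 to the rest, which is the opposite of the labelling in the salary example above, so the sketch flips the result. A salary of exactly 6L ends up in the low-salary group under this convention.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

# Hypothetical salaries in lakhs (L), invented for this sketch.
salaries = np.array([[3.5], [6.0], [7.2], [12.0]])

# Binarizer maps values strictly greater than the threshold to 1, the rest to 0.
above = Binarizer(threshold=6.0).fit_transform(salaries).ravel().astype(int)

# Flip the output to get the labelling from the salary example (low salary -> 1).
low_salary = 1 - above

print("above 6L:  ", above)
print("low salary:", low_salary)
```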