Machine Learning - Machine Learning Development Life Cycle - Mathematical Transformation Tutorial
A mathematical transformation applies a mathematical function to every value in a column, producing a new column whose distribution has a different shape. Many machine learning algorithms tend to work better when numeric features are normally distributed or close to it (mean ≈ median ≈ mode). There are several ways to transform a continuous (numeric) variable so that the resulting variable looks more normally distributed. Some of them are-
- Function Transformer
  - Log Transformation
  - Reciprocal
  - Power (Sq | Sqrt)
  - Custom
- Power Transformer
  - Box-Cox
  - Yeo-Johnson
- Quantile Transformer
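How to check whether data is normally distributed?
- Using sns.distplot() (deprecated in recent seaborn releases; sns.histplot() and sns.displot() are the replacements)
- Using the pandas skew() method (Series.skew() or DataFrame.skew()); a value close to 0 indicates a roughly symmetric distribution
- QQ Plot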
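The checks listed above can be sketched numerically as follows. This is a minimal illustration on synthetic data, not a prescribed workflow: the column names `normal_col` and `skewed_col` are hypothetical, and `scipy.stats.probplot` is used here to compute the QQ-plot correlation without drawing the figure.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic data (an assumption for illustration): one roughly normal
# column and one right-skewed column.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "normal_col": rng.normal(loc=0.0, scale=1.0, size=2000),
    "skewed_col": rng.exponential(scale=1.0, size=2000),
})

# Skewness per column: near 0 for symmetric data, positive for right skew.
skewness = df.skew()

# probplot returns the QQ-plot points and a least-squares fit; r is the
# correlation of the points with a straight line (closer to 1 = more normal).
_, (_, _, r_normal) = stats.probplot(df["normal_col"], dist="norm")
_, (_, _, r_skewed) = stats.probplot(df["skewed_col"], dist="norm")

print(skewness)
print("QQ correlation (normal_col):", r_normal)
print("QQ correlation (skewed_col):", r_skewed)
```

Passing `plot=plt` (with `matplotlib.pyplot` imported as `plt`) to `probplot` would draw the QQ plot itself.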
- Function Transformer
- Log Transformation
A log transformation replaces each value x with log(x). It compresses large values more than small ones, which reduces or removes the skewness of the original data.
It is usually used for right-skewed data.
Func = np.log1p, i.e. it computes log(1 + x), so a value of 0 maps to 0 instead of hitting log(0), which is undefined (NumPy returns -inf).
np.log is the plain natural log. It requires strictly positive input, so it fails on columns that contain 0.
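A minimal sketch of the log transformation using scikit-learn's `FunctionTransformer` with `np.log1p`. The data is synthetic (a log-normal sample with one zero inserted, an assumption made only to show that `log1p` handles zeros), and `np.expm1` is supplied as the inverse.

```python
import numpy as np
from scipy import stats
from sklearn.preprocessing import FunctionTransformer

# Synthetic right-skewed feature (assumption for illustration).
rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(2000, 1))
X[0, 0] = 0.0  # log1p maps 0 to 0; np.log would give -inf here

# log1p computes log(1 + x); expm1 is its exact inverse.
log_tf = FunctionTransformer(func=np.log1p, inverse_func=np.expm1)
X_log = log_tf.fit_transform(X)

skew_before = stats.skew(X.ravel())
skew_after = stats.skew(X_log.ravel())
print("skew before:", skew_before)
print("skew after: ", skew_after)
```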
- Reciprocal
The reciprocal transformation maps x to 1/x. It is undefined at x = 0, and for positive values it reverses their order (the largest value becomes the smallest).
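A small sketch of the reciprocal transformation, again through `FunctionTransformer`. The four input values are made up for illustration and are all non-zero, since 1/x is undefined at 0.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# Illustrative, strictly non-zero values (assumption).
X = np.array([[0.5], [1.0], [2.0], [4.0]])

# The reciprocal is its own inverse: 1 / (1 / x) == x.
reciprocal_tf = FunctionTransformer(func=np.reciprocal,
                                    inverse_func=np.reciprocal)
X_rec = reciprocal_tf.fit_transform(X)

print(X_rec.ravel())  # 1/x for each input value
```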
- Power (Sq | Sqrt)
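Squaring (x^2) is usually used for left-skewed data, while the square root (sqrt(x)) is commonly used for moderately right-skewed, non-negative data.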
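The effect of the two power transformations can be sketched on synthetic data. The distributions below (a Beta(5, 1) sample for left skew, an exponential sample for right skew) are assumptions chosen only to make the change in skewness visible.

```python
import numpy as np
from scipy import stats
from sklearn.preprocessing import FunctionTransformer

rng = np.random.default_rng(0)
left_skewed = rng.beta(5.0, 1.0, size=(2000, 1))           # long left tail
right_skewed = rng.exponential(scale=1.0, size=(2000, 1))  # long right tail

square_tf = FunctionTransformer(func=np.square)
sqrt_tf = FunctionTransformer(func=np.sqrt)  # needs non-negative input

left_squared = square_tf.fit_transform(left_skewed)
right_rooted = sqrt_tf.fit_transform(right_skewed)

skew_left_before = stats.skew(left_skewed.ravel())
skew_left_after = stats.skew(left_squared.ravel())
skew_right_before = stats.skew(right_skewed.ravel())
skew_right_after = stats.skew(right_rooted.ravel())

print("left-skewed:  before", skew_left_before, "after square", skew_left_after)
print("right-skewed: before", skew_right_before, "after sqrt", skew_right_after)
```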
- Custom
- Power Transformer
- Box-Cox
Box-Cox transforms a strictly positive value x into (x^lambda - 1) / lambda when lambda is not 0, and into log(x) when lambda is 0. The exponent lambda is a parameter that is typically searched over a range such as -5 to 5. Candidate values of lambda are evaluated, and the one that gives the best approximation to a normal distribution for the variable is chosen.
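A minimal sketch of Box-Cox using scikit-learn's `PowerTransformer`. Note that scikit-learn estimates lambda by maximum likelihood rather than scanning a fixed grid, so this illustrates the idea rather than the exact search described above. The log-normal input is a synthetic assumption; for such data the fitted lambda is expected to land near 0, which corresponds to a log transform.

```python
import numpy as np
from scipy import stats
from sklearn.preprocessing import PowerTransformer

# Strictly positive, right-skewed synthetic data (assumption).
rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(2000, 1))

# standardize=True rescales the output to zero mean and unit variance.
boxcox = PowerTransformer(method="box-cox", standardize=True)
X_bc = boxcox.fit_transform(X)

fitted_lambda = boxcox.lambdas_[0]  # one lambda per column
skew_before = stats.skew(X.ravel())
skew_after = stats.skew(X_bc.ravel())

print("fitted lambda:", fitted_lambda)
print("skew before:", skew_before, "skew after:", skew_after)
```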
- Yeo-Johnson
This transformation is an adjustment to the Box-Cox transformation that can also be applied to zero and negative values.
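The difference from Box-Cox can be sketched with `PowerTransformer` on data that contains negative values. The shifted exponential sample is a synthetic assumption; the `try` block shows that Box-Cox is rejected for the same input.

```python
import numpy as np
from scipy import stats
from sklearn.preprocessing import PowerTransformer

# Right-skewed synthetic data shifted so that some values are negative.
rng = np.random.default_rng(0)
X = rng.exponential(scale=1.0, size=(2000, 1)) - 0.5

# Yeo-Johnson accepts negative, zero, and positive values.
yeo = PowerTransformer(method="yeo-johnson")
X_yj = yeo.fit_transform(X)

skew_before = stats.skew(X.ravel())
skew_after = stats.skew(X_yj.ravel())

# Box-Cox on the same data is expected to raise, since it needs x > 0.
try:
    PowerTransformer(method="box-cox").fit(X)
    boxcox_failed = False
except ValueError:
    boxcox_failed = True

print("skew before:", skew_before, "skew after:", skew_after)
print("Box-Cox rejected the data:", boxcox_failed)
```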
- Quantile Transformer