Machine Learning - Machine Learning Development Life Cycle - Mathematical Transformation Tutorial

A mathematical transformation applies a mathematical formula to a column, turning one function or graph into another, related one. Many machine learning models, especially linear ones, tend to work better when numeric features are normally distributed or close to it (mean = median = mode). There are several ways to transform a continuous (numeric) variable so that the result looks more normally distributed. Some of them are:

Function Transformer

Log Transformation
Reciprocal
Power (Sq | Sqrt)
Custom

Power Transformer

Box-Cox
Yeo-Johnson

Quantile Transformer

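How to check whether data is normally distributed?

Using sns.histplot() or sns.displot() (sns.distplot() is deprecated in recent seaborn versions)
Using the pandas .skew() method (DataFrame.skew() or Series.skew())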
QQ Plot
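
The skewness and QQ-plot checks listed above can be sketched roughly as follows. The DataFrame and its column names (normal_col, skewed_col) are made up for illustration, and scipy.stats.probplot is used here only to compute the QQ-plot points and the correlation of their straight-line fit, without drawing anything.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic data (an assumption for this sketch): one roughly normal
# column and one right-skewed column.
df = pd.DataFrame({
    "normal_col": rng.normal(loc=50, scale=5, size=1000),
    "skewed_col": rng.exponential(scale=2.0, size=1000),
})

# Skewness near 0 suggests a symmetric distribution;
# a clearly positive value suggests right skew.
skewness = df.skew()

# probplot returns the QQ-plot points and a least-squares line fit.
# A correlation r close to 1 means the points lie close to a straight
# line, which is what a normal sample should look like.
(_, _), (_, _, r_normal) = stats.probplot(df["normal_col"], dist="norm")
(_, _), (_, _, r_skewed) = stats.probplot(df["skewed_col"], dist="norm")

print(skewness)
print(r_normal, r_skewed)
```

To draw the QQ plot itself, probplot also accepts a plot argument (for example a matplotlib Axes).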

Function Transformer

Log Transformation

Log transformation replaces each value x with log(x). Because it compresses large values more than small ones, it reduces or removes the skewness of the original data.

Usually used for right-skewed data.

Func = np.log1p computes log(1 + x), so an input of 0 maps to 0 instead of being undefined.

np.log is the plain natural log. It is undefined at 0 (NumPy returns -inf) and returns 0 for an input of 1.
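
A minimal sketch of the log transformation using scikit-learn's FunctionTransformer with np.log1p. The input values are made up for illustration, and passing np.expm1 as the inverse is an optional choice for this example.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# Made-up right-skewed values, including a zero.
X = np.array([[0.0], [1.0], [9.0], [99.0], [999.0]])

# log1p(x) = log(1 + x); expm1 is its exact inverse.
log_tf = FunctionTransformer(func=np.log1p, inverse_func=np.expm1)

X_log = log_tf.fit_transform(X)           # zero stays zero, large values shrink
X_back = log_tf.inverse_transform(X_log)  # recovers the original values

print(X_log.ravel())
```

Using np.log here instead would produce -inf for the first row, which is why np.log1p is the safer default when zeros can occur.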

Reciprocal

The reciprocal transformation maps x to 1/x. It is defined only for non-zero values, and it reverses the order of values that share the same sign.
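
As a small illustration, the reciprocal can be applied with FunctionTransformer and np.reciprocal. The sample values are made up and deliberately exclude zero.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# Made-up positive floats; zero must be excluded because 1/0 is undefined.
X = np.array([[0.5], [1.0], [2.0], [4.0]])

# The reciprocal is its own inverse: 1 / (1 / x) == x.
recip_tf = FunctionTransformer(func=np.reciprocal, inverse_func=np.reciprocal)

X_recip = recip_tf.fit_transform(X)

print(X_recip.ravel())
```

Note that the largest input becomes the smallest output, so the ordering of the values is flipped.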

Power (Sq | Sqrt)

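Squaring (x^2) is usually used for left-skewed data. The square root (sqrt(x)) is usually used for right-skewed data and requires non-negative values.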
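
The effect of both power transforms on skewness can be checked with a small experiment. The data below is synthetic and constructed so that the result is easy to predict: the square root of a uniform sample is left-skewed, and the square of a uniform sample is right-skewed.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(42)
u = rng.uniform(0.0, 1.0, size=5000)

# Synthetic columns (assumptions for this sketch).
left_skewed = np.sqrt(u)   # mass piled up near 1, long tail to the left
right_skewed = u ** 2      # mass piled up near 0, long tail to the right

# Squaring pulls the left-skewed sample back towards symmetry;
# the square root does the same for the right-skewed sample.
squared = np.square(left_skewed)
rooted = np.sqrt(right_skewed)

print(skew(left_skewed), skew(squared))
print(skew(right_skewed), skew(rooted))
```

On real data the improvement is rarely this clean, so it is worth comparing the skewness before and after the transform.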

Custom

Power Transformer

Box-Cox

Box-Cox is a power transformation whose exponent is a variable called lambda. Lambda is typically searched over the range -5 to 5, and the value that gives the best approximation to a normal distribution for the variable is chosen. Box-Cox can only be applied to strictly positive data.
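
A minimal sketch of Box-Cox with scikit-learn's PowerTransformer, assuming synthetic log-normal data. In scikit-learn, lambda is estimated by maximum likelihood rather than by scanning a fixed grid, and the fitted value is exposed as lambdas_.

```python
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)

# Synthetic, strictly positive, right-skewed feature (an assumption for this sketch).
X = rng.lognormal(mean=0.0, sigma=1.0, size=(1000, 1))

# Box-Cox: (x**lam - 1) / lam for lam != 0, and log(x) for lam == 0.
# standardize=True also rescales the output to zero mean and unit variance.
pt = PowerTransformer(method="box-cox", standardize=True)
X_bc = pt.fit_transform(X)

lam = pt.lambdas_[0]  # fitted lambda for the single column

print(lam)
print(skew(X.ravel()), skew(X_bc.ravel()))
```

For log-normal data the fitted lambda should land close to 0, which corresponds to a plain log transformation.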

Yeo-Johnson

This transformation is an adjustment to the Box-Cox transformation that can also be applied to zero and negative values.
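
The difference from Box-Cox can be illustrated on data that contains negative values. The shifted exponential sample below is made up for this sketch. PowerTransformer with method="yeo-johnson" handles it, while method="box-cox" raises a ValueError on the same input.

```python
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)

# Synthetic right-skewed feature shifted so that some values are negative.
X = rng.exponential(scale=2.0, size=(500, 1)) - 1.0

# Yeo-Johnson accepts negative, zero and positive values.
yj = PowerTransformer(method="yeo-johnson")
X_yj = yj.fit_transform(X)

# Box-Cox rejects the same data because it is not strictly positive.
try:
    PowerTransformer(method="box-cox").fit(X)
    box_cox_failed = False
except ValueError:
    box_cox_failed = True

print(skew(X.ravel()), skew(X_yj.ravel()))
print(box_cox_failed)
```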

Quantile Transformer
