# Machine Learning - Introduction

Feature Engineering

Feature Transformation

Missing Value Imputation

Handling Categorical Features

Outlier Detection

Feature Scaling

Feature Construction

Feature Splitting

Feature Selection

Forward Selection

Backward Selection

Feature Extraction

PCA - Principal Component Analysis

Introduction

PCA is an unsupervised learning technique for dimensionality reduction.

Its main aim is to reduce the curse of dimensionality, i.e. to avoid the heavy computation required on high-dimensional data.

It transforms higher-dimensional data into lower-dimensional data while keeping the essence of the data, which also enables visualization.

Benefits of PCA:

Faster execution of algorithms

Visualization (e.g. reducing 10-D data to 2-D)

Geometric intuition

Suppose a dataset has the columns (No. of rooms, No. of grocery shops, target column: Price). Here, No. of rooms is more important for predicting price than No. of grocery shops. Hence, using feature selection, we would keep only the No. of rooms column.

If you don't have domain knowledge of the project, plot both columns, check the variance, and select the column with the higher spread (projection).

Feature selection does not work when both columns are equally important (e.g. No. of rooms and No. of bathrooms) and have similar variance. In such a case, we need to use feature extraction.

In feature extraction, when both columns are equally important with similar variance, we combine both columns into a single new column, e.g. total flat size.
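As a minimal sketch of this idea (with hypothetical room/bathroom data), PCA finds the single direction of maximum variance and projects both correlated columns onto it, producing one combined column:

```python
import numpy as np

# Hypothetical data: bathrooms closely track rooms, so the two
# columns carry nearly the same information.
rng = np.random.default_rng(0)
rooms = rng.integers(1, 6, size=100).astype(float)
bathrooms = rooms + rng.normal(0, 0.3, size=100)
X = np.column_stack([rooms, bathrooms])

Xc = X - X.mean(axis=0)                  # center each column
cov = np.cov(Xc, rowvar=False)           # 2x2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
pc1 = eigvecs[:, -1]                     # direction of maximum variance
combined = Xc @ pc1                      # the single combined column

print(combined.shape)                    # one value per row
print(eigvals[-1] / eigvals.sum())       # fraction of variance kept
```

Because the two columns are strongly correlated, the first principal component retains almost all of the variance, which is why dropping the second component loses very little information.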

Range, IQR, Variance, and Standard Deviation are the methods used to understand the distribution of data.

https://www.analyticsvidhya.com/blog/2021/04/dispersion-of-data-range-iqr-variance-standard-deviation/

Range - The range is a measure of variability. It is calculated by subtracting the lowest value from the highest value. A wide range indicates high variability, and a small range indicates low variability in the distribution.

Range = Highest_value – Lowest_value

Interquartile Range (IQR)

IQR is the range between the third and first quartiles. IQR is preferred over the range because, unlike the range, it is not influenced by outliers. IQR measures variability by splitting a data set into four equal quartiles.

Formula To Find Outliers

[Q1 – 1.5 * IQR, Q3 + 1.5 * IQR]

If a value does not fall within the above range, it is considered an outlier.
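The range, IQR, and outlier rule above can be sketched with NumPy on a small hypothetical sample (where 40 is an obvious outlier):

```python
import numpy as np

data = np.array([2, 4, 5, 5, 6, 7, 8, 9, 11, 40])  # hypothetical sample

value_range = data.max() - data.min()       # Range = highest - lowest
q1, q3 = np.percentile(data, [25, 75])      # first and third quartiles
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = data[(data < lower) | (data > upper)]

print(value_range)   # 38
print(outliers)      # [40]
```

Note that the range (38) is inflated by the outlier, while the IQR-based fence flags 40 and leaves the rest of the data untouched, which illustrates why IQR is the more robust measure.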

Variance -

The variance is a measure of variability. It is calculated by taking the average of squared deviations from the mean. Variance tells you the degree of spread in your data set. The more spread the data, the larger the variance is in relation to the mean.

Population vs Sample variance

Population variance

When you have collected data from every member of the population

The population variance formula looks like this:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$

Explanation:

σ² = population variance

Σ = summation from 1 to N

Xᵢ = each value

μ = population mean

N = number of values in the population

Sample variance

When you have collected data from a sample

The sample variance formula looks like this:

$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{x})^2}{n - 1}$$

s² = sample variance

Σ = summation from 1 to n (note the divisor is n − 1, not the summation limit)

Xᵢ = each value

x̄ = sample mean

n = number of values in the sample

Why do we use n-1 in the sample standard deviation and variance formulas instead of n?

The simple answer: the calculations for both the sample standard deviation and the sample variance contain a little bias (the statistics way of saying "error") that consistently underestimates variability. Bessel's correction (i.e. subtracting 1 from your sample size in the denominator) corrects this bias. In other words, the sample variance computed with n would tend to be lower than the real variance of the population. Therefore, to get a result consistent with the population, we use n-1 instead of n.
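NumPy exposes this choice through the `ddof` ("delta degrees of freedom") parameter of `np.var` and `np.std`, which can be shown on a small hypothetical sample:

```python
import numpy as np

sample = np.array([2.0, 4.0, 6.0, 8.0])

# ddof=0 divides by n (population formula); ddof=1 divides by n-1
# (Bessel's correction, the sample formula).
pop_var = np.var(sample)             # 20 / 4 = 5.0
samp_var = np.var(sample, ddof=1)    # 20 / 3 ≈ 6.667

print(pop_var)   # 5.0
print(samp_var)  # 6.666...
```

As expected, dividing by n gives the smaller value, which is exactly the underestimate that Bessel's correction compensates for.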

Variance Proportional to Spread

Variance gives added weight to outliers (the values that are far from the mean), because squaring these deviations can skew the measure: 10 squared is 100, while 100 squared is 10,000. To overcome this drawback of variance, the standard deviation came into the picture.

Standard deviation

The standard deviation is derived from variance and tells you, on average, how far each value lies from the mean. It’s the square root of variance.

Variance Vs Standard Deviation

Both measures reflect variability in a distribution, but their units differ:

Standard deviation is expressed in the same units as the original values (e.g., meters).

Variance is expressed in much larger units (e.g., meters squared)
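The square-root relationship and the unit difference can be checked directly (using hypothetical heights in meters):

```python
import numpy as np

heights_m = np.array([1.5, 1.6, 1.7, 1.8, 1.9])  # hypothetical heights (meters)

var = np.var(heights_m)   # units: meters squared
std = np.std(heights_m)   # units: meters, same as the data

print(np.isclose(std, np.sqrt(var)))  # True: std is the square root of variance
```

Because `std` is back in the original units, it is the easier of the two to interpret: here it says a typical height is about 0.14 m from the mean.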

Why is MAD (Mean Absolute Deviation) not used instead of variance?

The mean absolute deviation of a dataset is the average distance between each data point and the mean. It gives us an idea of the variability in a dataset.

$$\mathrm{MAD} = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n}$$

Xᵢ = each value from the population

X̄ = the population mean

n = size of the population

Here |·| denotes the absolute value, meaning all negative deviations (distances) are made positive.

The absolute value function used by MAD is not differentiable at 0, so MAD does not work well in optimization; variance, by contrast, is differentiable everywhere.
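The MAD formula above is a one-liner in NumPy; here it is on a small hypothetical dataset:

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0, 10.0])  # hypothetical data; 10 pulls the mean up

mean = data.mean()                    # 4.0
mad = np.abs(data - mean).mean()      # average absolute distance from the mean

print(mad)  # 2.4
```

Compare this with the variance of the same data: squaring makes the deviation of 10 dominate far more than its absolute distance does, which is the outlier-weighting behavior described above.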

Correlation

Correlation measures the relationship between two variables and is a normalized version of the covariance. The correlation coefficient always ranges between -1 and 1. It is also known as Pearson's correlation coefficient.

A negative coefficient means the variables are inversely proportional to each other, with strength given by the magnitude of the coefficient.

A positive coefficient means they are directly proportional, i.e. they vary in the same direction, again with strength given by the magnitude of the coefficient.

If the correlation coefficient is 0, there is no linear relationship between the variables.
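These three cases can be illustrated with `np.corrcoef` on constructed data (a perfect linear relationship gives +1, a perfect inverse one gives -1):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2 * x + 1   # perfectly linear in x  -> correlation +1
z = -x          # perfectly inverse      -> correlation -1

# np.corrcoef returns a matrix; the off-diagonal entry is the
# Pearson correlation between the two inputs.
print(np.corrcoef(x, y)[0, 1])  # 1.0
print(np.corrcoef(x, z)[0, 1])  # -1.0
```

Note that a coefficient of 0 rules out only a *linear* relationship; e.g. y = x² on symmetric data can have near-zero correlation while being perfectly dependent on x.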

Covariance and Covariance matrix

Covariance is a measure of the relationship between two random variables and of the extent to which they change together. It ranges between -∞ and +∞. The covariance of two variables x and y is written cov(x, y).
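A covariance matrix for two variables can be computed with `np.cov` (which by default uses the sample formula with n − 1); the diagonal holds each variable's variance and the off-diagonal holds cov(x, y):

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([1.0, 3.0, 5.0, 7.0])  # moves together with x

cov_matrix = np.cov(x, y)  # 2x2: variances on the diagonal, cov(x, y) off-diagonal

print(cov_matrix[0, 1])    # positive: x and y vary in the same direction
```

Because covariance is unbounded, its magnitude depends on the units of x and y; dividing by the two standard deviations normalizes it into the [-1, 1] correlation coefficient described above.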