Machine Learning - Supervised Learning - Regularization Tutorial

  • Regularization

Regularization is a technique that prevents a model from overfitting by adding extra information, a penalty term, to its loss function.

There are two main regularization techniques, L1 and L2, plus a third, elastic net, that combines them. All three are given below:

  • Ridge Regression (L2) [Regression]

Hyperparameter: alpha, which is essentially lambda (λ) in the cost function.

Ridge regression is a type of linear regression in which a small amount of bias, a penalty proportional to the sum of the squares of the coefficients, is added to the loss function so that overfitting is reduced and we get better long-term predictions.

Ridge regression is a regularization technique used to reduce the complexity of the model. It is also called L2 regularization.

In this technique, the cost function is altered by adding a penalty term to it. The amount of bias added to the model is called the ridge regression penalty. It is calculated by multiplying lambda by the squared weight of each individual feature and summing the results.

The equation for the cost function in ridge regression will be:

$\text{cost function} = \sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2 + \lambda \sum m^2$

where $Y_i$ is the actual value, $\hat{Y}_i$ is the predicted value, and $m$ ranges over the model's coefficients.

In the above equation, the penalty term $\lambda \sum m^2$ regularizes the coefficients of the model; by shrinking their amplitudes, ridge regression decreases the complexity of the model.

For example:

For an overfitted model, on unseen data the loss of plain linear regression (Loss LN) is greater than the loss of ridge regression (Loss LR).

Hence, we select the ridge (Loss LR) model over the plain linear (Loss LN) model.

An overfitted linear regression line can drive the training loss all the way to zero. To reduce the overfitting, we add the penalty term, which increases bias but reduces variance, adjusting the bias-variance trade-off.
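A minimal sketch of this in code, assuming scikit-learn is available; the synthetic dataset, seed, and alpha value are illustrative choices, not from the tutorial (in scikit-learn's Ridge, alpha plays the role of lambda):

```python
# Sketch: plain linear regression vs. ridge regression on synthetic data.
# Everything here (data, seed, alpha) is an illustrative assumption.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))
true_coef = np.array([3.0, -2.0, 0.5, 0.0, 1.5])
y = X @ true_coef + rng.normal(scale=0.5, size=100)

# Plain linear regression minimizes only the squared error.
lin = LinearRegression().fit(X, y)

# Ridge adds lambda * sum(m^2) to the loss; alpha is that lambda.
ridge = Ridge(alpha=1.0).fit(X, y)

print("Linear coefficients:", lin.coef_)
print("Ridge coefficients: ", ridge.coef_)  # shrunk toward 0
```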

 

5 Key Points – Ridge Regression

  • How do the coefficients get affected?

As lambda increases, the coefficients shrink toward 0 but never become exactly 0 (the sketch after this list demonstrates this).

  • Higher values are impacted more as lambda increases.

A coefficient with a high value is impacted more than a coefficient with a low value when the lambda value increases.

  • Bias-Variance Trade-Off

Bias and variance both depend on the lambda value:

Lambda decreases → bias decreases, variance increases (overfit)

Lambda increases → bias increases, variance decreases (underfit)

  • Impact on the Loss function

As lambda increases, the penalty term dominates the loss function, and its minimum moves toward the point where all coefficients are 0.

  • Why is it called ridge?

Because the L2 constraint region is a circle (a hypersphere in higher dimensions) and the solution lies on its perimeter, i.e. on a ridge; hence the name.
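A small sketch of the first two key points, again with assumed synthetic data: as alpha (lambda) grows, every ridge coefficient shrinks toward 0, the larger ones by more, but none of them reaches exactly 0:

```python
# Sketch: ridge coefficients at increasing alpha (lambda) values.
# Synthetic data and the alpha grid are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([5.0, 2.0, 0.1]) + rng.normal(scale=0.5, size=100)

for alpha in [0.01, 1, 100, 10000]:
    coef = Ridge(alpha=alpha).fit(X, y).coef_
    print(f"alpha={alpha:>7}: {np.round(coef, 4)}")
# Larger coefficients shrink by a larger amount, yet none becomes exactly 0.
```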

 

Practical Tips

Apply ridge regression only when the number of input features is greater than or equal to 2.

  • Lasso Regression (L1) [Regression]

Hyperparameter: alpha, which is essentially lambda (λ) in the cost function.

Lasso regression is another regularization technique used to reduce overfitting or the complexity of the model. It stands for Least Absolute Shrinkage and Selection Operator.

It is similar to ridge regression except that the penalty term contains the absolute values of the coefficients instead of their squares.

Because it takes absolute values, it can shrink a coefficient all the way to 0, whereas ridge regression can only shrink it close to 0.

Due to this, we can perform feature selection with lasso regression. Hence, lasso is better than ridge in this respect.

It is also called L1 regularization. The equation for the cost function of lasso regression will be:

$\text{cost function} = \sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2 + \lambda \sum |m|$

Some of the features are completely neglected by this technique: their coefficients become exactly 0 and they play no role in the model.

Hence, lasso regression can help us reduce overfitting in the model as well as perform feature selection.
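A minimal lasso sketch (the synthetic data and alpha are assumptions): with a suitable alpha, the coefficients of the irrelevant features land exactly at 0, which is the feature selection described above:

```python
# Sketch: lasso zeroes out weak coefficients, performing feature selection.
# Synthetic data; only features 0 and 3 actually influence y.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
y = X @ np.array([4.0, 0.0, 0.0, -3.0, 0.0, 0.0]) + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
print("Coefficients:", np.round(lasso.coef_, 3))          # several exact zeros
print("Selected features:", np.flatnonzero(lasso.coef_))  # indices kept by lasso
```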

 

Why does lasso regression create sparsity?

In ridge regression, increasing the lambda value shrinks the coefficients toward 0 but never to exactly 0. In lasso, they can shrink all the way to 0.

In ridge, lambda appears in the denominator of the closed-form coefficient formula, so it can only scale the coefficient down; it cannot make it zero.

In lasso, lambda appears in the numerator as a subtracted term, so a large enough lambda can drive the coefficient exactly to zero.

This means that variables are effectively removed from the model, hence the sparsity.
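A worked sketch of this argument in the one-feature case, using the closed-form solutions of the loss $\sum_i (y_i - m x_i)^2$ plus each penalty (the data is an illustrative assumption):

```python
# Sketch: why lambda can zero a lasso coefficient but not a ridge one.
# One-feature closed forms:
#   ridge: m = sum(x*y) / (sum(x^2) + lambda)              -> lambda in denominator
#   lasso: m = sign(sum(x*y)) * max(|sum(x*y)| - lambda/2, 0) / sum(x^2)
#                                                          -> lambda subtracted in numerator
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=50)
y = 0.3 * x + rng.normal(scale=0.5, size=50)

sxy, sxx = np.sum(x * y), np.sum(x * x)

for lam in [0.1, 1.0, 10.0, 100.0]:
    m_ridge = sxy / (sxx + lam)
    m_lasso = np.sign(sxy) * max(abs(sxy) - lam / 2, 0) / sxx
    print(f"lambda={lam:>5}: ridge m={m_ridge:.4f}, lasso m={m_lasso:.4f}")
# Ridge m only approaches 0; lasso m hits exactly 0 once lambda is large enough.
```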

 

  • Elastic Net [Regression]

Hyperparameters: alpha, which is essentially lambda, and l1_ratio, which decides the weightage of lasso versus ridge.

 

Elastic net linear regression uses the penalties from both the lasso and ridge techniques to regularize regression models. It combines the two methods, compensating for each one's shortcomings, to improve the regularization of statistical models.

The elastic net method performs variable selection and regularization simultaneously.

The elastic net technique is most appropriate when the number of dimensions (features) is greater than the number of samples.

Grouping correlated variables and variable selection are the key roles of the elastic net technique.

The l1_ratio decides the weightage of lasso versus ridge. If l1_ratio = 0.9, the penalty is 90% lasso (L1) and 10% ridge (L2). The total penalty can be written as a·L1 + b·L2, where alpha (lambda) = a + b and l1_ratio = a / (a + b).

When to use elastic net?

  • When you are unsure about whether to use lasso or ridge
  • If the input columns exhibit multicollinearity, then elastic net is a good fit (see the sketch below).
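A minimal elastic net sketch in scikit-learn (the synthetic data, alpha, and l1_ratio values are assumptions; the second column is built to be nearly collinear with the first to mimic the multicollinearity case):

```python
# Sketch: elastic net with alpha (total penalty) and l1_ratio (lasso/ridge split).
# l1_ratio=0.9 means 90% L1 (lasso) and 10% L2 (ridge) penalty.
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 4))
X[:, 1] = X[:, 0] + rng.normal(scale=0.01, size=200)  # deliberately collinear column
y = 2 * X[:, 0] + rng.normal(scale=0.5, size=200)

# In scikit-learn's parametrization: a = alpha * l1_ratio, b = alpha * (1 - l1_ratio),
# so alpha = a + b and l1_ratio = a / (a + b).
enet = ElasticNet(alpha=0.1, l1_ratio=0.9).fit(X, y)
print("Coefficients:", np.round(enet.coef_, 3))
```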

