Machine Learning - Ensemble Learning - Boosting Tutorial

Boosting is an ensemble learning technique that uses a set of Machine Learning algorithms to convert weak learners (models with low accuracy) into strong learners (models with high accuracy), thereby increasing the overall accuracy of the model.

First, a model is built from the training data. Then a second model is built that tries to correct the errors made by the first model. This procedure continues, and models are added until either the complete training data set is predicted correctly or the maximum number of models has been added.
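A minimal Python sketch of this sequential idea, using shallow regression trees that each fit the errors left by the ensemble so far (the toy data, the stopping tolerance and the cap on the number of models are made-up values for illustration):

import numpy as np
from sklearn.tree import DecisionTreeRegressor  # a shallow tree acts as the weak learner

X = np.array([[1.0], [2.0], [3.0], [4.0]])      # toy inputs (made up)
y = np.array([1.5, 2.0, 3.5, 4.0])              # toy targets (made up)

models = []
prediction = np.zeros_like(y)                   # the ensemble starts with no knowledge
max_models, tolerance = 10, 1e-3

for _ in range(max_models):
    residual = y - prediction                   # errors left by the models built so far
    if np.abs(residual).max() < tolerance:      # stop once the training data is fit well enough
        break
    weak = DecisionTreeRegressor(max_depth=1).fit(X, residual)
    models.append(weak)
    prediction += weak.predict(X)               # the new model corrects part of the error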


Decision Stump - A decision stump is a decision tree with only one single split, i.e., a decision tree with a maximum depth of one.
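For example, in scikit-learn a decision stump is just a tree whose depth is capped at one (the tiny dataset below is made up for illustration):

from sklearn.tree import DecisionTreeClassifier

X = [[1], [2], [3], [4]]                        # toy feature values (made up)
y = [0, 0, 1, 1]                                # toy labels (made up)

stump = DecisionTreeClassifier(max_depth=1)     # maximum depth one = a single split
stump.fit(X, y)
print(stump.predict([[2.5]]))                   # the one split decides the predicted class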

 

  • AdaBoost – Stagewise Additive method

Adaptive Boosting, or AdaBoost, usually uses decision trees for modelling. AdaBoost initially gives the same weight to every data point. Then it automatically adjusts the weights of the data points after every decision tree: incorrectly classified items receive more weight so they can be corrected in the next round. The process repeats until the residual error, i.e., the difference between actual and predicted values, falls below an acceptable threshold.

 

The following step-by-step walkthrough (Box 1 to Box 4) explains AdaBoost. Let's understand it closely:

Box 1: We assign equal weights to each data point and apply a decision stump to classify them as + (plus) or – (minus). The decision stump (D1) generates a vertical line on the left side to classify the data points. We see that this vertical line has incorrectly predicted three + (plus) as – (minus). In such a case, we assign higher weights to these three + (plus) and apply another decision stump.


Box 2: Here, the size of the three incorrectly predicted + (plus) is bigger compared to the rest of the data points. In this case, the second decision stump (D2) tries to predict them correctly. A vertical line (D2) on the right side of this box classifies the three mis-classified + (plus) correctly. But again, it causes mis-classification errors, this time with three – (minus). Again, we assign higher weights to the three – (minus) and apply another decision stump.


Box 3: Here, the three – (minus) are given higher weights. A decision stump (D3) is applied to predict these mis-classified observations correctly. This time a horizontal line is generated to classify + (plus) and – (minus), based on the higher weights of the mis-classified observations.


Box 4: Here, we combine D1, D2 and D3 to form a strong prediction with a more complex rule than any individual weak learner. You can see that this combined model classifies the observations quite well compared to any individual weak learner.


 

For a worked AdaBoost example, including how to calculate alpha (the weight of each model), see videos 99 and 100.
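As a complement to the videos, here is a hedged numeric sketch of one common formulation of the alpha (model weight) calculation and the data-point weight update; the labels and stump predictions below are invented for illustration and are not the points from the boxes above:

import numpy as np

y_true = np.array([ 1,  1,  1, -1, -1])         # true labels, + and - encoded as +1 / -1 (made up)
y_pred = np.array([ 1, -1,  1, -1, -1])         # stump predictions: one point is misclassified
w = np.full(len(y_true), 1 / len(y_true))       # start with equal weights (0.2 each)

err = np.sum(w[y_true != y_pred])               # weighted error of the stump (0.2)
alpha = 0.5 * np.log((1 - err) / err)           # alpha, the weight of this stump (about 0.69)

# misclassified points get larger weights, correctly classified points get smaller ones
w = w * np.exp(-alpha * y_true * y_pred)
w = w / w.sum()                                 # renormalize so the weights sum to 1

print(round(err, 3), round(alpha, 3), np.round(w, 3))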

 

Bagging Vs Boosting

1] Type Of Model used – 

In bagging, we use models that have low bias and high variance, such as fully grown decision trees, SVM, and KNN.

In boosting, we use models that have high bias and low variance, such as shallow decision trees (e.g. a decision stump with depth 1), linear regression, and logistic regression.

2] Sequential Vs Parallel

In bagging, the models are trained in parallel, while in boosting, the models are trained sequentially.

3] Weightage of base learner

In bagging, each base model has equal weight, while in boosting, each model can have a different weight.
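A hedged scikit-learn sketch of the three differences above: bagging combines fully grown trees trained independently with equal weight, while boosting combines sequentially trained stumps with different weights (synthetic data; note that older scikit-learn versions name the estimator parameter base_estimator):

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)      # synthetic data

# Bagging: low-bias / high-variance base model (fully grown tree), trained in parallel,
# every model counted with equal weight
bagging = BaggingClassifier(estimator=DecisionTreeClassifier(max_depth=None),
                            n_estimators=50, n_jobs=-1, random_state=0).fit(X, y)

# Boosting: high-bias / low-variance base model (decision stump), trained sequentially,
# each model gets its own weight
boosting = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                              n_estimators=50, random_state=0).fit(X, y)

print(bagging.score(X, y), boosting.score(X, y))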

 

  • Gradient Boosting

Gradient Boosting is similar to AdaBoost. The main difference is that Gradient Boosting does not give incorrectly classified items more weight.

Instead, Gradient Boosting minimizes the loss function using a stagewise additive modelling technique: each new base model is fitted so that, combined with the previous set of models, it reduces the overall prediction error, so the current ensemble is always more effective than the previous one.

This method attempts to generate accurate results directly at each stage, rather than re-weighting and correcting misclassified points throughout the process as AdaBoost does.

When the target column is continuous we use the Gradient Boosting Regressor, whereas for a classification problem we use the Gradient Boosting Classifier. The only difference between the two is the loss function.

For this reason, Gradient Boosting can lead to more accurate results. Gradient Boosting can help with both classification and regression-based problems.
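A minimal scikit-learn sketch of this split between the two estimators (synthetic data; in recent versions the default losses are squared error for the regressor and log loss for the classifier):

from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

# Continuous target -> Gradient Boosting Regressor (squared-error loss)
Xr, yr = make_regression(n_samples=200, random_state=0)
reg = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1).fit(Xr, yr)

# Categorical target -> Gradient Boosting Classifier (log loss)
Xc, yc = make_classification(n_samples=200, random_state=0)
clf = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1).fit(Xc, yc)

print(reg.predict(Xr[:2]), clf.predict(Xc[:2]))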

Algorithm-

https://miro.medium.com/max/700/1*dIHrPFBT2fmXuTXMb-3_Xw.png

R&D Spend, Administration, and Marketing Spend are the input columns; Profit is the output column.

 

Step 1] Find the mean of the output/target column (Profit) and initialize it as f0(x).

Step 2] 

2-1] Find the residuals by subtracting f0(x) from Profit, and call them ri1.

2-2] Train a decision tree with R&D Spend, Administration and Marketing Spend as the input columns and ri1 as the output column.

We use max depth = 1 for the decision tree, since our dataset is small. For depth 1, there will be two terminal nodes, R11 and R21, where R11 is the 1st terminal region of the 1st decision tree and R21 is the 2nd terminal region of the 1st decision tree.

2-3] Next we need to find the gamma value for each terminal region R11 and R21. For the squared-error loss, gamma is the value that makes the residuals in the region sum to zero, i.e. the sum over the region of (yi - f0(xi) - gamma) equals 0.

For R11 (one observation):

y1 - f0(x1) - gamma11 = 0
gamma11 = 91 - 142.33 = -51.33

For R21 (two observations):

(y2 - f0(x2) - gamma21) + (y3 - f0(x3) - gamma21) = 0
(192 - 142.33 - gamma21) + (144 - 142.33 - gamma21) = 0
gamma21 = 25.67

Notice that each gamma is simply the mean of the residuals ri1 in its region. That is not a coincidence: it follows from using the least-squares loss function. If we used a different loss function, the gamma values would be different.
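The same arithmetic can be checked in a few lines of Python, using the three Profit values 91, 192 and 144 from the calculation above:

import numpy as np

profit = np.array([91.0, 192.0, 144.0])         # the Profit values used in the example
f0 = profit.mean()                              # Step 1: initial prediction, the mean (142.33)

residual = profit - f0                          # Step 2-1: residuals ri1

# Terminal regions of the depth-1 tree: R11 holds the first row, R21 the other two
gamma11 = residual[[0]].mean()                  # -51.33
gamma21 = residual[[1, 2]].mean()               # about 25.67

print(round(f0, 2), round(gamma11, 2), round(gamma21, 2))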

 

2-4] In this step we perform stagewise additive modelling.

Suppose m = 4; here DT4 is the output of decision tree 4.

Then f4(x) = f3(x) + DT4

But f3(x) = f2(x) + DT3, f2(x) = f1(x) + DT2, and f1(x) = f0(x) + DT1; all are in recursive form.

On combining them all, we get f4(x) = f0(x) + DT1 + DT2 + DT3 + DT4.

 

Hence, the final result will be fm(x), also written f(x).
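Putting steps 1 to 2-4 together, here is a hedged, hand-rolled sketch of gradient boosting regression with decision stumps. The column names follow the example above, but the numeric rows are placeholder values invented for illustration, and the learning rate is left at 1 so the update matches the plain fm(x) = f(m-1)(x) + DTm form used here:

import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Placeholder rows in the spirit of the example (NOT the original dataset)
data = pd.DataFrame({
    "R&D Spend":       [165349.0, 162598.0, 153442.0, 144372.0, 142107.0],
    "Administration":  [136898.0, 151378.0, 101146.0, 118672.0,  91392.0],
    "Marketing Spend": [471784.0, 443899.0, 407935.0, 383200.0, 366168.0],
    "Profit":          [192261.0, 191792.0, 191050.0, 182902.0, 166188.0],
})
X, y = data.drop(columns="Profit"), data["Profit"]

M = 4                                           # number of boosting stages (decision trees)
f = np.full(len(y), y.mean())                   # Step 1: f0(x) = mean of the target column
trees = []

for m in range(M):                              # Step 2: stagewise additive modelling
    residual = y - f                            # 2-1: residuals of the current model
    tree = DecisionTreeRegressor(max_depth=1).fit(X, residual)  # 2-2/2-3: each leaf learns its gamma
    trees.append(tree)
    f = f + tree.predict(X)                     # 2-4: fm(x) = f(m-1)(x) + DTm

print(np.round(f, 1))                           # final prediction fM(x) on the training rows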

AdaBoost Vs Gradient Boosting

1] In AdaBoost we use decision stumps (decision trees with max depth 1), while in gradient boosting we use decision trees whose number of leaf nodes is typically between 8 and 32.

2] In AdaBoost we give a separate weight (alpha) to each model, while in gradient boosting we use a learning rate, which is the same for all models.

  • Extreme Gradient Boosting (XGBoost)

Extreme Gradient Boosting (XGBoost) improves gradient boosting for computational speed and scale in several ways. XGBoost uses multiple cores on the CPU so that learning can occur in parallel during training. It is a boosting algorithm that can handle extensive datasets, making it attractive for big data applications. The key features of XGBoost are parallelization, distributed computing, cache optimization, and out-of-core processing.
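A hedged usage sketch with the xgboost Python package (scikit-learn style wrapper; parameter defaults can differ between versions, and the data here is synthetic):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier               # pip install xgboost

X, y = make_classification(n_samples=1000, random_state=0)      # synthetic data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = XGBClassifier(n_estimators=200,         # number of boosted trees
                      max_depth=4,
                      learning_rate=0.1,
                      n_jobs=-1)                # use multiple CPU cores while training
model.fit(X_train, y_train)
print(model.score(X_test, y_test))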

 

Other algorithms based on Bagging and Boosting include the Bagging meta-estimator, Random Forest, and LightGBM.
