
Deep Learning - ANN - Artificial Neural Network - Perceptron Loss Function Tutorial

Problems With the Perceptron Trick -

1] Cannot quantify (we cannot measure how good our model is) -

If a point is misclassified, the perceptron changes the line (W1, W2, b).

If a point is correctly classified, the perceptron leaves the line (W1, W2, b) unchanged.

In the image below, both line 1 and line 2 separate the red and green regions, but we cannot quantify which of the two lines is the better classifier.

2] Might not fully converge -

Because the model picks a point at random in each iteration, it may repeatedly pick points that are already correctly classified.

In that case the line does not move from its current position, so it is possible the algorithm never fully converges.
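Both problems can be seen in a minimal sketch of the perceptron trick (assuming ±1 labels and one randomly chosen point per iteration; the data and variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable data: label +1 if x1 + x2 > 0, else -1
X = rng.normal(size=(20, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

w = np.zeros(2)  # (W1, W2)
b = 0.0

for _ in range(1000):
    i = rng.integers(len(X))              # point chosen at random
    pred = 1 if X[i] @ w + b >= 0 else -1
    if pred != y[i]:                      # only misclassified points move the line
        w += y[i] * X[i]
        b += y[i]
    # if the picked point is already classified correctly, nothing changes,
    # so repeatedly drawing such points wastes iterations
```

Note that the loop never measures how good the current line is; it only reacts to individual misclassified points, which is exactly the "cannot quantify" problem.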

 

Perceptron Loss Function

The perceptron algorithm uses a loss function closely related to the hinge loss. In each iteration the perceptron updates every weight in the direction that reduces the error, so the weights walk toward the values (weights and bias) that give the best separating line.

L(W1, W2, b) = \( \frac{1}{n}\sum_{i=1}^{n} \ell(Y_i, f(X_i)) + \alpha R(W_1, W_2)\)

\( \alpha R(W_1, W_2)\) is the regularization term, which we ignore here.

For the perceptron, the per-sample loss is \( \ell(Y_i, f(X_i)) = \max(0, -Y_i f(X_i))\),

where \(f(X_i) = W_1X_{i1} + W_2X_{i2} + b\).

L(W1, W2, b) = \( \frac{1}{n}\sum_{i=1}^{n} \max(0,-Y_if(X_i))\)

n: number of rows in the data

The loss function depends on three parameters, i.e. L(W1, W2, b).

We need to find the values of W1, W2, and b for which the loss becomes minimal, i.e. the argmin of the loss function:

\( (W_1, W_2, b) = \underset{W_1, W_2, b}{\operatorname{argmin}} \ \frac{1}{n}\sum_{i=1}^{n} \max(0,-Y_if(X_i))\)
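As a quick numerical check, the averaged loss above can be evaluated directly. A minimal numpy sketch, with made-up data (two points, both labeled +1) and a made-up candidate line:

```python
import numpy as np

# Two illustrative points, both with actual label +1
X = np.array([[2.0, 1.0], [1.0, 3.0]])
Y = np.array([1, 1])

W = np.array([0.5, -0.5])  # candidate W1, W2
b = 0.0                    # candidate bias

f = X @ W + b                        # f(X_i) = W1*Xi1 + W2*Xi2 + b
loss = np.maximum(0, -Y * f).mean()  # (1/n) * sum of max(0, -Yi*f(Xi))
print(loss)  # 0.5: the first point is classified correctly (contributes 0),
             # the second is misclassified with f = -1.0 (contributes 1.0)
```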

 

Explanation Of Loss Function

L  =  \(\frac{1}{n}\sum_{i=1}^{n} max(0,-Y_if(X_i))\)

 where f(Xi) = W1Xi1 + W2Xi2 + b

| row i | Xi1 | Xi2 | Yi |
| ----- | --- | --- | -- |
| 1     | X11 | X12 | Y1 |
| 2     | X21 | X22 | Y2 |

Xij -> where i is the row and j is the column

Breaking down the loss function: let X = -Yi f(Xi).

Therefore max(0, -Yi f(Xi)) = max(0, X):

if X > 0, the output is X (the point is misclassified);

if X <= 0, the output is 0 (the point is classified correctly).

Let's assume we have 2 points instead of n points, i.e. n = 2:

L = \(\frac{1}{2}[\max(0,-Y_1f(X_1)) + \max(0,-Y_2f(X_2))]\)

 

Practical Example -

 

Student Placement Table

| CGPA | IQ | Placed - Yi (actual) | Yi' (model) | f(Xi) | max(0, -Yi f(Xi)) |
| ---- | -- | -------------------- | ----------- | ----- | ----------------- |
| 3    | 8  | -1                   | 1           | +ve   | greater than 0    |
| -3   | 1  | -1                   | -1          | -ve   | 0                 |
| 5    | 1  | 1                    | 1           | +ve   | 0                 |
| -2   | 5  | 1                    | -1          | -ve   | greater than 0    |

 

In 2D, f(Xi) = W1Xi1 + W2Xi2 + b. The point (3, 8) lies on the positive side of the line, hence f(Xi) is positive; since its actual label is Yi = -1, the point is misclassified and -Yi f(Xi) is greater than 0.
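The sign pattern in the table can be reproduced numerically. A sketch assuming an illustrative line W1 = 1, W2 = 1, b = -5 (these W and b values are made up; only the signs of f(Xi) matter here):

```python
import numpy as np

# (CGPA, IQ) rows and actual labels Yi from the table
X = np.array([[3.0, 8.0], [-3.0, 1.0], [5.0, 1.0], [-2.0, 5.0]])
Y = np.array([-1, -1, 1, 1])

W = np.array([1.0, 1.0])  # illustrative W1, W2
b = -5.0                  # illustrative bias

f = X @ W + b                   # f(Xi)
pred = np.where(f >= 0, 1, -1)  # Yi' according to the model
loss = np.maximum(0, -Y * f)    # per-row max(0, -Yi f(Xi))

print(pred)  # matches the Yi' column: [ 1 -1  1 -1]
print(loss)  # nonzero only for the two misclassified rows
```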

Finding the values of W1, W2, and b for which the loss function is minimal:

\( (W_1, W_2, b) = \underset{W_1, W_2, b}{\operatorname{argmin}} \ \frac{1}{n}\sum_{i=1}^{n} \max(0,-Y_if(X_i))\)

where f(Xi) = W1Xi1 + W2Xi2 + b

for i in epochs:

W1 = W1 - \(\eta \frac{\partial L}{\partial W_1}\)

W2 = W2 - \(\eta \frac{\partial L}{\partial W_2}\)

b = b - \(\eta \frac{\partial L}{\partial b}\)

To apply these gradient descent updates we need three partial derivatives, obtained by differentiating the loss function with respect to W1, W2, and b.

By the chain rule, \(\frac{\partial L}{\partial W_1} = \frac{\partial L}{\partial f(X_i)}\frac{\partial f(X_i)}{\partial W_1}\)

\(\frac{\partial L}{\partial f(X_i)} = \begin{cases} 0 & \quad \text{if } Y_if(X_i) \geq 0\\ -Y_i & \quad \text{if } Y_if(X_i)<0\\ \end{cases} \)

\(\frac{\partial f(X_i)}{\partial W_1} = X_{i1}\)

 

\(\frac{\partial L}{\partial W_1} = \begin{cases} 0 & \quad \text{if } Y_if(X_i) \geq 0\\ -Y_iX_{i1} & \quad \text{if } Y_if(X_i)<0\\ \end{cases} \)

\(\frac{\partial L}{\partial W_2} = \begin{cases} 0 & \quad \text{if } Y_if(X_i) \geq 0\\ -Y_iX_{i2} & \quad \text{if } Y_if(X_i)<0\\ \end{cases} \)

\(\frac{\partial L}{\partial b} = \begin{cases} 0 & \quad \text{if } Y_if(X_i) \geq 0\\ -Y_i & \quad \text{if } Y_if(X_i)<0\\ \end{cases} \)
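Putting the update rule and the piecewise gradients together, a minimal training loop might look like this (a numpy sketch; the data, learning rate, epoch count, and random initialization are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative, linearly separable data with +/-1 labels
X = rng.normal(size=(50, 2))
Y = np.where(2 * X[:, 0] - X[:, 1] + 0.5 > 0, 1, -1)

W = rng.normal(size=2) * 0.1  # small random start so the gradients are nonzero
b = 0.0
eta = 0.1                     # learning rate
epochs = 500

for _ in range(epochs):
    for i in range(len(X)):
        if Y[i] * (W @ X[i] + b) < 0:  # gradient is nonzero only when misclassified
            # dL/dW1 = -Yi*Xi1, dL/dW2 = -Yi*Xi2, dL/db = -Yi
            W -= eta * (-Y[i] * X[i])
            b -= eta * (-Y[i])

pred = np.where(X @ W + b >= 0, 1, -1)
print((pred == Y).mean())  # training accuracy after the updates
```

Each misclassified point pulls the line toward itself by W = W + η·Yi·Xi, which is exactly W = W - η·(∂L/∂W) with the piecewise gradient above.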

 

Practical Link - Perceptron Loss Function

 

A perceptron and logistic regression are not the same in general, but they become technically equivalent if the activation function is sigmoid and the loss function is binary cross-entropy.

Perceptron OR Single-Layer Perceptron – the simplest feedforward neural network; it contains no hidden layer and can learn only linearly separable functions such as OR and AND.

Multilayer Perceptron – a multilayer perceptron has one or more hidden layers and can learn both linear and non-linear functions such as XOR.

 

Types of Loss Functions -

| Loss Function | Activation | Output | Activation Function Formula |
| --- | --- | --- | --- |
| Hinge Loss | Step function | Perceptron – binary classification (-1, 1) | \(\mu = \begin{cases} 0 & \quad \text{if } x<0\\ 1 & \quad \text{if } x \geq 0 \end{cases}\) |
| Log-Loss (binary cross-entropy) | Sigmoid | Logistic regression – binary classification (0, 1) | \(\sigma(z) = \frac{1}{1+ e^{-z}}\) |
| Categorical Cross-Entropy | Softmax | Softmax regression – multiclass classification (probabilities) | \(\sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}\) |
| Mean Squared Error | Linear | Linear regression – number | \(f(x) = x\) |
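For reference, the activation functions in the table can be sketched in plain numpy (illustrative implementations):

```python
import numpy as np

def step(x):
    """Step activation used by the perceptron: 0 if x < 0, else 1."""
    return np.where(x < 0, 0, 1)

def sigmoid(z):
    """Sigmoid: squashes any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Softmax: turns a vector of scores into a probability distribution."""
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

print(step(np.array([-2.0, 0.0, 3.0])))        # [0 1 1]
print(sigmoid(0.0))                            # 0.5
print(softmax(np.array([1.0, 1.0])).tolist())  # [0.5, 0.5]
```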

 

 

Problem With the Perceptron (Single-Layer Perceptron) -

A perceptron works only on linearly separable data (OR, AND), not on non-linear data (XOR).

Practical Link - Problem With Perceptron

We created OR, AND, and XOR dataframes.

For the AND data, the perceptron finds a clear-cut decision boundary.

For the OR data, it also finds a clear-cut decision boundary.

But for the XOR data, it cannot find a clear-cut boundary.
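This can be reproduced with a tiny experiment: run the perceptron update rule on the AND and XOR truth tables and compare the resulting training accuracies (a numpy sketch with labels encoded as ±1; epoch count and learning rate are illustrative):

```python
import numpy as np

def train_perceptron(X, Y, epochs=100, eta=0.1):
    """Plain perceptron updates; returns training accuracy."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for i in range(len(X)):
            pred = 1 if X[i] @ w + b >= 0 else -1
            if pred != Y[i]:               # update only on misclassification
                w += eta * Y[i] * X[i]
                b += eta * Y[i]
    pred = np.where(X @ w + b >= 0, 1, -1)
    return (pred == Y).mean()

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
and_y = np.array([-1, -1, -1, 1])  # AND truth table
xor_y = np.array([-1, 1, 1, -1])   # XOR truth table

print(train_perceptron(X, and_y))  # 1.0 - AND is linearly separable
print(train_perceptron(X, xor_y))  # below 1.0 - XOR is not
```

No single line can put (0, 1) and (1, 0) on one side and (0, 0) and (1, 1) on the other, so the XOR accuracy never reaches 1.0 no matter how long the loop runs.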

