Deep Learning - ANN - Artificial Neural Network - Perceptron Loss Function Tutorial
Problem With Perceptron Trick-
1] Cannot quantify (we cannot measure how good our model is) -
If a point is misclassified, the perceptron makes changes to the line (W1, W2, b).
If a point is correctly classified, the perceptron makes no changes to the line (W1, W2, b).
In the image below, both line 1 and line 2 separate the red and green regions. But we cannot quantify which of line 1 and line 2 is the better classifier.
2] Might not fully converge
Since the model picks points at random, it may pick an already correctly classified point multiple times.
In that case the line does not move from its position, so there is a possibility the line might never fully converge.
Perceptron Loss Function
The perceptron algorithm uses a loss function known as the hinge loss. In every iteration the perceptron updates each weight using every misclassified instance, so the weights walk in the direction that mitigates the error, reducing the loss and finding the best line (weights and bias).
L(W1, W2, b) = \( \frac{1}{n}\sum_{i=1}^{n} \ell(Y_i,f(X_i)) + \alpha R(W_1, W_2)\)
\( \alpha R(W_1, W_2)\) - ignoring the regularization term
\( \ell(Y_i, f(X_i)) = \max(0, -Y_if(X_i))\)
but \(f(X_i) = W_1X_{i1} + W_2X_{i2} + b\)
L = \( \frac{1}{n}\sum_{i=1}^{n} \max(0,-Y_if(X_i))\)
n: #rows in data
The loss function depends on three parameters, i.e. L(W1, W2, b).
We need to find the values of W1, W2, and b for which the loss is minimal, i.e. the argmin of the loss function.
\((W_1, W_2, b) = \underset{W_1, W_2, b}{\operatorname{argmin}} \frac{1}{n}\sum_{i=1}^{n} \max(0,-Y_if(X_i))\)
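A minimal NumPy sketch of this loss function (the data and weights below are illustrative placeholders, not values from this tutorial):

```python
import numpy as np

def perceptron_loss(X, Y, w1, w2, b):
    # L = (1/n) * sum_i max(0, -Yi * f(Xi)), with f(Xi) = W1*Xi1 + W2*Xi2 + b
    f = w1 * X[:, 0] + w2 * X[:, 1] + b
    return np.mean(np.maximum(0.0, -Y * f))

# Illustrative data: one point per row, labels in {-1, +1}
X = np.array([[1.0, 2.0], [3.0, 1.0], [2.0, 4.0]])
Y = np.array([-1.0, -1.0, 1.0])
print(perceptron_loss(X, Y, w1=0.5, w2=-0.5, b=0.0))  # 0.666... for this toy setup
```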
Explanation Of Loss Function
L = \(\frac{1}{n}\sum_{i=1}^{n} \max(0,-Y_if(X_i))\)
where \(f(X_i) = W_1X_{i1} + W_2X_{i2} + b\)
row | X1 | X2 | Y
1 | X11 | X12 | Y1
2 | X21 | X22 | Y2
... | ... | ... | ...
n | Xn1 | Xn2 | Yn
Xij -> where i is the row index and j is the column index
Breaking Down The Loss Function -
let \(Z = -Y_if(X_i)\)
therefore \(\max(0,-Y_if(X_i)) = \max(0, Z)\)
if Z > 0 then the output will be Z
if Z ≤ 0 then the output will be 0
Let's assume we have 2 points instead of n points.
L = \(\frac{1}{2}[\max(0,-Y_1f(X_1)) + \max(0,-Y_2f(X_2))]\)
i.e. the sum written out for n = 2.
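As a small sketch of this two-point case in Python (the y and f(x) values are made up for illustration):

```python
def loss_term(y, fx):
    # max(0, -y*f(x)): 0 when the point is correctly classified (y*f(x) >= 0),
    # a positive penalty when it is misclassified (y*f(x) < 0)
    return max(0.0, -y * fx)

# Two illustrative points: the first correctly classified, the second not
print(loss_term(1, 2.5))   # y*f(x) > 0 -> contributes 0
print(loss_term(1, -1.5))  # y*f(x) < 0 -> contributes 1.5

# L = (1/2) * [max(0, -Y1*f(X1)) + max(0, -Y2*f(X2))]
L = 0.5 * (loss_term(1, 2.5) + loss_term(1, -1.5))
print(L)  # 0.75
```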
Practical Example -
Student Placement Table -
CGPA | IQ | Placed (Yi, actual) | Yi' (model prediction) | f(Xi) | max(0, -Yi f(Xi))
3 | 8 | -1 | 1 | +ve | greater than 0
-3 | 1 | -1 | -1 | -ve | 0
5 | 1 | 1 | 1 | +ve | 0
-2 | 5 | 1 | -1 | -ve | greater than 0
The value of f(Xi) in 2D is \(W_1X_{i1} + W_2X_{i2} + b\). The point (3, 8) lies on the positive side of the line, hence \(W_1X_{i1} + W_2X_{i2} + b\), i.e. f(Xi), is positive for it.
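To make the signs in the table concrete, here is a sketch that assumes the line W1 = 1, W2 = 1, b = -4 (an assumed line chosen only because it reproduces the +ve/-ve pattern above; the tutorial does not state the actual weights):

```python
import numpy as np

# Table rows: (CGPA, IQ) with actual labels Yi
points = np.array([[3.0, 8.0], [-3.0, 1.0], [5.0, 1.0], [-2.0, 5.0]])
Y = np.array([-1.0, -1.0, 1.0, 1.0])

w1, w2, b = 1.0, 1.0, -4.0            # assumed illustrative line
f = w1 * points[:, 0] + w2 * points[:, 1] + b

for (cgpa, iq), y, fx in zip(points, Y, f):
    print(cgpa, iq, y, "f(Xi) =", fx, "| max(0, -Yi*f(Xi)) =", max(0.0, -y * fx))
# Rows 1 and 4 are misclassified and contribute a positive loss; rows 2 and 3 contribute 0.
```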
Finding the values of W1, W2, and b so that the loss function is at its minimum -
\((W_1, W_2, b) = \underset{W_1, W_2, b}{\operatorname{argmin}} \frac{1}{n}\sum_{i=1}^{n} \max(0,-Y_if(X_i))\)
where \(f(X_i) = W_1X_{i1} + W_2X_{i2} + b\)
for i in epochs:
W1 = W1 - \(\eta \frac{\delta L}{\delta W_1}\)
W2 = W2 - \(\eta \frac{\delta L}{\delta W_2}\)
b = b - \(\eta \frac{\delta L}{\delta b}\)
To update the values, we need the three partial derivatives that gradient descent uses at each step.
So we differentiate the loss function with respect to W1, W2, and b, applying the chain rule:
\(\frac{\delta L}{\delta W_1} = \frac{\delta L}{\delta f(X_i)}\frac{\delta f(X_i)}{\delta W_1}\)
\(\frac{\delta L}{\delta f(X_i)} = \begin{cases} 0 & \quad \text{if } Y_if(X_i) \geq 0\\ -Y_i & \quad \text{if } Y_if(X_i)<0\\ \end{cases} \)
\(\frac{\delta f(X_i)}{\delta W_1} = X_{i1}\)
\(\frac{\delta L}{\delta W_1} = \begin{cases} 0 & \quad \text{if } Y_if(X_i) \geq 0\\ -Y_iX_{i1} & \quad \text{if } Y_if(X_i)<0\\ \end{cases} \)
\(\frac{\delta L}{\delta W_2} = \begin{cases} 0 & \quad \text{if } Y_if(X_i) \geq 0\\ -Y_iX_{i2} & \quad \text{if } Y_if(X_i)<0\\ \end{cases} \)
\(\frac{\delta L}{\delta b} = \begin{cases} 0 & \quad \text{if } Y_if(X_i) \geq 0\\ -Y_i & \quad \text{if } Y_if(X_i)<0\\ \end{cases} \)
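Putting the update rule and the piecewise derivatives together, a minimal batch gradient-descent sketch (learning rate, epoch count, and toy data are illustrative assumptions):

```python
import numpy as np

def train_perceptron(X, Y, eta=0.1, epochs=1000):
    # Minimizes L = (1/n) * sum_i max(0, -Yi * f(Xi)) with gradient descent
    rng = np.random.default_rng(0)
    w1, w2, b = rng.normal(scale=0.01, size=3)  # small random init so f(Xi) != 0
    n = len(Y)
    for _ in range(epochs):
        dw1 = dw2 = db = 0.0
        for i in range(n):
            fx = w1 * X[i, 0] + w2 * X[i, 1] + b
            if Y[i] * fx < 0:            # only misclassified points contribute
                dw1 += -Y[i] * X[i, 0]   # dL/dW1 = -Yi * Xi1
                dw2 += -Y[i] * X[i, 1]   # dL/dW2 = -Yi * Xi2
                db += -Y[i]              # dL/db  = -Yi
        w1 -= eta * dw1 / n              # step against the gradient
        w2 -= eta * dw2 / n
        b -= eta * db / n
    return w1, w2, b

# Illustrative linearly separable toy data, labels in {-1, +1}
X = np.array([[2.0, 2.0], [1.0, 3.0], [4.0, 5.0], [5.0, 4.0]])
Y = np.array([-1.0, -1.0, 1.0, 1.0])
w1, w2, b = train_perceptron(X, Y)
print(np.sign(w1 * X[:, 0] + w2 * X[:, 1] + b))  # should match Y
```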
Practical Link - Perceptron Loss Function
Practically, a perceptron is not the same as logistic regression; technically, though, the two become identical if the activation function is the sigmoid and the loss function is binary cross-entropy.
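One way to see this relationship in code is scikit-learn's SGDClassifier, where only the loss argument changes between the two models (a sketch; the "log_loss" name assumes a recent scikit-learn version):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

# Perceptron loss: hinge-like, no probability outputs
perceptron = SGDClassifier(loss="perceptron", random_state=0).fit(X, y)

# Log loss (binary cross-entropy) + sigmoid -> logistic regression
logreg = SGDClassifier(loss="log_loss", random_state=0).fit(X, y)

print(perceptron.score(X, y), logreg.score(X, y))
```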
Perceptron OR Single Layer Perceptron – This is the simplest feedforward neural network and contains no hidden layer. It can learn only linearly separable functions such as OR and AND.
Multilayer Perceptron – A multilayer perceptron has one or more hidden layers. It can learn both linear and non-linear functions such as XOR.
Types Of Loss Functions -
Loss Function | Activation | Output | Activation Function Formula
Hinge Loss | Step function | Perceptron – binary classification (-1, 1) | \(\mu = \begin{cases} 0 & \quad \text{if } x<0\\ 1 & \quad \text{if } x \geq 0 \end{cases} \)
Log-Loss (binary cross-entropy) | Sigmoid | Logistic Regression – binary classification (0, 1) | \(\sigma(z) = \frac{1}{1+ e^{-z}}\)
Categorical Cross-Entropy | Softmax | Softmax Regression – multiclass classification (probabilities) | \(\sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K}{e^{z_j}}}\)
Mean Squared Error | Linear | Linear Regression – number | \(f(x) = x\)
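The activation formulas from the table, sketched in NumPy:

```python
import numpy as np

def step(x):
    # Step function (perceptron): 1 if x >= 0, else 0
    return np.where(x >= 0, 1, 0)

def sigmoid(z):
    # Sigmoid (logistic regression): squashes z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Softmax (softmax regression): probabilities over K classes
    e = np.exp(z - np.max(z))  # shift by the max for numerical stability
    return e / e.sum()

z = np.array([-2.0, 0.0, 3.0])
print(step(z))     # [0 1 1]
print(sigmoid(z))  # [0.119 0.5 0.953] (approximately)
print(softmax(z))  # sums to 1
```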
Problem With Perceptron or Single Layer Perceptron-
A perceptron works only on linearly separable data (OR, AND), not on non-linearly separable data (XOR).
Practical Link - Problem With Perceptron
Created OR, AND, and XOR DataFrames.
For the AND data the perceptron gives a clear-cut decision boundary.
For the OR data it also gives a clear-cut decision boundary.
But for the XOR data it gives no clear-cut decision boundary, as the sketch below shows.
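A minimal sketch reproducing this with scikit-learn's Perceptron on the three truth tables (the exact XOR score can vary by run, but it cannot reach 1.0):

```python
import numpy as np
from sklearn.linear_model import Perceptron

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
targets = {
    "AND": np.array([0, 0, 0, 1]),
    "OR": np.array([0, 1, 1, 1]),
    "XOR": np.array([0, 1, 1, 0]),  # not linearly separable
}

for name, y in targets.items():
    clf = Perceptron(max_iter=1000, random_state=0).fit(X, y)
    print(name, "training accuracy:", clf.score(X, y))
# AND and OR reach 1.0; XOR cannot, since no single line separates its classes.
```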