Deep Learning - CNN - Convolutional Neural Network - Introduction Tutorial

Convolutional Neural Networks (CNNs), also known as ConvNets, are a special kind of neural network for processing data that has a known grid-like topology, such as time series data (1D) or images (2D).

Why is CNN important?

Although we can use an ANN (a fully connected artificial neural network) on image data such as the MNIST dataset, the results are usually not very satisfactory.

A CNN typically performs better than an ANN on image datasets.

The use of ANN on image data has the following problems-

1] High Computational Cost

Suppose you have a 2D image of 40x40 pixels. To feed this image to an ANN, we first flatten it into a 1D vector of 1600x1.

If this vector is passed to a fully connected first hidden layer of 100 nodes, that layer alone needs 1600x100 = 160000 weights, which increases the computational cost.

2] Overfitting

Connecting every pixel of the image to every node gives the network enough weights to fit minute, image-specific patterns in the training data, which can result in overfitting.

3] Loss of important information such as the spatial arrangement of pixels

In 2D data, it is easy to identify the spatial arrangement of pixels, e.g. the distance between the two eyes, or between the nose and the mouth, in a picture of a human face.
However, once the image is flattened to 1D, this arrangement is difficult to identify, so important spatial information is lost.
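
Problems 1] and 3] can be made concrete with a short NumPy sketch. The toy image, the chosen pixel position, and the variable names are assumptions for illustration only; the sizes (40x40 image, 100 hidden nodes) come from the example above.

```python
import numpy as np

# A toy 40x40 grayscale image; the values 0..1599 are only placeholders.
height, width = 40, 40
image = np.arange(height * width).reshape(height, width)

# Problem 1: flattening gives 1600 inputs, each connected to all 100 hidden nodes.
flat = image.reshape(-1)
hidden_nodes = 100
num_weights = flat.size * hidden_nodes
print(flat.shape, num_weights)  # (1600,) 160000

# Problem 3: take two pixels that are vertical neighbours in the 2D image ...
row, col = 10, 5
idx_top = np.ravel_multi_index((row, col), image.shape)
idx_bottom = np.ravel_multi_index((row + 1, col), image.shape)

# ... after flattening they sit a full row width apart in the 1D vector.
index_gap = int(idx_bottom - idx_top)
print(index_gap)  # 40
```

Pixels that touch in the image end up 40 positions apart in the flattened vector, and a fully connected layer has no built-in notion that they were neighbours.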

How does CNN work? - CNN Intuition

In the first layer (convolutional layer 1), a CNN tries to extract primitive features like edges. In the next layer (convolutional layer 2), it tries to extract more complex features, and so on.

CNN applications-

1] Image Classification

Used to assign an image to the correct class from multiple options.

2] Object Localization

3] Object Detection

4] Face Detection and Recognition

5] Image Segmentation

6] Super Resolution (e.g. restoring an old, low-resolution image as a sharper one)

7] Black & white image to color image (colorization)

8] Pose Estimation

CNN Vs Visual Cortex

In the late 1950s and 1960s, Hubel and Wiesel ran experiments on cats, placing electrodes in cells of the visual cortex to find out which visual features each cell responds to.

Conclusion-

There are two types of cells: simple cells and complex cells.

Simple cells are orientation-selective cells that detect simple features like edges. Each simple cell responds to only one type of edge, and that edge is called its preferred stimulus.

Once a feature is detected by simple cells, they pass the information on to a complex cell.

Complex cells combine these responses to detect more complex patterns, such as the eyes in a human face.

Convolution Operation

A CNN is a neural network built from a combination of layers such as convolution, pooling, and fully connected layers.

The earlier convolutional layers are used to find primitive features like edges.

The later convolutional layers are then used to find more complex features, such as the nose and ears in the case of human faces.

An image is a collection of pixels.

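Grayscale (black and white) image: 1 channel, with pixel values from 0 (black) to 255 (white). Shape 28*28, where 28 is the number of pixels along each side (this varies from image to image).

RGB (coloured) image: 3 channels (red, green and blue). Shape 28*28*3, where 28 is again the number of pixels along each side (this varies from image to image).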
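
As a rough illustration of these two layouts, the arrays below hold a grayscale and an RGB image in NumPy. The channels-last ordering and the `uint8` dtype are assumptions of this sketch; some libraries store channels first.

```python
import numpy as np

# Grayscale: one channel, values from 0 (black) to 255 (white).
gray = np.zeros((28, 28), dtype=np.uint8)
gray[:, 14:] = 255  # right half white, left half black

# RGB: three channels (red, green, blue) stacked along the last axis.
rgb = np.zeros((28, 28, 3), dtype=np.uint8)
rgb[..., 0] = 255  # fill only the red channel: a pure red image

print(gray.shape, rgb.shape)  # (28, 28) (28, 28, 3)
```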

Edge Detection-

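To detect edges, a filter/kernel is slid over the image matrix. At each position the filter is multiplied element-wise with the image patch beneath it and the products are summed. The resulting values form the feature map of edges (horizontal or vertical, depending on the filter).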

If your input is 28*28 and the filter is 3*3, then the feature map will be (28-3+1)*(28-3+1) = 26*26 (assuming no padding and a stride of 1).

If your input is 64*64 and the filter is 7*7, then the feature map will be (64-7+1)*(64-7+1) = 58*58.
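
The sliding-window operation and the output sizes above can be checked with a minimal NumPy sketch. The helper `convolve2d_valid`, the toy image, and the Prewitt-style vertical-edge filter are assumptions chosen for illustration. Like most CNN libraries, the sketch does not flip the kernel, and it assumes no padding and a stride of 1.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` with no padding and a stride of 1."""
    m_h, m_w = image.shape
    k_h, k_w = kernel.shape
    out = np.zeros((m_h - k_h + 1, m_w - k_w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + k_h, j:j + k_w]
            out[i, j] = np.sum(patch * kernel)  # element-wise product, then sum
    return out

# Toy 28x28 image: dark left half, bright right half (one vertical edge).
image = np.zeros((28, 28))
image[:, 14:] = 255.0

# Assumed vertical-edge filter: bright-on-the-right gives a positive response.
vertical_edge = np.array([[-1.0, 0.0, 1.0],
                          [-1.0, 0.0, 1.0],
                          [-1.0, 0.0, 1.0]])

feature_map = convolve2d_valid(image, vertical_edge)
print(feature_map.shape)  # (26, 26)

# The second example from the text: 64x64 input with a 7x7 filter.
big_map = convolve2d_valid(np.zeros((64, 64)), np.ones((7, 7)))
print(big_map.shape)  # (58, 58)
```

On this toy image the feature map is nonzero only in the two columns where the 3x3 window straddles the dark-to-bright boundary, which is what "detecting a vertical edge" means here.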

For RGB Image-

Single Filter-

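The feature map (result) will be single channel, because the filter has the same depth as the image and the products from all channels are summed into one value per position.

If your input is 28*28*3 and the filter is 3*3*3, then the feature map will be (28-3+1)*(28-3+1) = 26*26.

If your input is 64*64*3 and the filter is 7*7*3, then the feature map will be (64-7+1)*(64-7+1) = 58*58.

In general, an (m x m x c) input convolved with an (n x n x c) filter gives an (m - n + 1) x (m - n + 1) single-channel feature map.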
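
A small sketch of the single-filter RGB case, under the same assumptions as the formula (no padding, stride of 1). The helper name `convolve_rgb_single` and the random image and filter values are hypothetical and used only to check the shapes.

```python
import numpy as np

def convolve_rgb_single(image, kernel):
    """(m, m, c) image with one (n, n, c) filter -> (m-n+1, m-n+1) feature map."""
    m_h, m_w, c = image.shape
    k_h, k_w, k_c = kernel.shape
    if c != k_c:
        raise ValueError("filter depth must match the number of image channels")
    out = np.zeros((m_h - k_h + 1, m_w - k_w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Multiply across all channels at once and sum to a single number.
            out[i, j] = np.sum(image[i:i + k_h, j:j + k_w, :] * kernel)
    return out

rng = np.random.default_rng(0)  # random values, only the shapes matter here

fmap_small = convolve_rgb_single(rng.random((28, 28, 3)), rng.random((3, 3, 3)))
fmap_large = convolve_rgb_single(rng.random((64, 64, 3)), rng.random((7, 7, 3)))
print(fmap_small.shape, fmap_large.shape)  # (26, 26) (58, 58)
```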

Multiple Filters-

If there are multiple filters, say 2, then each filter produces its own feature map and the maps are stacked as channels.

If your input is 28*28*3 and filter1 is 3*3*3 and filter2 is 3*3*3, then the feature map will be (28-3+1)*(28-3+1)*2 = 26*26*2.

If your input is 64*64*3 and filter1 is 7*7*3 and filter2 is 7*7*3, then the feature map will be (64-7+1)*(64-7+1)*2 = 58*58*2.
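
The multi-filter case can be sketched the same way: one feature map per filter, stacked along a last axis. The helper `convolve_multi`, the `(num_filters, n, n, c)` filter layout, and the random values are assumptions of this sketch, again with no padding and a stride of 1.

```python
import numpy as np

def convolve_multi(image, filters):
    """(m, m, c) image with k filters of shape (n, n, c) -> (m-n+1, m-n+1, k)."""
    m_h, m_w, _ = image.shape
    num_filters, k_h, k_w, _ = filters.shape
    out = np.zeros((m_h - k_h + 1, m_w - k_w + 1, num_filters))
    for f in range(num_filters):
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                patch = image[i:i + k_h, j:j + k_w, :]
                out[i, j, f] = np.sum(patch * filters[f])  # one channel per filter
    return out

rng = np.random.default_rng(0)  # random values, only the shapes matter here

maps_small = convolve_multi(rng.random((28, 28, 3)), rng.random((2, 3, 3, 3)))
maps_large = convolve_multi(rng.random((64, 64, 3)), rng.random((2, 7, 7, 3)))
print(maps_small.shape, maps_large.shape)  # (26, 26, 2) (58, 58, 2)
```

The number of output channels equals the number of filters, regardless of how many channels the input had.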
