Deep Learning - CNN - Convolutional Neural Network - Introduction Tutorial
Convolutional Neural Networks (CNNs), also known as ConvNets, are a special kind of neural network for processing data that has a known grid-like topology, such as time series data (1D) or images (2D).
Why is CNN important?
Although we can use an ANN on image data (e.g. the MNIST dataset), the results are usually not very satisfactory.
A CNN will generally perform better than an ANN on an image dataset.
The use of ANN on image data has the following problems-
1] High Computational Cost
Suppose you have a 2D image of 40x40 pixels. To feed this image into an ANN, we flatten it into a 1D vector, i.e. 1600x1.
Passing this 1D vector into a fully connected layer of 100 nodes requires 1600x100 = 160000 weights in the 1st hidden layer alone, which increases the computational cost.
2] Overfitting
Connecting every pixel of the image to every node lets the network fit minute patterns in the training images, which can result in overfitting.
3] Loss of important info like spatial arrangement of pixels
In 2D data, it is easy to identify the spatial arrangement of pixels, e.g. the distance between the 2 eyes, or the distance between the nose and mouth, in a picture of a human face.
However, once the image is flattened to 1D, pixels that were neighbours in 2D can end up far apart, so the spatial arrangement is difficult to recover.
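Problems 1] and 3] can be checked with a few lines of arithmetic. This is only a minimal sketch: the helper names dense_weight_count and flat_index are made up for illustration, and it assumes the image is flattened row by row (row-major order), which the text does not specify.

```python
def dense_weight_count(height, width, hidden_units):
    """Weights between a flattened image and one fully connected layer."""
    flattened_size = height * width       # 40 * 40 = 1600 inputs
    return flattened_size * hidden_units  # one weight per (input, node) pair


def flat_index(row, col, width):
    """Position of pixel (row, col) after row-major flattening (assumed)."""
    return row * width + col


# Problem 1]: weight count for the 40x40 image and 100 hidden nodes.
print(dense_weight_count(40, 40, 100))  # 160000

# Problem 3]: two pixels that touch vertically in the 2D image...
width = 40
top = flat_index(10, 5, width)
bottom = flat_index(11, 5, width)
# ...end up a full row apart in the 1D vector.
print(bottom - top)  # 40
# Horizontal neighbours stay next to each other.
print(flat_index(10, 6, width) - top)  # 1
```

The weight count ignores biases (one more parameter per hidden node), and it grows with the number of pixels, so larger images make the first layer much heavier.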
How does CNN work? - CNN Intuition
In the first layer (convolutional layer 1), a CNN tries to extract primitive features like edges; in the next layer (convolutional layer 2), it tries to extract more complex features, and so on.
CNN applications-
1] Image Classification
Used to assign an image to the correct class out of multiple options.
2] Object Localization
3] Object Detection
4] Face Detection and Recognition
5] Image Segmentation
6] Super Resolution (low-resolution image to high-resolution image)
7] Colorization (black-and-white image to color image)
8] Pose Estimation
CNN Vs Visual Cortex
In the late 1950s, Hubel and Wiesel ran an experiment on cats, placing electrodes in cells of the cat's visual cortex to find out which visual features each cell responds to.
Conclusion-
There are two types of cells, i.e. simple cells and complex cells.
Simple cells are orientation-selective cells that detect simple features like edges. Each simple cell responds to only one type of edge, which is why that edge is called the cell's preferred stimulus.
Once a feature is detected by simple cells, they pass the information on to complex cells.
Complex cells detect more complex patterns, like those in human faces and eyes.
Convolution Operation
A CNN is a neural network built from a combination of multiple layers like convolution, pooling, and fully connected layers.
The earlier convolutional layers are used to find primitive features like edges.
The later convolutional layers are then used to find complex features, like the nose and ears in the case of human faces.
An image is a collection of pixels.
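Greyscale – Black and White Image - 1 channel, with pixel values from 0 (black) to 255 (white), e.g. 28*28, where 28 is the number of pixels per side (this varies from image to image)
RGB – Coloured Image - 3 channels, i.e. red, green and blue, e.g. 28*28*3, where 28 is the number of pixels per side (this varies from image to image)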
Edge Detection-
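The filter/kernel slides over the image matrix. At each position, the filter is multiplied element-wise with the image patch under it and the products are summed, giving one value of the feature map. With a suitable filter, the feature map highlights edges (horizontal or vertical).
If your input is 28*28 and the filter is 3*3, then the feature map will be (28-3+1)*(28-3+1) = 26*26
If your input is 64*64 and the filter is 7*7, then the feature map will be (64-7+1)*(64-7+1) = 58*58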
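The sliding multiply-and-sum can be sketched in plain Python as follows. This is an illustrative sketch, not a library implementation: it assumes a square image and a square filter, stride 1 and no padding, and the function name convolve2d_valid and the filter values are chosen here as examples.

```python
def convolve2d_valid(image, kernel):
    """Slide a square kernel over a square image (stride 1, no padding).

    At each position the kernel and the image patch under it are
    multiplied element-wise and summed. The output side is m - n + 1.
    """
    m, n = len(image), len(kernel)
    out_size = m - n + 1
    feature_map = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            total = 0
            for a in range(n):
                for b in range(n):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        feature_map.append(row)
    return feature_map


# 6x6 greyscale image: white (255) left half, black (0) right half.
image = [[255, 255, 255, 0, 0, 0] for _ in range(6)]

# Example vertical-edge filter (illustrative values, not from the text).
vertical_edge = [[1, 0, -1],
                 [1, 0, -1],
                 [1, 0, -1]]

feature_map = convolve2d_valid(image, vertical_edge)
print(len(feature_map), len(feature_map[0]))  # 4 4
print(feature_map[0])  # [0, 765, 765, 0]
```

The output is 4*4 because 6-3+1 = 4, and the large values sit in the middle columns, exactly where the image changes from white to black.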
Deep Lizard - Convolution Operation
For RGB Image-
Single Filter-
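The resulting feature map will be single channel, because the filter has the same number of channels as the input and the products over all channels are summed into one value.
If your input is 28*28*3 and the filter is 3*3*3, then the feature map will be (28-3+1)*(28-3+1) = 26*26
If your input is 64*64*3 and the filter is 7*7*3, then the feature map will be (64-7+1)*(64-7+1) = 58*58
In general: input (m x m x c) convolved with filter (n x n x c) → (m - n + 1) x (m - n + 1) - Single Channel Feature Map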
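A small sketch can confirm that one filter over a 3-channel input gives a single-channel feature map. It assumes stride 1 and no padding, images stored as nested lists indexed [row][column][channel], and an all-ones image and filter picked only to make the result easy to check by hand. The function name convolve_channels is hypothetical.

```python
def convolve_channels(image, kernel):
    """Convolve an m x m x c image with a single n x n x c filter.

    The products over all c channels are summed into one number per
    position, so the result has shape (m - n + 1) x (m - n + 1).
    """
    m, n, c = len(image), len(kernel), len(kernel[0][0])
    out_size = m - n + 1
    feature_map = [[0] * out_size for _ in range(out_size)]
    for i in range(out_size):
        for j in range(out_size):
            for a in range(n):
                for b in range(n):
                    for ch in range(c):
                        feature_map[i][j] += (
                            image[i + a][j + b][ch] * kernel[a][b][ch]
                        )
    return feature_map


# 28*28*3 input and 3*3*3 filter, all ones (illustrative values).
image = [[[1, 1, 1] for _ in range(28)] for _ in range(28)]
kernel = [[[1, 1, 1] for _ in range(3)] for _ in range(3)]

feature_map = convolve_channels(image, kernel)
print(len(feature_map), len(feature_map[0]))  # 26 26
print(feature_map[0][0])  # 27
```

Each output value is 27 because the filter covers 3*3*3 = 27 input values, all equal to 1, and every entry of the feature map is a plain number rather than a list of 3 channel values.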
Multiple Filters-
If there are multiple filters, suppose 2, then each filter produces its own feature map and the maps are stacked along the channel dimension.
If your input is 28*28*3 and filter1 is 3*3*3 and filter2 is 3*3*3, then the feature map will be (28-3+1)*(28-3+1)*2 = 26*26*2
If your input is 64*64*3 and filter1 is 7*7*3 and filter2 is 7*7*3, then the feature map will be (64-7+1)*(64-7+1)*2 = 58*58*2
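The shape arithmetic above can be wrapped in a tiny helper. This is a sketch under the same assumptions as the examples (square input, square filters of equal size, stride 1, no padding), and the name feature_map_shape is made up for illustration.

```python
def feature_map_shape(input_size, filter_size, num_filters):
    """Output shape for an input_size x input_size x c input convolved
    with num_filters filters of size filter_size x filter_size x c."""
    side = input_size - filter_size + 1
    return (side, side, num_filters)


print(feature_map_shape(28, 3, 2))  # (26, 26, 2)
print(feature_map_shape(64, 7, 2))  # (58, 58, 2)
print(feature_map_shape(28, 3, 1))  # (26, 26, 1)
```

The number of input channels c does not appear in the result: it only has to match between the input and each filter, while the output depth is set by the number of filters.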