Machine Learning - Machine Learning Development Life Cycle - Principal Component Analysis Tutorial

  • PCA - Principal Component Analysis

Introduction-

PCA is an unsupervised learning technique used to handle complex, high-dimensional data.

Its main aim is to reduce the curse of dimensionality, i.e. to avoid the large number of computations required on high-dimensional data.

PCA is a technique to transform higher-dimensional data into lower-dimensional data while keeping the essence of the data, e.g. for visualization.

It removes noise and extracts crucial information, such as features and attributes, from large amounts of data.

 

Benefits of PCA:-

Faster execution of algorithms

Visualization (e.g. reducing 10-D data to 2-D; see the sketch below)
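A minimal sketch of this benefit, assuming scikit-learn is available (the data here is randomly generated purely for illustration):

```python
# Minimal sketch: reduce 10-D data to 2-D with scikit-learn's PCA.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 10))   # 500 samples, 10 features

pca = PCA(n_components=2)        # keep only the top 2 principal components
X_2d = pca.fit_transform(X)      # shape (500, 2), ready for a scatter plot
print(X_2d.shape)                # (500, 2)
```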

 

Geometric intuition

  • In a dataset, suppose there are the columns (No. of Rooms, No. of Grocery Shops, target column - Price). Of these, No. of Rooms is more important than No. of Grocery Shops. Hence, using feature selection, we will select only the No. of Rooms column.
  • If you don't have domain knowledge of the project, plot both columns on a graph, check their variance, and select the column with the higher projection (spread); see the sketch after this list.

  • Feature selection will not work when both columns are equally important (e.g. No. of Rooms & No. of Bathrooms) and have the same variance. In such a case, we need to use feature extraction.
  • In feature extraction, when both columns are equally important (e.g. No. of Rooms & No. of Bathrooms) with the same variance, both columns are combined and converted into a single column, i.e. total flat size.
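A minimal sketch of that variance check (the column values below are made up for illustration):

```python
# Variance-based feature selection on two hypothetical columns.
import numpy as np

rooms = np.array([2, 3, 4, 2, 5, 3])      # hypothetical No. of Rooms column
grocery = np.array([1, 1, 1, 2, 1, 1])    # hypothetical No. of Grocery Shops column

# The column with the higher variance (spread) carries more information.
print(np.var(rooms), np.var(grocery))     # rooms has the larger spread
selected = "rooms" if np.var(rooms) > np.var(grocery) else "grocery"
print(selected)                           # 'rooms'
```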

 

But suppose we don't have any idea about the domain we are working in; in that case, we will use PCA.

 

PCA will rotate the graph and create new axes PC1 and PC2, which are the principal components. After that, we find that the variance along PC1 > PC2. Hence, we will select PC1 (i.e. the room column).

PCA uses variance to calculate spread, by projecting the points onto a unit vector. But variance has a problem: it gives the same result for the point sets (-1,1), (0,0), (1,-1) and (-1,-1), (0,0), (1,1), i.e. it does not show the direction of the relationship between the two variables.

 

Hence, covariance comes into the picture; it tells us the direction of the linear relationship between two random variables. The covariance matrix tells us both the spread (variance) and the direction (covariance) in a single matrix.

Learn more about covariance & the covariance matrix in the Statistics tutorial.
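A minimal NumPy sketch of the problem above: the two point sets have identical per-axis variances, but their covariance matrices reveal the opposite directions.

```python
# Variance alone cannot distinguish direction; covariance can.
import numpy as np

a = np.array([[-1, 1], [0, 0], [1, -1]])   # downward-sloping point set
b = np.array([[-1, -1], [0, 0], [1, 1]])   # upward-sloping point set

# Both sets have the same per-axis variance...
print(np.var(a, axis=0), np.var(b, axis=0))   # identical results

# ...but the covariance matrices differ in the sign of the off-diagonal term.
print(np.cov(a.T))   # [[ 1. -1.] [-1.  1.]] -> negative (downward) relationship
print(np.cov(b.T))   # [[ 1.  1.] [ 1.  1.]] -> positive (upward) relationship
```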


 

  • Linear Transformation – Eigen Vectors and Eigen Values

Eigen Vector

An eigenvector of a square matrix is defined as a non-zero vector v such that, when the given matrix 'A' is multiplied by v, the result equals a scalar quantity λ (known as the eigenvalue) times the same vector v:

Av = λv

The number or scalar value "λ" is an eigenvalue of A, and v is an eigenvector of A.

Eigenvectors are special vectors whose direction does not change when the linear transformation is applied; only the magnitude changes. For example, consider the matrix –

A = [ 3  1 ]
    [ 0  2 ]

 


Eigen Value (λ)

Eigenvalues are the special set of scalars associated with a system of linear equations. An eigenvalue is the factor by which the magnitude changes from the original vector to the transformed vector. It is also known as the explained variance.

In the above example, the eigenvalue is 2: for the eigenvector (-1, 1), A(-1, 1) = (-2, 2), so the magnitude changes from (-1, 1) to (-2, 2) while the direction stays the same.
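A small NumPy check of this eigenpair:

```python
# Verify that (-1, 1) is an eigenvector of A with eigenvalue 2.
import numpy as np

A = np.array([[3, 1],
              [0, 2]])
v = np.array([-1, 1])

print(A @ v)                        # [-2  2] == 2 * v, so λ = 2

values, vectors = np.linalg.eig(A)  # all eigenvalues and eigenvectors of A
print(values)                       # [3. 2.]
```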

What exactly will PCA do?

In PCA, eigendecomposition (i.e. finding the eigenvectors and eigenvalues) of the covariance matrix is performed; the direction whose eigenvalue is greater has more spread or variance. We will consider the principal component, or feature, with the greater eigenvalue.

Step-by-step solution of PCA -

1] Mean centering

2] Find the covariance matrix

3] Find the eigenvalues and eigenvectors of the covariance matrix

You will get 3 eigenvectors and 3 eigenvalues for a 3D covariance matrix.

The eigenvector whose eigenvalue is greatest will be considered PC1, the next one PC2, and the next PC3.

To transform 3D data to 2D, you will consider only PC1 and PC2.

 

How to transform points from 3D to 2D?

Suppose the dataset has shape (1000, 3) and we need to convert it into shape (1000, 2).

Our eigenvector matrix has shape (3, 3), but we consider only PC1 and PC2, i.e. shape (3, 2).

Taking the dot product of (1000, 3) and (3, 2), we get our 2D dataset, i.e. (1000, 2).
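Putting the steps together, here is a minimal NumPy sketch of this 3D-to-2D transformation (the (1000, 3) dataset is randomly generated for illustration):

```python
# Manual PCA: (1000, 3) -> (1000, 2) via the covariance matrix.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))            # hypothetical dataset

# 1] Mean centering
X_centered = X - X.mean(axis=0)

# 2] Covariance matrix of the 3 features, shape (3, 3)
cov = np.cov(X_centered.T)

# 3] Eigen decomposition (eigh is suited to symmetric matrices)
eig_vals, eig_vecs = np.linalg.eigh(cov)

# Sort eigenvectors by eigenvalue, largest first: PC1, PC2, PC3
order = np.argsort(eig_vals)[::-1]
eig_vecs = eig_vecs[:, order]

# Keep only PC1 and PC2 -> projection matrix of shape (3, 2)
W = eig_vecs[:, :2]

# Dot product: (1000, 3) . (3, 2) -> (1000, 2)
X_2d = X_centered @ W
print(X_2d.shape)                         # (1000, 2)
```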


 

Finding the optimum number of principal components?

Linear Discriminant Analysis (LDA)

t-SNE
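Besides the techniques above, one common practical approach is to keep enough components to explain a chosen share of the total variance; a sketch using scikit-learn's explained variance ratio (each eigenvalue's share of the total variance):

```python
# Pick the smallest number of components explaining, e.g., 95% of the variance.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))   # hypothetical dataset

pca = PCA().fit(X)               # fit with all components kept
cumulative = np.cumsum(pca.explained_variance_ratio_)

n_components = int(np.argmax(cumulative >= 0.95)) + 1
print(n_components)
```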


 

  • Sparsity & Density

When representing data using a matrix, we can quantify the proportion of empty (zero) values it contains. This is referred to as its sparsity. A matrix (or dataset) that mostly contains zeros is called a sparse matrix.

The proportion of non-empty (non-zero) values is referred to as its density.
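A minimal NumPy sketch measuring both quantities on a toy matrix:

```python
# Sparsity = fraction of zero entries; density = fraction of non-zero entries.
import numpy as np

M = np.array([[1, 0, 0, 0],
              [0, 2, 0, 0],
              [0, 0, 0, 3]])

sparsity = np.count_nonzero(M == 0) / M.size
density = np.count_nonzero(M) / M.size
print(sparsity, density)   # 0.75 0.25
```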
