Machine Learning Development Life Cycle Tutorial
1] Frame the Problem
Decide the objective of the project and estimate its cost and duration.
Who is the end customer? Where will the data come from? Which machine learning model should be applied?
2] Gathering the Data
Common sources include web scraping, APIs, CSV files, surveys, a data warehouse (loaded via ETL from a running database), and Spark clusters.
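As a minimal sketch of the CSV route, the snippet below parses CSV text into rows of numbers using only the Python standard library. The column names and values are made up for illustration, and the text is held in a string so the example runs without an external file; in a real project it would come from a file or a download.

```python
import csv
import io

# Hypothetical CSV content; in practice this would be read from a file.
RAW_CSV = """rooms,area_sqft,price
3,1200,250000
2,850,180000
4,1600,320000
"""

def load_rows(text):
    """Parse CSV text into a list of dicts with float values."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: float(value) for key, value in row.items()} for row in reader]

rows = load_rows(RAW_CSV)
print(len(rows), rows[0])
```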
3] Data Preprocessing
Raw data may be dirty or noisy, contain missing values, duplicates, or outliers, or be unstructured.
Such data must be preprocessed before it is ready for analysis and modeling.
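Two of the problems listed above, duplicates and missing values, can be sketched in a few lines. The records, the `age` column, and the choice of median imputation are assumptions made for this example; other datasets may call for a different strategy.

```python
from statistics import median

# Hypothetical raw records: one exact duplicate and one missing age.
records = [
    {"id": 1, "age": 25},
    {"id": 2, "age": None},
    {"id": 1, "age": 25},
    {"id": 3, "age": 40},
]

def drop_duplicates(rows):
    """Keep the first occurrence of each identical record."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def fill_missing(rows, column):
    """Replace None in `column` with the median of the observed values."""
    observed = [row[column] for row in rows if row[column] is not None]
    fill_value = median(observed)
    return [
        {**row, column: fill_value if row[column] is None else row[column]}
        for row in rows
    ]

clean = fill_missing(drop_duplicates(records), "age")
print(clean)
```

In practice a library such as pandas is commonly used for these steps; the plain-Python version only makes the logic explicit.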
4] Exploratory Data Analysis
Studying the relationship between input and output variables through visualizations.
Univariate / bivariate / multivariate analysis
Outlier detection
Imbalanced data handling
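The outlier and imbalance checks can be sketched numerically before any plotting. The example below flags outliers with the interquartile range (IQR) rule and reports the share of each class; the 1.5 multiplier is a common convention rather than something fixed by this tutorial, and the sample values are hypothetical.

```python
from collections import Counter
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

def class_balance(labels):
    """Return each class's share of the dataset."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}

prices = [10, 12, 11, 13, 12, 11, 95]  # hypothetical values with one extreme point
labels = ["no"] * 9 + ["yes"]          # hypothetical 90/10 class split

print(iqr_outliers(prices))
print(class_balance(labels))
```

A heavily skewed class share, as in the 90/10 split here, is the signal that imbalance handling (for example resampling) may be needed before training.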
5] Feature Engineering and Selection
Feature engineering is the creation of new columns from existing columns.
Ex. Combine the sizes of all rooms and bathrooms into one feature, total house size.
Feature selection is choosing the features that matter for the prediction.
Ex. Suppose there are 100 columns and most are useless; feature selection keeps the columns that are most helpful for predicting the target variable.
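Both ideas can be sketched on a toy housing table. The column names, the values, and the use of absolute Pearson correlation as the ranking criterion are assumptions for this example; correlation only captures linear relationships, and other selection methods exist.

```python
from math import sqrt

# Hypothetical housing rows; column names are illustrative only.
houses = [
    {"bedroom_sqft": 300, "bathroom_sqft": 60, "door_count": 5, "price": 200},
    {"bedroom_sqft": 400, "bathroom_sqft": 80, "door_count": 4, "price": 260},
    {"bedroom_sqft": 500, "bathroom_sqft": 90, "door_count": 6, "price": 330},
    {"bedroom_sqft": 600, "bathroom_sqft": 120, "door_count": 4, "price": 400},
    {"bedroom_sqft": 700, "bathroom_sqft": 130, "door_count": 5, "price": 450},
]

# Feature engineering: derive one new column from two existing ones.
engineered = [
    {**row, "total_sqft": row["bedroom_sqft"] + row["bathroom_sqft"]}
    for row in houses
]

def pearson(xs, ys):
    """Pearson correlation; assumes neither column is constant."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def select_features(rows, candidates, target, top_k):
    """Feature selection: keep the top_k candidates most correlated with the target."""
    target_values = [row[target] for row in rows]
    scores = {
        name: abs(pearson([row[name] for row in rows], target_values))
        for name in candidates
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

selected = select_features(engineered, ["total_sqft", "door_count"], "price", top_k=1)
print(selected)
```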
6] Model Training, Evaluation, and Selection
Train multiple algorithms on the input and output variables, and select the one with the best balance of high accuracy, low bias, and low variance.
Evaluate all the models using metrics such as accuracy score and confusion matrix (classification), R² score (regression), etc.
Use hyperparameter tuning to improve the performance of the selected model.
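The train, evaluate, and tune loop can be sketched end to end with a tiny classifier. The k-nearest-neighbours model, the one-dimensional toy data, and the candidate values of `k` are all assumptions chosen to keep the example short; in practice libraries such as scikit-learn provide the models, the metrics (`accuracy_score`, `confusion_matrix`, `r2_score`), and search utilities such as `GridSearchCV`.

```python
from collections import Counter

def knn_predict(train, point, k):
    """Predict the majority label among the k nearest training points."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

# Hypothetical (feature, label) pairs; two training points are deliberately noisy.
train = [(1.0, 0), (1.5, 0), (2.0, 0), (2.6, 1),
         (5.4, 0), (6.0, 1), (6.5, 1), (7.0, 1)]
valid = [(1.2, 0), (2.4, 0), (5.6, 1), (6.8, 1)]
x_valid = [x for x, _ in valid]
y_valid = [y for _, y in valid]

# Hyperparameter tuning: try each k and score it on the validation set.
scores = {}
for k in (1, 3, 5):
    preds = [knn_predict(train, x, k) for x in x_valid]
    scores[k] = accuracy(y_valid, preds)

best_k = max(scores, key=scores.get)
best_preds = [knn_predict(train, x, best_k) for x in x_valid]
print(scores, best_k)
print(confusion_matrix(y_valid, best_preds, labels=[0, 1]))
```

With k = 1 the model follows the noisy points (high variance); larger k smooths them out, which is the bias and variance trade-off the selection step is weighing.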
7] Model Deployment
Deployment involves wrapping the model in an application or website and hosting it on a platform such as AWS or Heroku, where users can access it.
The trained model can be serialized to a binary file using Pickle or a similar tool.
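A minimal sketch of the Pickle step follows. The "model" here is a hypothetical dictionary of learned parameters and `predict` is an illustrative helper; a real trained model object would be saved and loaded with the same `pickle.dump` and `pickle.load` calls.

```python
import os
import pickle
import tempfile

# Hypothetical "trained model": the learned parameters of a linear model.
model = {"weights": [0.4, 1.2], "bias": -0.3}

def predict(params, features):
    """Weighted sum of the features plus the bias."""
    return sum(w * x for w, x in zip(params["weights"], features)) + params["bias"]

# Save the model to a binary file (a temporary directory is used for the demo).
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Later, for example inside the deployed application, load it back.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(predict(restored, [1.0, 2.0]))
```

Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.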
8] Testing
Test the product using A/B testing, etc., and get feedback from the customer.
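One common way to read an A/B test is a two-proportion z-test on the conversion rates of the two variants. The sketch below assumes that approach and uses hypothetical counts; the tutorial itself does not prescribe a particular statistical test.

```python
from math import erf, sqrt

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for the difference in success rates."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value

# Hypothetical experiment: variant A (old model) vs variant B (new model).
z, p_value = two_proportion_z_test(success_a=100, n_a=1000, success_b=130, n_b=1000)
print(round(z, 3), round(p_value, 4))
```

A small p-value suggests the difference between the variants is unlikely to be due to chance alone; the threshold for "small" (often 0.05) is a choice made before the experiment.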
9] Optimize
This stage covers ongoing steps such as model backup, data backup, load balancing, and retraining the model.