Machine Learning - Machine Learning Development Life Cycle - Machine Learning Pipelines Tutorial
Pipelines are a mechanism that chains multiple steps together so that the output of each step is used as the input to the next step. In other words, a pipeline performs a sequence of steps in which the output of one transformer becomes the input of the next transformer.
Pipelines make it easy to apply the same preprocessing to the training and test data. Without a pipeline, every preprocessing step must be repeated manually on the server, which quickly becomes tedious. And if you change a preprocessing step during development, you then have to make the same change again in production, which is even more tedious. This is why pipelines are so important.
How to apply a pipeline?
Using ColumnTransformer -
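A minimal sketch of a ColumnTransformer, assuming a hypothetical Titanic-style DataFrame (the column names and values here are invented for illustration). Note that the columns are referenced by index, and `remainder='passthrough'` keeps the untouched `Fare` column:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical Titanic-style data: 'Age' has missing values,
# 'Sex' and 'Embarked' are categorical, 'Fare' needs no preprocessing.
df = pd.DataFrame({
    'Age': [22.0, np.nan, 35.0, 28.0],
    'Fare': [7.25, 71.28, 8.05, 13.00],
    'Sex': ['male', 'female', 'female', 'male'],
    'Embarked': ['S', 'C', 'S', 'Q'],
})

# Impute 'Age' (column 0) and one-hot encode 'Sex' and 'Embarked'
# (columns 2 and 3); 'Fare' passes through unchanged.
trf = ColumnTransformer(
    transformers=[
        ('impute_age', SimpleImputer(strategy='mean'), [0]),
        ('ohe', OneHotEncoder(), [2, 3]),
    ],
    remainder='passthrough',
)
out = trf.fit_transform(df)
print(out.shape)  # (4, 7): imputed Age + 5 one-hot columns + Fare
```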
In the example above, we used column indices instead of column names because after imputation the data is no longer a DataFrame but a NumPy array. A NumPy array has no column names, so referring to a column by name will most likely raise an error.
remainder='passthrough' prevents the remaining columns from being dropped; they are passed unchanged to the next transformer.
Now add all the transformers to a pipeline -
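The full chain might look like this sketch (again with invented toy data): two ColumnTransformer steps and a classifier are wrapped in one Pipeline, so `fit` and `predict` run every step in order. After the first step the data is a NumPy array, which is why the second step addresses columns by index:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data: 'Age' (index 0) has missing values, 'Sex' (index 1) is categorical
X = pd.DataFrame({
    'Age': [22.0, np.nan, 35.0, 28.0, np.nan, 40.0],
    'Sex': ['male', 'female', 'female', 'male', 'female', 'male'],
})
y = np.array([0, 1, 1, 0, 1, 0])

# Step 1: impute 'Age'. Step 2: encode 'Sex' -- by index, since the
# output of step 1 is already a NumPy array, not a DataFrame.
trf1 = ColumnTransformer([('impute_age', SimpleImputer(), [0])],
                         remainder='passthrough')
trf2 = ColumnTransformer([('ohe_sex', OneHotEncoder(handle_unknown='ignore'), [1])],
                         remainder='passthrough')

pipe = Pipeline(steps=[
    ('trf1', trf1),
    ('trf2', trf2),
    ('model', DecisionTreeClassifier(random_state=0)),
])
pipe.fit(X, y)          # runs trf1 -> trf2 -> model in order
print(pipe.predict(X))
```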
Or, using make_pipeline -
Both make_pipeline and Pipeline work the same way, but Pipeline is more informative because you choose the step names yourself, which you can then inspect via pipe.named_steps.
# Render pipelines as an HTML diagram (the pipeline diagram shown below is produced by this setting)
from sklearn import set_config
set_config(display='diagram')
Cross Validation using Pipeline
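A sketch of cross-validation with a pipeline: passing the whole pipeline to cross_val_score means the preprocessing is re-fitted inside each fold, which avoids leaking test-fold information into the training folds. The synthetic dataset here is just for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for a real dataset
X, y = make_classification(n_samples=200, random_state=0)

# Scaling + model as one estimator: the scaler is fitted
# on the training folds only, separately in each of the 5 folds
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(pipe, X, y, cv=5, scoring='accuracy')
print(scores.mean())
```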
GridSearch using Pipeline
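When grid-searching over a pipeline, hyperparameters are addressed as step name, double underscore, parameter name. A sketch with an invented search space:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('model', DecisionTreeClassifier(random_state=0)),
])

# '<step name>__<parameter>' targets a parameter of one step:
# here 'model__max_depth' tunes max_depth of the 'model' step
params = {'model__max_depth': [2, 3, 5]}
grid = GridSearchCV(pipe, params, cv=5)
grid.fit(X, y)
print(grid.best_params_)
```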
In production (on the server), the code is very simple with a pipeline; without one, it becomes very tedious -
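A sketch of why the server-side code becomes so simple: during development the fitted pipeline is pickled once; on the server you load a single object and call predict, and all the preprocessing steps run automatically inside it. The file name pipe.pkl is an assumption for illustration:

```python
import pickle
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# --- Development: train and save the whole pipeline once ---
X, y = make_classification(n_samples=100, random_state=0)
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X, y)
with open('pipe.pkl', 'wb') as f:
    pickle.dump(pipe, f)

# --- Server: load one object; preprocessing + prediction in one call ---
with open('pipe.pkl', 'rb') as f:
    loaded = pickle.load(f)
preds = loaded.predict(X[:5])
print(preds)
```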