Natural Language Processing - Overview - NLP Pipeline Tutorial
– An NLP pipeline is the set of steps followed to build end-to-end NLP software.
An NLP pipeline consists of the following steps:
1] Data Acquisition
2] Text Preparation – text cleanup, basic preprocessing, advanced preprocessing (such as POS tagging)
3] Feature Engineering
4] Modelling – Model building and evaluation
5] Deployment – Deployment, Monitoring, Model Update
1] Data Acquisition
Collect data from databases, web scraping, etc.
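As a rough illustration of collecting text from a database, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory database, the `reviews` table, and the `load_texts` helper are all made up for this example; a real project would connect to its own data source.

```python
import sqlite3

# In-memory database standing in for a real data source (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (id INTEGER PRIMARY KEY, text TEXT)")
conn.executemany(
    "INSERT INTO reviews (text) VALUES (?)",
    [("The movie is great",), ("The plot is bad",)],
)
conn.commit()


def load_texts(connection):
    """Fetch every review text, ordered by id, as a list of strings."""
    rows = connection.execute("SELECT text FROM reviews ORDER BY id").fetchall()
    return [row[0] for row in rows]


corpus = load_texts(conn)
print(corpus)
```

The resulting list of raw strings is what the later steps (preparation, feature engineering) would take as input.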
2] Text Preparation
Text Cleanup - removing HTML tags and emoji, spell checking (for example with the TextBlob library), etc.
Basic Preprocessing – tokenization (sentence, word), stop word removal, stemming, removing digits and punctuation, lowercasing, language detection.
Advanced Preprocessing – part-of-speech (POS) tagging, parsing, coreference resolution
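The cleanup and basic preprocessing steps above can be sketched with the standard library alone. This is only an illustration: the stop word list is a tiny hypothetical one, the sentence splitter is naive, and stemming, spell checking and language detection are left out because they normally rely on a dedicated NLP library.

```python
import re
import string

# Hypothetical, deliberately tiny stop word list for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "on"}


def clean_text(raw):
    """Text cleanup: drop HTML tags and collapse repeated whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", raw)
    return re.sub(r"\s+", " ", no_tags).strip()


def split_sentences(text):
    """Naive sentence tokenization: split after '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def preprocess(sentence):
    """Lowercase, remove digits and punctuation, tokenize, drop stop words."""
    lowered = sentence.lower()
    no_digits = re.sub(r"\d+", " ", lowered)
    no_punct = no_digits.translate(str.maketrans("", "", string.punctuation))
    tokens = no_punct.split()
    return [t for t in tokens if t not in STOP_WORDS]


raw = "<p>The movie is GREAT!</p> <b>I watched it 2 times.</b>"
cleaned = clean_text(raw)
sentences = split_sentences(cleaned)
tokens = [preprocess(s) for s in sentences]
print(tokens)  # [['movie', 'great'], ['i', 'watched', 'it', 'times']]
```

The order of the steps matters: sentences are split before punctuation is removed, because the splitter depends on the sentence-ending marks.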
3] Feature Engineering – converting text to numbers
Text Vectorization
Bag of words, TF-IDF, one-hot encoding
Word2Vec
In deep learning, feature engineering is largely done automatically by the model, while in classical machine learning it has to be done manually.
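To make "converting text to numbers" concrete, here is a small pure-Python sketch of bag of words and TF-IDF. It assumes the documents are already tokenized and non-empty, and it uses the plain textbook weighting tf × log(N / df); libraries often apply smoothed or normalized variants, so their numbers will differ. All function names are illustrative.

```python
import math
from collections import Counter


def build_vocabulary(docs):
    """Sorted list of every distinct token in the corpus."""
    return sorted({tok for doc in docs for tok in doc})


def bag_of_words(doc, vocab):
    """Count how often each vocabulary word occurs in one document."""
    counts = Counter(doc)
    return [counts[w] for w in vocab]


def tf_idf(docs, vocab):
    """Unsmoothed TF-IDF: (count / doc length) * log(N / document frequency)."""
    n_docs = len(docs)
    doc_freq = {w: sum(1 for d in docs if w in d) for w in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            [(counts[w] / len(doc)) * math.log(n_docs / doc_freq[w]) for w in vocab]
        )
    return vectors


docs = [["good", "movie"], ["bad", "movie"], ["good", "good", "plot"]]
vocab = build_vocabulary(docs)          # ['bad', 'good', 'movie', 'plot']
bow = [bag_of_words(d, vocab) for d in docs]
tfidf = tf_idf(docs, vocab)
print(bow)  # [[0, 1, 1, 0], [1, 0, 1, 0], [0, 2, 0, 1]]
```

Bag of words only counts occurrences, while TF-IDF down-weights words such as "movie" that appear in many documents.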
4] Modelling
I] Applying a model – apply a heuristic, an ML algorithm, a DL model, or a cloud API
Which approach to apply depends on the amount of data and the nature of the problem.
For small amounts of data, a heuristic approach is fine; with more data we can use ML or DL.
II] Evaluation – how the model performs on unseen data
Using intrinsic evaluation – accuracy, confusion matrix, recall, precision
Or extrinsic evaluation – business-centric metrics
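The two modelling steps can be sketched together: a heuristic model applied to a handful of unseen examples, then scored with the intrinsic metrics listed above. The keyword lists, the labels "pos"/"neg", and the five test examples are invented for illustration; a real evaluation would use a proper held-out dataset.

```python
# Hypothetical keyword lists; a real heuristic would be tuned to the domain.
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "poor", "terrible"}


def heuristic_predict(tokens):
    """Label tokens 'pos' or 'neg' by counting keyword hits (ties go to 'neg')."""
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "pos" if score > 0 else "neg"


def confusion_matrix(y_true, y_pred, positive="pos"):
    """Count true/false positives and negatives for a binary task."""
    cm = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            cm["tp" if truth == positive else "fp"] += 1
        else:
            cm["fn" if truth == positive else "tn"] += 1
    return cm


def metrics(cm):
    """Accuracy, precision and recall, with 0.0 when a denominator is zero."""
    total = sum(cm.values())
    predicted_pos = cm["tp"] + cm["fp"]
    actual_pos = cm["tp"] + cm["fn"]
    return {
        "accuracy": (cm["tp"] + cm["tn"]) / total,
        "precision": cm["tp"] / predicted_pos if predicted_pos else 0.0,
        "recall": cm["tp"] / actual_pos if actual_pos else 0.0,
    }


# Tiny hand-made "unseen" set: tokenized texts with their true labels.
test_texts = [
    ["great", "movie"],
    ["bad", "plot"],
    ["fine", "acting"],
    ["good", "but", "terrible", "ending"],
    ["poor", "sound"],
]
y_true = ["pos", "neg", "pos", "neg", "neg"]
y_pred = [heuristic_predict(t) for t in test_texts]

cm = confusion_matrix(y_true, y_pred)
scores = metrics(cm)
print(cm)      # {'tp': 1, 'fp': 0, 'fn': 1, 'tn': 3}
print(scores)  # {'accuracy': 0.8, 'precision': 1.0, 'recall': 0.5}
```

Here the heuristic misses "fine acting" because "fine" is not in its keyword list, which shows up as lower recall even though accuracy looks good.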
5] Deployment
I] Deploy – as an API (microservice), a chatbot, etc.
Microservices are an architectural and organizational approach to software development where software is composed of small independent services that communicate over well-defined APIs.
II] Monitoring – dashboards, comparing against old data
III] Update – periodically update the model as things change
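As one possible shape for the "deploy as an API" step, here is a minimal sketch of a prediction microservice built on Python's standard http.server module. The `/predict` path, the JSON fields `text` and `label`, and the placeholder `predict` rule are all assumptions made for this example; a production service would typically use a web framework and load a trained model.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(text):
    """Placeholder model: a hypothetical keyword rule standing in for a trained model."""
    return "positive" if "good" in text.lower() else "negative"


def handle_request_body(body):
    """Turn a JSON request body like {"text": "..."} into a JSON response body."""
    payload = json.loads(body)
    return json.dumps({"label": predict(payload["text"])}).encode("utf-8")


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Only one (hypothetical) endpoint is served.
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        response = handle_request_body(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)


def run_server(port=8000):
    """Start the service; blocks until the process is interrupted."""
    HTTPServer(("", port), PredictHandler).serve_forever()
```

Calling `run_server()` starts the service, after which a client can POST `{"text": "a good film"}` to `/predict`. Keeping `handle_request_body` separate from the handler class makes the prediction logic easy to test and to swap out when the model is updated.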