Natural Language Processing - Overview - NLP Pipeline Tutorial

An NLP pipeline is the set of steps followed to build end-to-end NLP software.

Building NLP software consists of the following steps:

1] Data Acquisition

2] Text Preparation – text cleanup, basic preprocessing, advanced preprocessing (such as POS tagging)

3] Feature Engineering

4] Modelling – Model building and evaluation

5] Deployment – Deployment, Monitoring, Model Update

1] Data Acquisition

Collect data from sources such as databases, web scraping, etc.
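As a minimal sketch of the database route, the snippet below reads raw text out of a SQLite table using Python's standard library. The table name `reviews`, its columns, and the sample rows are hypothetical and exist only to make the example runnable; a real project would point at its own database.

```python
import sqlite3

# In-memory database so the sketch is self-contained.
# The "reviews" table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO reviews (body) VALUES (?)",
    [("Great phone",), ("Battery died in a week",)],
)

def load_texts(connection):
    """Return every review body as a list of strings, ordered by id."""
    rows = connection.execute("SELECT body FROM reviews ORDER BY id").fetchall()
    return [row[0] for row in rows]

texts = load_texts(conn)
print(texts)
```

The resulting list of strings is the raw input for the text preparation step.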

2] Text Preparation

Text Cleanup – removing HTML tags and emoji, spell checking (for example with the TextBlob library), etc.
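The cleanup step can be sketched with the standard library alone. The regular expressions below are simplifying assumptions: the tag pattern is not a full HTML parser, and the emoji ranges are rough and not exhaustive. Spell checking is left out because it needs an external library.

```python
import html
import re

# Matches anything that looks like an HTML tag, e.g. <p> or </div>.
TAG_RE = re.compile(r"<[^>]+>")

# Rough emoji code-point ranges (an assumption; not exhaustive).
EMOJI_RE = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, extended symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "]+"
)

def clean_text(raw: str) -> str:
    """Strip HTML tags and emoji, decode entities, and collapse whitespace."""
    text = TAG_RE.sub(" ", raw)
    text = html.unescape(text)        # turns &amp; into &, and so on
    text = EMOJI_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()

print(clean_text("<p>Tom &amp; Jerry is <b>great</b> 😀</p>"))
# → Tom & Jerry is great
```

Tags are replaced with a space rather than an empty string so that words in adjacent elements do not get glued together.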

Basic Preprocessing – tokenization (sentence, word), stop word removal, stemming, removing digits and punctuation, lowercasing, language detection.
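Several of these steps fit in a few lines of plain Python. This is only a sketch: the stop-word list is a tiny hypothetical sample, the sentence splitter is naive, and stemming and language detection are omitted because they normally come from a dedicated library.

```python
import re
import string

# A tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "of", "in", "on", "and", "to", "it"}

def split_sentences(text: str) -> list[str]:
    """Naive sentence tokenizer: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def preprocess(sentence: str) -> list[str]:
    """Lowercase, drop digits and punctuation, tokenize on whitespace, remove stop words."""
    lowered = sentence.lower()
    no_digits = re.sub(r"\d+", " ", lowered)
    no_punct = no_digits.translate(str.maketrans("", "", string.punctuation))
    tokens = no_punct.split()
    return [t for t in tokens if t not in STOP_WORDS]

text = "The pipeline has 5 steps. It is easy to follow!"
for sentence in split_sentences(text):
    print(preprocess(sentence))
# → ['pipeline', 'has', 'steps']
# → ['easy', 'follow']
```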

Advanced Preprocessing – part-of-speech tagging, parsing, coreference resolution

3] Feature Engineering – converting text to numbers

Text Vectorization

Bag of words, TF-IDF, one-hot encoding

Word2Vec
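To make the count-based vectorizers concrete, here is a small from-scratch sketch of one-hot encoding, bag of words, and TF-IDF on a toy corpus. The corpus is made up, and the TF-IDF formula used (tf = count / document length, idf = log(N / df)) is one common textbook variant; library implementations often use smoothed versions, so their numbers will differ. Word2Vec is not shown because it requires training a model.

```python
import math
from collections import Counter

# Toy corpus (hypothetical) used only for illustration.
docs = [
    "the movie was good",
    "the movie was bad",
    "good acting and good story",
]

tokenized = [d.split() for d in docs]
vocab = sorted({tok for doc in tokenized for tok in doc})

def one_hot(word):
    """Vector with a single 1 at the word's position in the vocabulary."""
    return [1 if term == word else 0 for term in vocab]

def bag_of_words(tokens):
    """Count of each vocabulary term in the document."""
    counts = Counter(tokens)
    return [counts[term] for term in vocab]

def tf_idf(tokens):
    """TF-IDF with tf = count / len(doc) and idf = log(N / df)."""
    counts = Counter(tokens)
    n_docs = len(tokenized)
    vector = []
    for term in vocab:
        tf = counts[term] / len(tokens)
        df = sum(1 for doc in tokenized if term in doc)  # documents containing term
        vector.append(tf * math.log(n_docs / df))
    return vector

print(vocab)
print(bag_of_words(tokenized[2]))
print([round(x, 3) for x in tf_idf(tokenized[0])])
```

A term such as "bad" that appears in only one document gets a higher idf than "the", which appears in two, so TF-IDF down-weights words that are common across the corpus.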

In deep learning, the model learns features automatically from the text, while in classical machine learning they have to be engineered manually.

4] Modelling

I] Applying a model – apply a heuristic, an ML algorithm, a DL model, or a cloud API

Which approach to apply depends on the amount of data and the nature of the problem.

For small amounts of data, a heuristic approach is fine; with more data we can use ML or DL.
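As an illustration of what a heuristic approach can look like when there is too little data to train on, the sketch below labels sentiment by counting keywords. The word lists are hypothetical and far too small for real use; the point is only that no training step is involved.

```python
# Hypothetical keyword lists; a real rule set would be larger and domain-specific.
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}

def rule_based_sentiment(text: str) -> str:
    """Label text by comparing counts of positive and negative keywords."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(rule_based_sentiment("I love this great phone"))
print(rule_based_sentiment("terrible battery and poor screen"))
```

Once enough labelled examples are available, rules like these can also serve as a baseline to compare an ML or DL model against.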

II] Evaluation – how the model performs on unseen data

Using intrinsic evaluation – accuracy, confusion matrix, recall, precision

Or extrinsic evaluation – business-centric metrics
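The intrinsic metrics named above all follow from the four cells of a binary confusion matrix. The sketch below computes them by hand; the labels and predictions are made-up values, with 1 treated as the positive class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def evaluate(y_true, y_pred, positive=1):
    """Accuracy, precision, and recall derived from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Made-up held-out labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(confusion_counts(y_true, y_pred))  # → (3, 1, 1, 3)
print(evaluate(y_true, y_pred))
```

Precision asks how many predicted positives were correct, while recall asks how many actual positives were found; which one matters more depends on the problem.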

5] Deployment

I] Deploy – as an API (microservice), chatbot, etc.

Microservices are an architectural and organizational approach to software development where software is composed of small independent services that communicate over well-defined APIs.
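A minimal sketch of such a service, using only Python's built-in `http.server`, might look like the following. The `/predict` endpoint, the JSON shape, and the `predict` function are all hypothetical; `predict` is a trivial rule standing in for a trained model, and a production service would normally use a proper web framework.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(text: str) -> dict:
    """Placeholder model (hypothetical): a trivial rule standing in for a trained model."""
    label = "positive" if "good" in text.lower() else "negative"
    return {"text": text, "label": label}

class PredictHandler(BaseHTTPRequestHandler):
    """Handles POST /predict with a JSON body such as {"text": "..."}."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404, "Unknown endpoint")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
            text = payload["text"]
            if not isinstance(text, str):
                raise TypeError("text must be a string")
        except (ValueError, KeyError, TypeError):
            self.send_error(400, "Expected a JSON body with a 'text' field")
            return
        body = json.dumps(predict(text)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve(port: int = 8000) -> None:
    """Start the service on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

Calling `serve()` starts the server, after which any client can POST JSON to `http://127.0.0.1:8000/predict`. Keeping the model behind a small, well-defined API like this is what lets other services use it without knowing how it works internally.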

II] Monitoring – dashboards, comparing new data against old data

III] Update – periodically update the model as the data changes
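One simple way to compare incoming data with old data is to measure how many incoming tokens the model never saw during training. The sketch below is an assumption about what such a check could look like, not a standard recipe: the texts and the 0.2 threshold are hypothetical.

```python
def oov_rate(reference_texts, new_texts):
    """Fraction of tokens in new_texts that never appear in reference_texts."""
    known = {tok for text in reference_texts for tok in text.lower().split()}
    new_tokens = [tok for text in new_texts for tok in text.lower().split()]
    if not new_tokens:
        return 0.0
    unseen = sum(tok not in known for tok in new_tokens)
    return unseen / len(new_tokens)

# Hypothetical training data and recent production inputs.
training = ["the movie was good", "the movie was bad"]
incoming = ["the series was mid", "the movie was good"]

RETRAIN_THRESHOLD = 0.2  # hypothetical cut-off; tune per application

rate = oov_rate(training, incoming)
print(rate)  # → 0.25
if rate > RETRAIN_THRESHOLD:
    print("Vocabulary has drifted; consider updating the model.")
```

A value like this can be plotted on a dashboard over time and used as one trigger for the periodic model update.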
