Machine Learning - Machine Learning Development Life Cycle - Data Gathering Tutorial
- Working with CSV file
- Working with JSON/SQL (JavaScript Object Notation/ Structured Query Language)
- Fetch API
- Web scraping
- Working with CSV file
import pandas as pd
# Opening a local CSV file
df = pd.read_csv('file.csv')
# Opening a CSV file from a URL using the requests module
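# A minimal sketch, assuming a publicly reachable CSV at a placeholder URL:
import requests
from io import StringIO

url = 'https://example.com/data.csv'  # hypothetical URL
response = requests.get(url)
df = pd.read_csv(StringIO(response.text))
# Note: pd.read_csv(url) can also read directly from a URL without requests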
# use sep='\t' to open tab-separated files, i.e. .tsv files
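# Example, assuming a hypothetical tab-separated file:
df = pd.read_csv('file.tsv', sep='\t')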
# use names=['col1name', 'col2name', ...] to assign column names to the dataset
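# Example, assuming a file with no header row and hypothetical column names:
df = pd.read_csv('file.csv', header=None, names=['emp_id', 'name', 'salary'])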
# To use a unique column such as an ID as the index column, pass index_col='emp_id'
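# Example, assuming the dataset has a unique 'emp_id' column:
df = pd.read_csv('file.csv', index_col='emp_id')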
# pandas uses the first row as the header by default (header=0); to use a different row as the header, e.g. the second row, pass header=1
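# Example: use the second row of the file as the header
df = pd.read_csv('file.csv', header=1)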
# To load only specific columns from the whole dataset, use usecols=['col1name', 'col2name', ...]
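# Example, with hypothetical column names:
df = pd.read_csv('file.csv', usecols=['name', 'salary'])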
# use skiprows to skip particular rows and nrows to read only the first n rows
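# Example: skip the rows at positions 1 and 2, or read only the first 100 rows
df = pd.read_csv('file.csv', skiprows=[1, 2])
df = pd.read_csv('file.csv', nrows=100)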
# By default the encoding is utf-8. If the dataset uses another encoding, pass the encoding parameter, e.g. encoding='latin-1'
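# Example:
df = pd.read_csv('file.csv', encoding='latin-1')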
# Skipping bad lines: if some rows have 5 columns and some have 6, read_csv throws a parser error. In older pandas versions use error_bad_lines=False; since pandas 1.3 use on_bad_lines='skip' instead
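# Example (pandas 1.3+):
df = pd.read_csv('file.csv', on_bad_lines='skip')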
# dtype parameter: to set a column's datatype, use e.g. dtype={'target': int}
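# Example, assuming the dataset has a 'target' column:
df = pd.read_csv('file.csv', dtype={'target': int})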
# Handling dates: to parse string columns as dates, use parse_dates=['col_name']
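# Example, assuming a hypothetical 'joining_date' column:
df = pd.read_csv('file.csv', parse_dates=['joining_date'])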
# converters: used to apply a transformation or function to a particular column while loading
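# Example, hypothetical: strip extra whitespace from a 'name' column while loading
df = pd.read_csv('file.csv', converters={'name': lambda x: x.strip()})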
# na_values parameter: to treat particular values as NaN, e.g. convert '_' to NaN
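# Example: treat '_' as a missing value
df = pd.read_csv('file.csv', na_values=['_'])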
# Loading a huge dataset in chunks: use the chunksize parameter
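# Example: read 10,000 rows at a time and process each chunk separately
for chunk in pd.read_csv('file.csv', chunksize=10000):
    print(chunk.shape)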
- Working with JSON/SQL (JavaScript Object Notation/ Structured Query Language)
- Fetch API
- Web Scraping