Machine Learning - Machine Learning Development Life Cycle - Data Gathering Tutorial
- Working with CSV file
- Working with JSON/SQL (JavaScript Object Notation/ Structured Query Language)
- Fetch API
- Web scraping
- Working with CSV file
import pandas as pd
# Opening a local CSV file
df = pd.read_csv('file.csv')
# Opening a CSV file from a URL using the requests module
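# A minimal sketch, assuming a publicly reachable CSV at a placeholder URL:
import requests
from io import StringIO

url = 'https://example.com/data.csv'  # hypothetical URL
response = requests.get(url)
df = pd.read_csv(StringIO(response.text))
# Note: pd.read_csv(url) can also read directly from a URL without requests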
# use sep='\t' to open tab-separated files, i.e. .tsv files
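# Example, assuming a hypothetical tab-separated file:
df = pd.read_csv('file.tsv', sep='\t')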
# use names=['col1name', 'col2name', ...] to assign column names to the dataset
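# Example, assuming a file with no header row and hypothetical column names:
df = pd.read_csv('file.csv', header=None, names=['emp_id', 'name', 'salary'])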
# To use a unique column such as an ID as the index column, pass index_col='emp_id'
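# Example, assuming the dataset has a unique 'emp_id' column:
df = pd.read_csv('file.csv', index_col='emp_id')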
# pandas uses the first row as the header by default (header=0); to use a different row as the header, e.g. the second row, pass header=1
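# Example: use the second row of the file as the header
df = pd.read_csv('file.csv', header=1)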
# To load only specific columns from the whole dataset, use usecols=['col1name', 'col2name', ...]
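# Example, with hypothetical column names:
df = pd.read_csv('file.csv', usecols=['name', 'salary'])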
# use skiprows to skip particular rows and nrows to read only the first n rows
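# Example: skip the rows at positions 1 and 2, or read only the first 100 rows
df = pd.read_csv('file.csv', skiprows=[1, 2])
df = pd.read_csv('file.csv', nrows=100)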
# By default the encoding is utf-8. If the dataset uses another encoding, pass the encoding parameter, e.g. encoding='latin-1'
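# Example:
df = pd.read_csv('file.csv', encoding='latin-1')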
# Skipping bad lines: if some rows have 5 columns and some have 6, read_csv throws a parser error. In older pandas versions use error_bad_lines=False; since pandas 1.3 use on_bad_lines='skip' instead
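# Example (pandas 1.3+):
df = pd.read_csv('file.csv', on_bad_lines='skip')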
# dtype parameter: to set a column's datatype, use e.g. dtype={'target': int}
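# Example, assuming the dataset has a 'target' column:
df = pd.read_csv('file.csv', dtype={'target': int})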
# Handling dates: to parse string columns as dates, use parse_dates=['col_name']
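# Example, assuming a hypothetical 'joining_date' column:
df = pd.read_csv('file.csv', parse_dates=['joining_date'])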
# converters: used to apply a transformation or function to a particular column while loading
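# Example, hypothetical: strip extra whitespace from a 'name' column while loading
df = pd.read_csv('file.csv', converters={'name': lambda x: x.strip()})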
# na_values parameter: to treat particular values as NaN, e.g. convert '_' to NaN
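# Example: treat '_' as a missing value
df = pd.read_csv('file.csv', na_values=['_'])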
# Loading a huge dataset in chunks: use the chunksize parameter
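# Example: read 10,000 rows at a time and process each chunk separately
for chunk in pd.read_csv('file.csv', chunksize=10000):
    print(chunk.shape)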
- Working with JSON/SQL (JavaScript Object Notation/ Structured Query Language)
- Fetch API
- Web Scraping