Machine Learning - Machine Learning Development Life Cycle - Understanding your Data Tutorial
- How big is the data?
# number of rows and columns, as a (rows, columns) tuple
# shape is an attribute, not a method, so it takes no parentheses
df.shape
- What does the data look like?
# first five rows
df.head()
# random sample of five rows
df.sample(5)
- What are the data types of the columns?
# data type and non-null count of each column, plus memory usage
df.info()
- Are there any missing values?
# number of missing (null) values in each column
df.isnull().sum()
- How does the data look mathematically?
# summary statistics for numeric columns: count, mean, std, min, quartiles, max
df.describe()
- Are there duplicate rows?
# number of rows that exactly repeat an earlier row
df.duplicated().sum()
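- How are the columns correlated?
# pairwise Pearson correlation between the numeric columns
# numeric_only=True (pandas 1.5 or later) skips text columns, which otherwise raise an error in pandas 2.x
df.corr(numeric_only=True)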
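If the installed pandas version is unknown, one way to get the same correlation matrix is to select the numeric columns first. This is a minimal sketch on a hypothetical toy frame; the column names and values are made up for illustration and are not from the tutorial's dataset.

```python
import pandas as pd

# Hypothetical toy frame with one text column (illustrative values only)
df = pd.DataFrame({
    "age": [22, 38, 26, 35],
    "fare": [7.25, 71.28, 7.92, 53.10],
    "sex": ["male", "female", "female", "female"],
})

# Keep only numeric columns so corr() never sees the text column
numeric_df = df.select_dtypes(include="number")

# Pearson is the default method; it is spelled out here for clarity
corr_matrix = numeric_df.corr(method="pearson")
print(corr_matrix)
```

Values close to 1 or -1 indicate a strong linear relationship, and values near 0 indicate little or none.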
- How many rows fall into each category?
# number of rows for each distinct value in the column
df['Survived'].value_counts()
- How to check the mean, min, max, and skewness of a column?
df['age'].mean()
df['age'].min()
df['age'].max()
df['age'].skew()
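The checks above can be tried end to end on a small frame. The sketch below assumes a hypothetical toy dataset: the 'Survived' and 'age' column names follow the snippets above, while the 'fare' and 'sex' columns and every value are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (illustrative only): one missing age, one duplicated row
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 1],
    "age": [22.0, 38.0, 26.0, np.nan, 35.0, 35.0],
    "fare": [7.25, 71.28, 7.92, 8.05, 53.10, 53.10],
    "sex": ["male", "female", "female", "male", "female", "female"],
})

# How big is the data?
n_rows, n_cols = df.shape

# Are there any missing values? (count per column)
missing = df.isnull().sum()

# Are there duplicate rows?
n_duplicates = int(df.duplicated().sum())

# How many rows fall into each category?
survived_counts = df["Survived"].value_counts()

# Mean, min, max, and skewness of one column in a single call
age_stats = df["age"].agg(["mean", "min", "max", "skew"])

print(n_rows, n_cols)
print(missing)
print(n_duplicates)
print(survived_counts)
print(age_stats)
```

Missing values are skipped by default when computing the mean, min, max, and skewness, so the single missing age does not affect those statistics.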