Statistics for AIML - Descriptive statistics - Central Limit Theorem Tutorial
The central limit theorem states that if you take sufficiently large random samples from a population with finite variance, the distribution of the sample means will be approximately normal, even if the population itself isn't normally distributed. A common rule of thumb treats a sample size greater than 30 as large enough.
For example, if you draw m = 1000 samples, each of size n = 50, from a population (e.g., runs scored), then the 1000 sample means will be approximately normally distributed.
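A quick way to see this is to simulate it. The sketch below assumes a hypothetical, right-skewed population (an exponential distribution with mean 1 and standard deviation 1, standing in for something like runs) and uses only Python's standard library. `sample_means`, `skewness` and `draw_exponential` are illustrative helper names, not part of any library.

```python
import random
import statistics


def sample_means(m: int, n: int, draw, seed: int = 0) -> list[float]:
    """Draw m independent samples of size n using `draw` and return their means."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(m)]


def skewness(xs: list[float]) -> float:
    """Sample skewness: the average cubed z-score (0 for a symmetric distribution)."""
    mu = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return statistics.fmean(((x - mu) / sd) ** 3 for x in xs)


def draw_exponential(rng: random.Random) -> float:
    """Hypothetical skewed population: exponential with rate 1 (mean 1, sd 1, skewness 2)."""
    return rng.expovariate(1.0)


# m = 1000 samples, each of size n = 50, as in the example above.
means = sample_means(m=1000, n=50, draw=draw_exponential)
print(f"mean of sample means: {statistics.fmean(means):.3f}")
print(f"sd of sample means:   {statistics.stdev(means):.3f}")
print(f"skewness of means:    {skewness(means):.3f}")
```

If the CLT holds, the mean of the 1000 sample means should sit close to the population mean of 1, their standard deviation close to 1/√50 (about 0.141), and their skewness well below the population's skewness of 2.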
In layman's terms: even if the data are not normally distributed, the sample mean follows an approximately normal distribution, provided the sample size is large.
The central limit theorem is important because it justifies using the normal distribution for hypothesis tests and confidence intervals about a population mean, even when the population itself is not normal.
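As an illustration of both uses, here is a minimal sketch of a large-sample (z-based) confidence interval and a two-sided z-test for a mean. It assumes independent observations and a sample large enough (commonly n ≥ 30) that the sample standard deviation can stand in for σ. The function names and the simulated `runs` data are hypothetical.

```python
import math
import random
import statistics


def standard_error(data: list[float]) -> float:
    """Estimated standard error of the mean: s / sqrt(n)."""
    return statistics.stdev(data) / math.sqrt(len(data))


def mean_confidence_interval(data: list[float], level: float = 0.95) -> tuple[float, float]:
    """Approximate CI for the population mean, relying on the CLT normal approximation."""
    xbar = statistics.fmean(data)
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    half_width = z * standard_error(data)
    return xbar - half_width, xbar + half_width


def z_test_p_value(data: list[float], mu0: float) -> float:
    """Two-sided p-value for H0: population mean == mu0, using the normal approximation."""
    z = (statistics.fmean(data) - mu0) / standard_error(data)
    return 2 * (1 - statistics.NormalDist().cdf(abs(z)))


# Made-up, right-skewed "runs" data: 40 draws from an exponential with mean 30.
rng = random.Random(1)
runs = [rng.expovariate(1 / 30) for _ in range(40)]

low, high = mean_confidence_interval(runs)
print(f"95% CI for mean runs: ({low:.1f}, {high:.1f})")
print(f"p-value for H0: mean = 30 -> {z_test_p_value(runs, 30):.3f}")
```

For small samples from a population with unknown σ, a t-based interval is the usual alternative; the z version is shown here because it is the one the CLT directly motivates.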
Key Takeaways
- The central limit theorem (CLT) states that the distribution of sample means approximates a normal distribution as the sample size gets larger.
- Sample sizes equal to or greater than 30 are commonly considered sufficient for the CLT to hold; this is a rule of thumb, and heavily skewed populations may need larger samples.
- A key aspect of the CLT is that the mean of the sample means equals the population mean, while the standard deviation of the sample means equals the population standard deviation divided by √n.
- With a sufficiently large sample size, the sample mean estimates the population mean accurately, because the spread of the sample means shrinks as n grows.
- The expectation of the sample mean, viewed as a random variable, equals the population mean. Symbolically, E(X̄) = µ.
- The standard deviation of X̄, called the standard error, is σ / √n, where σ is the population standard deviation and n is the sample size.
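The two formulas E(X̄) = µ and SD(X̄) = σ / √n can be checked numerically. This sketch assumes a small, made-up, right-skewed population (hypothetical runs per innings), draws samples with replacement so the observations are independent, and compares the simulated mean and standard deviation of X̄ against µ and σ / √n for a few sample sizes. `check_standard_error` is an illustrative name.

```python
import math
import random
import statistics


def check_standard_error(population: list[float], n: int, m: int = 5000, seed: int = 42) -> dict:
    """Compare the simulated mean/SD of the sample mean with mu and sigma / sqrt(n).

    Samples of size n are drawn with replacement, so observations are independent.
    """
    rng = random.Random(seed)
    means = [statistics.fmean(rng.choices(population, k=n)) for _ in range(m)]
    mu = statistics.fmean(population)        # population mean
    sigma = statistics.pstdev(population)    # population standard deviation
    return {
        "mu": mu,
        "mean_of_means": statistics.fmean(means),
        "theory_se": sigma / math.sqrt(n),
        "simulated_se": statistics.stdev(means),
    }


# Hypothetical right-skewed population (made-up runs per innings).
population = [0, 1, 2, 3, 5, 8, 12, 20, 35, 60, 100]

for n in (5, 30, 100):
    r = check_standard_error(population, n)
    print(
        f"n={n:>3}: mu={r['mu']:.2f}  mean of means={r['mean_of_means']:.2f}  "
        f"sigma/sqrt(n)={r['theory_se']:.2f}  simulated SE={r['simulated_se']:.2f}"
    )
```

Because the standard error is σ / √n, quadrupling the sample size halves it, which is why larger samples pin down the population mean more tightly.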