When can we say that populations are normally distributed?
Continuous Probability Distributions
Probability distributions are of two types: continuous and discrete. A continuous probability distribution describes a variable that can take any value in an interval, so its set of possible values is uncountably infinite. For example, an elapsed time can be any non-negative real number, not only a whole number of seconds. A discrete probability distribution has only a countable set of possible values.
Normal Distribution
Suppose we had to design a bathroom weighing scale. How would we decide the range of the scale? Would we take the highest recorded human weight in history and use that as the upper limit? This may not be a great idea, as the sensitivity of the scale is reduced if the range is too large. At the same time, if we set the upper limit too low, the scale may not be usable for a large percentage of the population!
Introduction:
Several tests of normality exist, with which you can check whether a particular data set is consistent with a normal distribution.
Before conducting a formal test, it is usual to use graphical methods to see whether the data may be assumed to follow the normal distribution, at least approximately. A few such graphical methods are:
- Histogram of the data, superimposed with a normal probability curve,
- Normal probability plot with confidence interval,
- Normal quantile-quantile (QQ) plot,
- Boxplot, etc.
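As a rough illustration of the QQ plot idea, the sketch below computes the points such a plot would display, using only Python's standard library. The function name `normal_qq_points`, the plotting positions (i - 0.5)/n, and the sample values are assumptions made for this example, not part of the original answer.

```python
from statistics import NormalDist, mean, stdev

def normal_qq_points(data):
    """Return (theoretical, observed) quantile pairs for a normal QQ plot."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i - 0.5) / n are one common convention (assumed here).
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    # Standardise the observations so a normal sample falls near the line y = x.
    m, s = mean(xs), stdev(xs)
    observed = [(x - m) / s for x in xs]
    return list(zip(theoretical, observed))

# Made-up sample values, for illustration only.
sample = [4.9, 5.1, 5.6, 6.0, 6.2, 6.4, 6.9, 7.3, 7.5, 8.1]
points = normal_qq_points(sample)
for t, o in points:
    print(f"{t:6.2f}  {o:6.2f}")
```

If the printed pairs lie close to a straight line, the sample is at least roughly compatible with a normal distribution; strong curvature at the ends suggests skewness or heavy tails.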
Explanation:
If the graphical display appears to show at least an approximate normal distribution, then a formal test can be used to verify the normality. A few such tests are as follows:
- Pearson’s Chi-squared test for goodness of fit,
- Shapiro-Wilk test,
- Kolmogorov-Smirnov test, etc.
Pearson’s Chi-squared test is discussed here.
Pearson’s Chi-squared test for goodness of fit:
Suppose the data set is divided into n categories or classes, with observed frequency $O_i$ and expected frequency $E_i$ in the ith class (i = 1, 2, …, n). Further, assume that the data come from a simple random sample, the total sample size is large, the expected count in each class is at least 5, and the observations are independent.
The degrees of freedom are then df = (number of categories) – (number of parameters estimated from the data) – 1. With n categories and the 2 parameters of the normal distribution (mean and variance) estimated from the sample, df = n – 3.
The test statistic is $\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$, where the sum runs over all n classes.
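The statistic and the degrees of freedom can be computed in a few lines. This is a minimal sketch: the function name `chi_square_statistic` and the frequencies below are invented for illustration and do not come from any real data set.

```python
def chi_square_statistic(observed, expected):
    """Pearson's statistic: sum over classes of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Made-up frequencies for n = 5 classes (both lists sum to 100).
observed = [8, 22, 41, 20, 9]
expected = [7.0, 24.0, 38.0, 24.0, 7.0]

statistic = chi_square_statistic(observed, expected)
# n classes, minus 2 estimated parameters (mean and variance), minus 1.
df = len(observed) - 2 - 1
print(f"chi-square = {statistic:.3f}, df = {df}")
```

Classes where the observed and expected counts differ most contribute most to the sum, so inspecting the individual terms shows where the fit is poorest.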
The observed frequencies are known from the data set. The expected frequency for each class under a normal distribution is obtained by multiplying the total sample size, say N, by the normal probability of that class (found from a standard normal table or from software such as Excel or Minitab).
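As a sketch of how the expected frequencies might be computed in software, the snippet below uses `statistics.NormalDist` from Python's standard library. The helper name `expected_counts`, the class boundaries, and the values of the mean, standard deviation and N are all hypothetical; in practice the mean and standard deviation would be estimated from the sample.

```python
from statistics import NormalDist

def expected_counts(boundaries, mu, sigma, total):
    """Expected class frequencies under a normal distribution N(mu, sigma^2).

    `boundaries` are the interior cut points between classes. The first class
    extends to minus infinity and the last to plus infinity, so the class
    probabilities sum to 1.
    """
    dist = NormalDist(mu, sigma)
    cdf_values = [0.0] + [dist.cdf(b) for b in boundaries] + [1.0]
    # Class probability = difference of consecutive CDF values; times N.
    return [total * (hi - lo) for lo, hi in zip(cdf_values, cdf_values[1:])]

# Hypothetical example: N = 200, estimated mean 50, standard deviation 10.
counts = expected_counts([40, 50, 60], mu=50, sigma=10, total=200)
print([round(c, 1) for c in counts])
```

Leaving the two outer classes open-ended is a design choice: it guarantees that the expected frequencies add up to N, which the chi-squared test assumes.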
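The p-value of the test, computed from the chi-squared distribution with df degrees of freedom, is then compared with the chosen significance level: a p-value below that level leads to rejecting the hypothesis that the data follow a normal distribution, while a larger p-value means the data are consistent with normality.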
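For readers without a statistics package at hand, the p-value can be approximated with the standard library alone. The sketch below assumes an integer number of degrees of freedom and uses the recurrence Q(a + 1, y) = Q(a, y) + y^a e^(-y) / Γ(a + 1) for the upper regularized incomplete gamma function. The function name `chi_square_p_value`, the example statistic and the 0.05 significance level are illustrative choices.

```python
import math

def chi_square_p_value(statistic, df):
    """Upper-tail probability P(X >= statistic) for chi-square with integer df >= 1."""
    if df < 1 or statistic < 0:
        raise ValueError("need df >= 1 and statistic >= 0")
    y = statistic / 2.0
    # Closed-form starting points: Q(1/2, y) = erfc(sqrt(y)), Q(1, y) = exp(-y).
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:
        a, q = 1.0, math.exp(-y)
    # Apply the recurrence until a reaches df / 2.
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

alpha = 0.05  # illustrative significance level
p = chi_square_p_value(1.78, df=2)  # made-up statistic and df
print("reject normality" if p < alpha else "no evidence against normality")
```

A dedicated statistics library would normally be preferable for this step; the hand-rolled version is shown only to make the calculation explicit.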