Sampling Distribution

Suppose we have a finite population and draw all possible simple random samples of size $$n$$, either with or without replacement. For each sample we calculate a statistic (the sample mean $$\overline X $$, the sample proportion $$\widehat p$$, etc.). The values of the statistic over all possible samples, together with their probabilities, form a probability distribution called the sampling distribution of that statistic. The number of possible samples is usually very large, and if exactly one statistic (any function of the sample) is calculated from each sample, the number of computed values equals the number of samples. In practical situations, therefore, the sampling distribution has a very large number of values. The shape of the sampling distribution depends on the sample size, the nature of the population, and the statistic being calculated. Some of the most well-known sampling distributions are:

(1) Binomial distribution

(2) Normal distribution

(3) t-distribution

(4) Chi-square distribution

(5) F-distribution

These distributions are called derived distributions because they are derived from all possible samples.
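
To make the idea of "all possible samples" concrete, the following is a minimal sketch that enumerates every sample of size 2 drawn without replacement from a small population and tabulates the sampling distribution of the sample mean. The population values and the variable names are hypothetical, chosen only for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Hypothetical finite population (N = 5) and sample size; illustrative only.
population = [2, 4, 6, 8, 10]
n = 2

# All possible simple random samples of size n without replacement.
samples = list(combinations(population, n))

# One statistic (the sample mean) per sample, kept as an exact fraction.
means = [Fraction(sum(s), n) for s in samples]

# Sampling distribution: each distinct mean with its probability.
counts = Counter(means)
sampling_distribution = {
    m: Fraction(c, len(samples)) for m, c in sorted(counts.items())
}

for m, p in sampling_distribution.items():
    print(f"mean = {float(m):4.1f}   probability = {p}")
```

Each of the 10 samples is equally likely, so the probability of a given mean is simply the share of samples that produce it.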

 

Standard Error

The standard deviation of a statistic is called the standard error of that statistic. If the statistic is $$\overline X $$, the standard deviation of all possible values of $$\overline X $$ is called the standard error of $$\overline X $$, which may be written as S.E.$$\left( {\overline X } \right)$$ or $${\sigma _{\overline X }}$$. Similarly, if the sample statistic is proportion $$\widehat p$$, the standard deviation of all possible values of $$\widehat p$$ is called the standard error of $$\widehat p$$ and is denoted by $${\sigma _{\widehat p}}$$ or S.E. $$\left( {\widehat p} \right)$$.
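
As an illustration of this definition, the sketch below lists every sample of size 2 drawn with replacement from a small 0/1 population, computes $$\widehat p$$ for each, and takes the standard deviation of those values. The population is hypothetical. The comparison with $$\sqrt {p(1 - p)/n} $$ uses a standard result for sampling with replacement that is not stated in the text above.

```python
from itertools import product
from math import sqrt
from statistics import pstdev

# Hypothetical population: 1 marks a "success", so p = 2/5 = 0.4.
population = [1, 1, 0, 0, 0]
n = 2

# All ordered samples of size n drawn with replacement (5**2 = 25 samples).
samples = list(product(population, repeat=n))

# Sample proportion for each sample.
p_hats = [sum(s) / n for s in samples]

# Standard error = standard deviation of all possible values of p-hat.
se_p_hat = pstdev(p_hats)

p = sum(population) / len(population)
print(f"S.E.(p-hat) by enumeration: {se_p_hat:.4f}")
print(f"sqrt(p(1-p)/n):             {sqrt(p * (1 - p) / n):.4f}")
```

The population standard deviation (`pstdev`) is used because the list contains every possible value of the statistic, not a sample of them.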

 

Sampling Distribution of $$\overline X $$

The probability distribution of all possible values of $$\overline X $$ calculated from all possible simple random samples is called the sampling distribution of $$\overline X $$. In brief, we shall call it the distribution of $$\overline X $$. The mean of this distribution is called the expected value of $$\overline X $$ and is written as $$E\left( {\overline X } \right)$$ or $${\mu _{\overline X }}$$. The standard deviation (standard error) of this distribution is denoted by S.E.$$\left( {\overline X } \right)$$ or $${\sigma _{\overline X }}$$, and the variance of $$\overline X $$ is denoted by $$Var\left( {\overline X } \right)$$ or $$\sigma _{\overline X }^2$$. The distribution of $$\overline X $$ has some important properties:

  • One important property of the distribution of $$\overline X $$ is that it is approximately normal when the sample size is large (the central limit theorem). By convention, a sample size $$n$$ of more than $$30$$ is called a large sample. The shape of the population distribution does not matter: whether the population is normal or non-normal, the distribution of $$\overline X $$ is approximately normal for $$n > 30$$, provided the population variance is finite, and the approximation improves as $$n$$ grows. When the population itself is normal, the distribution of $$\overline X $$ is exactly normal for every $$n$$. Since $$\overline X $$ is (approximately) normal, it can be transformed into a standard normal variable $$Z$$, where $$Z = \frac{{\overline X - \mu }}{{\sigma /\sqrt n }}$$. When the population is normal, $$n \leqslant 30$$, and the unknown $$\sigma $$ is replaced by the sample standard deviation $$s$$, the standardized statistic $$\frac{{\overline X - \mu }}{{s/\sqrt n }}$$ follows a t-distribution with $$n - 1$$ degrees of freedom. Diagram (a) shows the normal distribution and diagram (b) shows the t-distribution.


[Figures: (a) normal distribution; (b) t-distribution]

  • The mean of the distribution of $$\overline X $$ is equal to the mean of the population. Thus $$E\left( {\overline X } \right) = {\mu _{\overline X }} = \mu $$ (population mean). This relation is true for small as well as large sample sizes in sampling without replacement and with replacement.
  • The standard error (standard deviation) of $$\overline X $$ is related to the standard deviation of the population $$\sigma $$ through the relations:
    S.E.$$\left( {\overline X } \right) = {\sigma _{\overline X }} = \frac{\sigma }{{\sqrt n }}$$
    This holds when the population is infinite (in practice, when $$N$$ is very large relative to $$n$$) or when sampling is done with replacement from a finite or infinite population.
    S.E.$$\left( {\overline X } \right) = {\sigma _{\overline X }} = \frac{\sigma }{{\sqrt n }}\sqrt {\frac{{N - n}}{{N - 1}}} $$
    This holds when sampling is done without replacement from a finite population. Both relations between $${\sigma _{\overline X }}$$ and $$\sigma $$ hold for small as well as large sample sizes.
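
The properties above can be checked by brute force on a small example. The sketch below, which assumes a hypothetical population of five values, enumerates all samples of size 2 with and without replacement and compares the mean and standard deviation of the resulting sample means against $$\mu $$, $$\sigma /\sqrt n $$, and the finite-population formula.

```python
from itertools import combinations, product
from math import isclose, sqrt
from statistics import mean, pstdev

# Hypothetical finite population; values chosen only for illustration.
population = [2, 4, 6, 8, 10]
N, n = len(population), 2

mu = mean(population)        # population mean
sigma = pstdev(population)   # population standard deviation

# Sample means over all possible samples of size n.
means_with = [mean(s) for s in product(population, repeat=n)]    # with replacement
means_without = [mean(s) for s in combinations(population, n)]   # without replacement

# Standard errors obtained directly from the enumerated sample means.
se_with = pstdev(means_with)
se_without = pstdev(means_without)

# Values predicted by the two formulas.
se_formula_with = sigma / sqrt(n)
se_formula_without = sigma / sqrt(n) * sqrt((N - n) / (N - 1))

print("E(X-bar) with / without replacement:", mean(means_with), mean(means_without))
print("population mean:", mu)
print(f"S.E. with replacement:    {se_with:.4f} vs formula {se_formula_with:.4f}")
print(f"S.E. without replacement: {se_without:.4f} vs formula {se_formula_without:.4f}")
```

For sampling without replacement, unordered samples from `combinations` are enough, since every unordered sample corresponds to the same number of ordered draws and is therefore equally likely.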