# Confidence Interval Estimate for Difference Between Means

(1) Large Samples:

The difference between two means is of considerable importance in testing the homogeneity of populations. This tutorial is concerned with the confidence interval estimate for the difference between two population means.

By a non-rigorous argument from the central limit theorem, we can state that “if two populations have means ${\mu _1}$ and ${\mu _2}$ and variances ${\sigma _1}^2$ and ${\sigma _2}^2$ respectively, then the sampling distribution of the difference of their sample means $\left( {{{\overline X }_1} - {{\overline X }_2}} \right)$ is approximately normal with mean $\left( {{\mu _1} - {\mu _2}} \right)$ and standard deviation $\sqrt {\frac{{\sigma _1^2}}{{{n_1}}} + \frac{{\sigma _2^2}}{{{n_2}}}}$, where ${n_1}$ and ${n_2}$ are the sizes of the independent samples drawn from the two populations, both larger than 30.”

This formula for the combined standard deviation follows from the theorem that the variance of a sum or difference of two independent random variables is the sum of their variances. Thus

$Var\left( {X \pm Y} \right) = Var\left( X \right) + Var\left( Y \right)$

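Hence

$Var\left( {{{\overline X }_1} - {{\overline X }_2}} \right) = Var\left( {{{\overline X }_1}} \right) + Var\left( {{{\overline X }_2}} \right) = \frac{{\sigma _1^2}}{{{n_1}}} + \frac{{\sigma _2^2}}{{{n_2}}}$

Therefore, the standard deviation of $\left( {{{\overline X }_1} - {{\overline X }_2}} \right)$, denoted ${\sigma _{{{\overline X }_1} - {{\overline X }_2}}}$, is
${\sigma _{{{\overline X }_1} - {{\overline X }_2}}} = \sqrt {\frac{{\sigma _1^2}}{{{n_1}}} + \frac{{\sigma _2^2}}{{{n_2}}}}$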
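The variance rule used above can be checked informally by simulation. The sketch below is only illustrative: the two normal distributions, their standard deviations (2 and 3), the number of draws, the seed, and the helper name `simulate_variances` are all assumptions chosen for the demonstration and do not come from the tutorial.

```python
import random
import statistics


def simulate_variances(sd_x=2.0, sd_y=3.0, draws=50_000, seed=0):
    """Estimate Var(X + Y) and Var(X - Y) for independent normal X and Y.

    The distributions and parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, sd_x) for _ in range(draws)]
    ys = [rng.gauss(0.0, sd_y) for _ in range(draws)]
    var_sum = statistics.pvariance([x + y for x, y in zip(xs, ys)])
    var_diff = statistics.pvariance([x - y for x, y in zip(xs, ys)])
    return var_sum, var_diff


var_sum, var_diff = simulate_variances()
# Both estimates should land close to Var(X) + Var(Y) = 2**2 + 3**2 = 13.
print(round(var_sum, 2), round(var_diff, 2))
```

Both printed values should be near 13, which is the sum of the two variances whether the variables are added or subtracted.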

We can also standardize $\left( {{{\overline X }_1} - {{\overline X }_2}} \right)$ as follows
$Z = \frac{{\left( {{{\overline X }_1} - {{\overline X }_2}} \right) - \left( {{\mu _1} - {\mu _2}} \right)}}{{\sqrt {\frac{{\sigma _1^2}}{{{n_1}}} + \frac{{\sigma _2^2}}{{{n_2}}}} }}$

where $Z$ is a standard normal variate. From this we can directly state the $\left( {1 - \alpha } \right)$100% confidence limits for the difference between two population means as
$\left( {{{\overline X }_1} - {{\overline X }_2}} \right) \pm {Z_{\alpha /2}}\sqrt {\frac{{\sigma _1^2}}{{{n_1}}} + \frac{{\sigma _2^2}}{{{n_2}}}}$

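The confidence interval may then be stated as
$\left( {{{\overline X }_1} - {{\overline X }_2}} \right) - {Z_{\alpha /2}}\sqrt {\frac{{\sigma _1^2}}{{{n_1}}} + \frac{{\sigma _2^2}}{{{n_2}}}} < {\mu _1} - {\mu _2} < \left( {{{\overline X }_1} - {{\overline X }_2}} \right) + {Z_{\alpha /2}}\sqrt {\frac{{\sigma _1^2}}{{{n_1}}} + \frac{{\sigma _2^2}}{{{n_2}}}}$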

It must be remembered that the above results hold only for large samples, or for small samples from normal populations provided the population variances are known. If $\sigma _1^2$ and $\sigma _2^2$ are not known, then for large samples they can be replaced by the sample variances $S_1^2$ and $S_2^2$, which are computed by the formula ${S^2} = \frac{{\sum {{\left( {{X_i} - \overline X } \right)}^2}}}{{n - 1}}$. The larger of the two sample means should be taken as ${\overline X _1}$.
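The large-sample limits can be computed directly from the summary statistics. The following is a minimal sketch, not a definitive implementation: the function name `z_interval_diff_means` and its parameter names are hypothetical, and it uses `statistics.NormalDist` from the Python standard library to obtain $Z_{\alpha /2}$ rather than a printed table.

```python
from math import sqrt
from statistics import NormalDist


def z_interval_diff_means(mean1, mean2, var1, var2, n1, n2, confidence=0.95):
    """Large-sample confidence interval for mu1 - mu2.

    var1 and var2 are the population variances, or the sample variances
    S1^2 and S2^2 when both samples are large.
    """
    alpha = 1.0 - confidence
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # Z_{alpha/2}
    std_error = sqrt(var1 / n1 + var2 / n2)           # sd of (X1bar - X2bar)
    diff = mean1 - mean2
    return diff - z_crit * std_error, diff + z_crit * std_error
```

The function returns the lower and upper limits as a pair, so a call such as `lower, upper = z_interval_diff_means(...)` gives both ends of the interval at once.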

(2) Small Samples:

When at least one of the two sample sizes is small, “$t$” takes the place of $Z$. Two different kinds of interval estimate are obtained, depending on whether the two populations are assumed to have the same variances $\left( {\sigma _1^2 = \sigma _2^2} \right)$ or unequal variances $\left( {\sigma _1^2 \ne \sigma _2^2} \right)$.

If the two populations are assumed to have unknown and unequal population variances $\left( {\sigma _1^2 \ne \sigma _2^2} \right)$, then the $\left( {1 - \alpha } \right)$100% confidence limits may be stated as
$\left( {{{\overline X }_1} - {{\overline X }_2}} \right) \pm {t_{\alpha /2}}\sqrt {\frac{{S_1^2}}{{{n_1}}} + \frac{{S_2^2}}{{{n_2}}}}$

where $S_1^2$ and $S_2^2$ are calculated using the formula ${S^2} = \frac{{\sum {{\left( {{X_i} - \overline X } \right)}^2}}}{{n - 1}}$.
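As a rough sketch of the unequal-variance limits, the function below works from the raw observations. The name `t_interval_unequal_var` is hypothetical, and the critical value `t_crit` is assumed to be read from a $t$ table by the caller, since the Python standard library has no $t$ quantile function.

```python
from math import sqrt
from statistics import mean, variance


def t_interval_unequal_var(sample1, sample2, t_crit):
    """Confidence interval for mu1 - mu2 without pooling the variances.

    t_crit is the tabulated t_{alpha/2} value, supplied by the caller.
    """
    n1, n2 = len(sample1), len(sample2)
    s1_sq = variance(sample1)  # sample variance with divisor n - 1
    s2_sq = variance(sample2)
    std_error = sqrt(s1_sq / n1 + s2_sq / n2)
    diff = mean(sample1) - mean(sample2)
    return diff - t_crit * std_error, diff + t_crit * std_error
```

Note that `statistics.variance` already uses the divisor $n - 1$, so it matches the formula for ${S^2}$ given above.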

If the two populations are assumed to have equal but unknown population variances $\left( {\sigma _1^2 = \sigma _2^2 = {\sigma ^2}} \right)$, then the $\left( {1 - \alpha } \right)$100% confidence limits may be stated as
$\left( {{{\overline X }_1} - {{\overline X }_2}} \right) \pm {t_{\alpha /2}}{S_c}\sqrt {\frac{1}{{{n_1}}} + \frac{1}{{{n_2}}}}$

where $S_c^2 = \frac{{\left( {{n_1} - 1} \right)S_1^2 + \left( {{n_2} - 1} \right)S_2^2}}{{{n_1} + {n_2} - 2}}$ is the pooled sample variance.

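Note that the $t$ statistic should be read from the table at ${n_1} + {n_2} - 2$ degrees of freedom. This value is exact for the equal-variance (pooled) case. For the unequal-variance case it is used here for simplicity; the Welch-Satterthwaite approximation to the degrees of freedom is the more common choice in that case.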
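The pooled (equal-variance) limits can be sketched in the same way. The function name `pooled_t_interval` and its parameters are hypothetical, and `t_crit` is again assumed to be the tabulated ${t_{\alpha /2}}$ value at ${n_1} + {n_2} - 2$ degrees of freedom, looked up by the caller.

```python
from math import sqrt


def pooled_t_interval(mean1, mean2, s1_sq, s2_sq, n1, n2, t_crit):
    """Confidence interval for mu1 - mu2 assuming equal population variances.

    t_crit is the tabulated t_{alpha/2} value at n1 + n2 - 2 degrees of freedom.
    """
    dof = n1 + n2 - 2
    sc_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / dof  # pooled variance S_c^2
    std_error = sqrt(sc_sq) * sqrt(1.0 / n1 + 1.0 / n2)
    diff = mean1 - mean2
    return diff - t_crit * std_error, diff + t_crit * std_error
```

When the two sample sizes are equal, the pooled standard error coincides with the unpooled one, so the two small-sample formulas differ only when ${n_1} \ne {n_2}$.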

Example:

A random sample of 100 students from an MBA class made an average score of 60 with a standard deviation of 15 in statistics. A random sample of 64 students from a BS class made an average score of 66 with a standard deviation of 16 in the same course. Construct a 95% confidence interval for the difference between the mean scores of the two classes.

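Solution: Since both sample sizes are large, we use the $Z$ statistic to construct the interval. Taking the larger sample mean (the BS class) as ${\overline X _1}$, we have the following information.
${\overline X _1} = 66,\;{S_1} = 16,\;{n_1} = 64;\quad {\overline X _2} = 60,\;{S_2} = 15,\;{n_2} = 100;\quad {Z_{\alpha /2}} = {Z_{0.025}} = 1.96$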

Using the formula $\left( {{{\overline X }_1} - {{\overline X }_2}} \right) \pm {Z_{\alpha /2}}\sqrt {\frac{{\sigma _1^2}}{{{n_1}}} + \frac{{\sigma _2^2}}{{{n_2}}}}$, with the sample variances in place of the unknown population variances, the 95% lower confidence limit for the difference between two population means $\left( {{\mu _1} - {\mu _2}} \right)$ would be
$\left( {66 - 60} \right) - 1.96\sqrt {\frac{{{{16}^2}}}{{64}} + \frac{{{{15}^2}}}{{100}}} = 6 - 1.96\left( {2.5} \right) = 6 - 4.90 = 1.10$

Also, the upper limit would be
$\left( {66 - 60} \right) + 1.96\sqrt {\frac{{{{16}^2}}}{{64}} + \frac{{{{15}^2}}}{{100}}} = 6 + 1.96\left( {2.5} \right) = 6 + 4.90 = 10.90$

Hence, the 95% confidence interval for the difference between the two population means is
$1.10 < {\mu _1} - {\mu _2} < 10.90$
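The arithmetic of this example can be reproduced in a few lines. This is a small check rather than part of the original solution: the variable names are assumptions, group 1 is taken to be the BS class (the larger mean), and $Z_{0.025}$ is obtained from `statistics.NormalDist` instead of a table.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from the example; group 1 is the BS class (larger mean).
mean1, sd1, n1 = 66.0, 16.0, 64
mean2, sd2, n2 = 60.0, 15.0, 100

z_crit = NormalDist().inv_cdf(0.975)         # about 1.96 for 95% confidence
std_error = sqrt(sd1**2 / n1 + sd2**2 / n2)  # sqrt(4 + 2.25) = 2.5
margin = z_crit * std_error                  # about 4.90

lower = (mean1 - mean2) - margin
upper = (mean1 - mean2) + margin
print(f"{lower:.2f} < mu1 - mu2 < {upper:.2f}")
# prints: 1.10 < mu1 - mu2 < 10.90
```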