# Sampling Errors

Suppose we are interested in a population parameter whose true value, $\theta$, is unknown. Knowledge about $\theta$ can be obtained either from sample data or from population data. In both cases, we may fail to reach the true value of the parameter. The difference between the calculated value (from sample data or from population data) and the true value of the parameter is called error. Error therefore cannot be determined exactly when the population is large and its units have to be measured.

Suppose we want to find the total production of wheat in Pakistan in a certain year. Sufficient funds and time are at our disposal, and we want the ‘true’ figure for wheat production. The most we can do is contact all the farmers, and suppose every farmer cooperates fully and supplies information as honestly as possible. Even so, the information supplied by the farmers will contain errors in most cases, so we may not be able to identify the ‘true’ figure. In spite of all our efforts, we remain uncertain. The calculated or observed figure may be good enough for all practical purposes, but we can never claim that the true value of the parameter has been obtained. If the study of the units is based on counting, we may be able to obtain the true value of the population parameter.

There are two kinds of errors: (i) sampling errors, or random errors, and (ii) non-sampling errors.

Sampling Errors:

These are the errors that arise from the nature of sampling. The sample selected from the population is only one of all possible samples. Any value calculated from the sample data is called a sample statistic, and it may or may not be close to the population parameter. If the statistic is $\widehat \theta$ and the true value of the population parameter is $\theta$, then the difference $\widehat \theta - \theta$ is called the sampling error. A particular example is the difference $\overline X - \mu$ between the sample mean $\overline X$ and the population mean $\mu$. It is important to note that a statistic is a random variable: its value changes from one sample to another. The sampling error is therefore also a random quantity. Because the population parameter is usually unknown, the sampling error of a particular sample cannot be computed; instead, its typical magnitude (for example, the standard error of the statistic) is estimated from the sample data. Sampling error arises because only a part of the population is included in the sample, and a part of the population generally cannot give an exact picture of the properties of the whole population. But one should not get the impression that a sample always gives results full of errors. A sample can be designed, and its data collected, in a manner that reduces the sampling errors. The sampling errors can be reduced by the following methods: (1) by increasing the size of the sample, and (2) by stratification.
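The idea that the sampling error $\overline X - \mu$ is itself a random quantity, and that its typical size shrinks as the sample grows, can be seen in a small simulation. This is a minimal sketch under assumptions not taken from the text: a hypothetical population of the integers 1 to 100, simple random sampling without replacement, and an illustrative helper `mean_abs_sampling_error`.

```python
import random

# Hypothetical population of N = 100 values: 1, 2, ..., 100
population = list(range(1, 101))
N = len(population)
mu = sum(population) / N            # true population mean (50.5)

def mean_abs_sampling_error(n, reps=2000, seed=0):
    """Average |x_bar - mu| over `reps` simple random samples of size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        sample = rng.sample(population, n)   # SRS without replacement
        x_bar = sum(sample) / n
        total += abs(x_bar - mu)             # sampling error of this sample
    return total / reps

# The sampling error differs from one sample to another ...
rng = random.Random(1)
errors = [sum(rng.sample(population, 10)) / 10 - mu for _ in range(5)]
print("errors of five samples (n=10):", [round(e, 2) for e in errors])

# ... and its typical size shrinks as n grows, reaching zero when n = N.
avg_error = {n: mean_abs_sampling_error(n) for n in (5, 25, N)}
for n, e in avg_error.items():
    print(f"n = {n:3d}: mean |x_bar - mu| = {e:.3f}")
```

When $n = N$ every sample is the whole population, so each computed error is exactly zero; for smaller $n$ the printed averages depend on the random seed but should fall steadily as $n$ increases.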

Reducing the Sampling Errors:

1. By increasing the size of the sample. The sampling error tends to decrease as the sample size increases. If the sample size $n$ is equal to the population size $N$, every unit is observed and the sampling error is zero.
2. By stratification. When the population contains homogeneous units, a simple random sample is likely to be representative of the population. But if the population contains dissimilar units, a simple random sample may fail to represent all kinds of units in the population. To improve the result, the sample design is modified: the population is divided into groups, called strata, each containing similar units. From each group (stratum), a sub-sample is selected at random. Thus all the groups are represented in the sample and, when the units within each stratum are similar, the sampling error is reduced. This design is called stratified random sampling. The size of the sub-sample from each stratum is frequently proportional to the size of the stratum. Suppose a population consists of 1000 students, of whom 600 are intelligent and 400 are non-intelligent; we assume that this much information about the population is available. A stratified sample of size $n = 100$ is to be selected. The sizes of the two strata are denoted by $N_1$ and $N_2$, and the sizes of the samples drawn from them by $n_1$ and $n_2$. The allocation is as follows:

| Stratum No. | Size of stratum | Size of sample from the stratum |
|---|---|---|
| 1 | $N_1 = 600$ | $n_1 = \frac{n \times N_1}{N} = \frac{100 \times 600}{1000} = 60$ |
| 2 | $N_2 = 400$ | $n_2 = \frac{n \times N_2}{N} = \frac{100 \times 400}{1000} = 40$ |
| Total | $N_1 + N_2 = N = 1000$ | $n_1 + n_2 = n = 100$ |

The size of the sample from each stratum has been calculated in proportion to the size of the stratum; this is called proportional allocation. In the above sample design, the sampling fraction in the population is $\frac{n}{N} = \frac{100}{1000} = \frac{1}{10}$, and the sampling fraction in each stratum is also $\frac{1}{10}$ (for example, $\frac{n_1}{N_1} = \frac{60}{600} = \frac{1}{10}$). This design is therefore also called a fixed sampling fraction design. This modified sample design is frequently used in sample surveys. However, it requires some prior information about the units of the population, on the basis of which the population is divided into strata. If such prior information is not available, stratification cannot be applied.
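Proportional allocation, and the way stratification can reduce sampling error, can be sketched in Python. This is a minimal simulation under stated assumptions: the helper names `proportional_allocation` and `stratified_mean` are hypothetical, and the two strata hold made-up scores (600 values from 70 to 90 and 400 values from 40 to 60) chosen only so that units within a stratum are similar while the strata differ. The script compares the average absolute sampling error of the stratified estimate with that of a simple random sample of the same size $n = 100$.

```python
import random

def proportional_allocation(stratum_sizes, n):
    """Return n_h = n * N_h / N for each stratum, rounded to an integer.

    Assumption: simple rounding; for other sizes the parts may not sum to n.
    """
    N = sum(stratum_sizes)
    return [round(n * N_h / N) for N_h in stratum_sizes]

def stratified_mean(strata, allocation, rng):
    """Estimate the population mean as the sum of W_h * (stratum sample mean)."""
    N = sum(len(s) for s in strata)
    estimate = 0.0
    for stratum, n_h in zip(strata, allocation):
        sub_sample = rng.sample(stratum, n_h)                 # random sub-sample within the stratum
        estimate += len(stratum) / N * sum(sub_sample) / n_h  # weight W_h = N_h / N
    return estimate

# Hypothetical scores: stratum 1 (N1 = 600) and stratum 2 (N2 = 400)
stratum_1 = [70 + i % 21 for i in range(600)]   # values 70..90
stratum_2 = [40 + i % 21 for i in range(400)]   # values 40..60
population = stratum_1 + stratum_2
N, n = len(population), 100
mu = sum(population) / N                        # true population mean

allocation = proportional_allocation([len(stratum_1), len(stratum_2)], n)
print("allocation:", allocation)                # allocation: [60, 40]

rng = random.Random(0)
reps = 2000
srs_error = sum(abs(sum(rng.sample(population, n)) / n - mu)
                for _ in range(reps)) / reps
strat_error = sum(abs(stratified_mean([stratum_1, stratum_2], allocation, rng) - mu)
                  for _ in range(reps)) / reps
print(f"mean |error|, simple random sample: {srs_error:.3f}")
print(f"mean |error|, stratified sample:    {strat_error:.3f}")
```

With these made-up strata the stratified estimate should show a noticeably smaller average error, because under proportional allocation the large difference between the two stratum means no longer contributes to the sampling error. With strata whose units are not similar to one another, the gain would be smaller.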