Introduction to Regression and Correlation

The statistical methods discussed so far are used to analyze data involving only one variable. Often, data concerning two or more variables must be analyzed to look for any statistical relationship or association between them. A few instances where knowledge of an association or relationship between two variables would be vital to making a decision are:

  • Family income and expenditure on luxury items.
  • Sales revenue and expenses incurred on advertising.
  • Yield of a crop and quantity of fertilizer applied.

The following aspects are considered when examining the statistical relationship between two or more variables:

  • Is there an association between two or more variables? If yes, what is the form and degree of that relationship?
  • Is the relationship strong or significant enough to arrive at a desirable conclusion?
  • Can the relationship be used for predictive purposes, that is, to predict the most likely value of a dependent variable corresponding to a given value of the independent variable or variables?

Two different techniques are used to study two or more variables: regression and correlation. Both study the behavior of the variables, but they differ in their end results. Regression studies a relationship in which dependence is necessarily involved: one variable depends on one or more other variables. Regression can therefore be used to predict the values of the variable that depends on the others. The term regression was introduced by the English biometrician Sir Francis Galton (1822-1911). Correlation attempts to measure the strength of the mutual relationship between two variables. In correlation we assume that the variables are random and that no dependence of any nature is involved.
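
The difference in end results can be illustrated with a minimal Python sketch. It assumes the simplest case of one independent variable, a straight-line relationship fitted by least squares, and Pearson's coefficient as the measure of correlation; none of these choices is fixed by the text above. The function names (pearson_r, fit_line) and the fertilizer and yield figures are hypothetical, invented only for illustration.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def pearson_r(x, y):
    """Correlation: strength of the mutual linear relationship.

    Symmetric in x and y, so neither variable is treated as dependent.
    """
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fit_line(x, y):
    """Regression of y on x: least-squares slope and intercept.

    Not symmetric: y is the dependent variable, x the independent one.
    """
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


# Hypothetical data: fertilizer applied (independent) and crop yield (dependent).
fertilizer = [1.0, 2.0, 3.0, 4.0, 5.0]
crop_yield = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(fertilizer, crop_yield)
slope, intercept = fit_line(fertilizer, crop_yield)

# Regression supports prediction for a new value of the independent variable.
predicted_yield = intercept + slope * 6.0

print(f"correlation r = {r:.3f}")
print(f"fitted line: yield = {intercept:.3f} + {slope:.3f} * fertilizer")
print(f"predicted yield at fertilizer = 6.0: {predicted_yield:.3f}")
```

Swapping the two arguments leaves pearson_r unchanged but generally changes the line returned by fit_line, which mirrors the distinction drawn above: correlation treats the variables alike, while regression singles out one of them as dependent.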