Introduction to Regression and Correlation

The statistical methods discussed so far are used to analyze data involving only one variable. Often an analysis of data concerning two or more variables is needed to look for any statistical relationship or association between them.

A few instances where knowledge about an association or relationship between two variables would be vital to making a decision are:

  • Family income and expenditure on luxury items
  • Sales revenue and expenses incurred on advertising
  • Yield of a crop and quantity of fertilizer applied

The following aspects are considered when examining the statistical relationship between two or more variables:

  • Is there an association between two or more variables? If yes, what is the form and degree of that relationship?
  • Is the relationship strong or significant enough to arrive at a desirable conclusion?
  • Can the relationship be used for predictive purposes, that is, to predict the most likely value of a dependent variable corresponding to the given value of the independent variable or variables?

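Two techniques are commonly used to study the relationship between two or more variables: regression and correlation. Both examine how the variables behave together, but they answer different questions: regression yields an equation that can be used for prediction, whereas correlation yields a measure of how strongly the variables are associated.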

Regression studies a relationship in which one variable, the dependent variable, is treated as depending on one or more other variables, the independent variables. It can be used to predict the value of the dependent variable from given values of the independent variables. The term regression was introduced by the English biometrician Sir Francis Galton (1822 – 1911).
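As a minimal sketch of how regression supports prediction, the following Python fragment fits a straight line by ordinary least squares and uses it to predict a new value. The data (fertilizer applied and crop yield), the units, and the function names fit_line and predict are all hypothetical, chosen only for illustration; a straight-line form is itself an assumption about the relationship.

```python
# Hypothetical data: fertilizer applied (independent variable) and
# crop yield (dependent variable). The values are invented for illustration.
fertilizer = [1.0, 2.0, 3.0, 4.0, 5.0]
crop_yield = [2.1, 3.9, 6.2, 7.8, 10.0]


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and sum of squared deviations of x.
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x_new):
    """Predict the dependent variable for a given value of the independent variable."""
    return intercept + slope * x_new


intercept, slope = fit_line(fertilizer, crop_yield)
print(f"fitted line: yield = {intercept:.2f} + {slope:.2f} * fertilizer")
print(f"predicted yield at fertilizer = 6.0: {predict(intercept, slope, 6.0):.2f}")
```

Note that the roles of the two variables are not interchangeable here: regressing fertilizer on yield would, in general, give a different line.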

Correlation measures the strength of the mutual relationship between two variables. In correlation, both variables are treated as random, and neither is designated as dependent on the other.
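One common measure of this strength is the Pearson correlation coefficient, which lies between -1 and +1. The sketch below computes it directly from its definition; the data (advertising expenses and sales revenue) and the function name pearson_r are hypothetical and serve only as an illustration. Other correlation measures exist and may be more appropriate for some data.

```python
import math

# Hypothetical data: advertising expenses and sales revenue.
# The values are invented for illustration.
advertising = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.1, 3.9, 6.2, 7.8, 10.0]


def pearson_r(x, y):
    """Return the Pearson correlation coefficient between x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


r = pearson_r(advertising, sales)
print(f"correlation between advertising and sales: {r:.4f}")
```

Because the formula treats the two variables symmetrically, pearson_r(advertising, sales) and pearson_r(sales, advertising) agree (up to floating-point rounding), which reflects the fact that correlation does not single out a dependent variable.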