Linear Model

Regression involves the study of equations. We begin with some simple equations, or linear models. The simplest mathematical model is the equation of a straight line.

Example:

Suppose a shopkeeper is selling pencils, and he sells one pencil for 2 cents. The table below gives the number of pencils sold and the sale price of the pencils.

| Number of pencils sold | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| Sale price (cents) | 0 | 2 | 4 | 6 | 8 | 10 |

Let us examine the two variables given in the table. For convenience, let $$X$$ denote the number of pencils sold and $$S$$ ($$S$$ for sale) denote the amount, in cents, realized by selling $$X$$ pencils. Thus,

| $$X$$ | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| $$S$$ | 0 | 2 | 4 | 6 | 8 | 10 |

The same information can be presented in other forms as well. For example, we can write an equation describing the relation between $$X$$ and $$S$$: since each pencil sells for 2 cents, the algebraic equation connecting $$X$$ and $$S$$ is $$S = 2X$$.
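A minimal Python sketch, assuming we only want to confirm that the rule $$S = 2X$$ reproduces every row of the pencil table. The names `PRICE_PER_PENCIL`, `sale_price` and `table` are illustrative, not part of the original text.

```python
PRICE_PER_PENCIL = 2  # cents per pencil, from the example


def sale_price(x: int) -> int:
    """Sale amount S in cents for x pencils, using S = 2X."""
    return PRICE_PER_PENCIL * x


# Rows of the table: number of pencils sold X -> sale price S (cents).
table = {0: 0, 1: 2, 2: 4, 3: 6, 4: 8, 5: 10}

for x, s in table.items():
    # Each tabulated value must match the equation exactly.
    assert sale_price(x) == s, f"row X={x} does not fit S = 2X"
    print(f"X = {x}, S = {sale_price(x)} cents")
```

Every row of the table satisfies the equation, so $$S = 2X$$ describes the data exactly.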

This is called a mathematical equation or mathematical model, in which $$S$$ depends on $$X$$. Here $$X$$ is called the independent variable and $$S$$ is called the dependent variable. For example, if $$X = 2$$, then $$S = 2 \times 2 = 4$$ cents exactly: neither less than $$4$$ nor more than $$4$$.

The above model is called a deterministic mathematical model because we can determine the value of $$S$$ without any error by substituting the value of $$X$$ into the equation. The sale amount $$S$$ is said to be a function of $$X$$. This statement in symbolic form is written as: $$S = f\left( X \right)$$.

It is read as “$$S$$ is a function of $$X$$”. It means that $$S$$ depends on $$X$$ alone and on nothing else. The data in the table can be presented as a graph, as shown in the figure below.

Figure: graph of $$S = 2X$$ (sale price $$S$$ against number of pencils sold $$X$$)

The main features of the graph in the figure are:

  1. The graph lies in the first quadrant because all the values of $$X$$ and $$S$$ are non-negative.
  2. It is an exact straight line. Not every relationship graphs as a straight line, however; some produce curves.
  3. All the points (pairs of $$X$$ and $$S$$) lie on the straight line.
  4. The line passes through the origin.
  5. Take any point $$P$$ on the line and drop a perpendicular $$PQ$$ from $$P$$ to the X-axis. Let us find the ratio $$\frac{PQ}{OQ}$$, where $$O$$ is the origin. Taking $$P = (3, 6)$$, we have $$PQ = 6$$ units and $$OQ = 3$$ units, so $$\frac{PQ}{OQ} = \frac{6}{3} = 2$$. This ratio is called the slope of the line and is generally denoted by “$$b$$”. The slope is the same at every point on the line: it equals the change in $$S$$ for a unit change in $$X$$ (here, 2 cents per pencil). The relation $$S = 2X$$ is also called a linear equation between $$X$$ and $$S$$.
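The claim in item 5 that the slope is the same at every point can be checked with a short Python sketch, assuming the six $$(X, S)$$ points from the table. It computes rise over run for every pair of points, using exact fractions to avoid floating-point noise; the helper name `slope` is illustrative.

```python
from fractions import Fraction
from itertools import combinations

# (X, S) pairs from the pencil table.
points = [(0, 0), (1, 2), (2, 4), (3, 6), (4, 8), (5, 10)]


def slope(p, q):
    """Rise over run between two points with different X values, as an exact fraction."""
    (x1, y1), (x2, y2) = p, q
    return Fraction(y2 - y1, x2 - x1)


# Slope for every pair of distinct points; on a straight line they all agree.
slopes = {slope(p, q) for p, q in combinations(points, 2)}
print(slopes)  # {Fraction(2, 1)}

# The ratio PQ / OQ from item 5, with O = (0, 0) and P = (3, 6).
print(slope((0, 0), (3, 6)))  # 2
```

Because the set of slopes has a single element, the line has the same slope, $$b = 2$$, everywhere.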

Example:

Suppose a carpenter wants to make some wooden toys for small children. He has purchased wood and other materials for $$20$$ dollars, and the cost of making each toy is $$5$$ dollars. The table below gives the number of toys made and the total cost of the toys.

| Number of toys | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| Cost of toys (dollars) | 20 | 25 | 30 | 35 | 40 | 45 |

Let $$X$$ denote the number of toys and $$Y$$ denote the cost of the toys. What is the algebraic relationship between $$X$$ and $$Y$$? When $$X = 0$$, $$Y = 20$$. This is called a fixed or starting cost, and it may be denoted by “$$a$$”. Each additional toy adds $$5$$ dollars to the cost. Thus $$Y$$ and $$X$$ are connected through the equation $$Y = 20 + 5X$$.
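As with the pencils, a minimal Python sketch can confirm that $$Y = 20 + 5X$$ reproduces the toy table, assuming costs in whole dollars. The constant and function names are illustrative.

```python
FIXED_COST = 20    # a: wood and materials, in dollars
COST_PER_TOY = 5   # b: cost of each additional toy, in dollars


def toy_cost(x: int) -> int:
    """Total cost Y in dollars of making x toys, using Y = 20 + 5X."""
    return FIXED_COST + COST_PER_TOY * x


# Rows of the table: number of toys X -> cost Y (dollars).
table = {0: 20, 1: 25, 2: 30, 3: 35, 4: 40, 5: 45}

for x, y in table.items():
    # The fixed cost is paid even when no toys are made (X = 0).
    assert toy_cost(x) == y, f"row X={x} does not fit Y = 20 + 5X"
    print(f"X = {x}, Y = {toy_cost(x)} dollars")
```

All six rows agree with $$Y = 20 + 5X$$.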

This is the equation of a straight line. It is also a deterministic mathematical model. The figure below is the graph of the data in the table; we note some important features of the graph.

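Figure: graph of $$Y = 20 + 5X$$ (cost $$Y$$ against number of toys $$X$$), with the line $$AB$$ and the triangle $$PQA$$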

  1. The line $$AB$$ does not pass through the origin; it meets the Y-axis at the point $$A$$. The distance between $$A$$ and the origin $$O$$ is called the intercept and is usually denoted by “$$a$$”; here $$a = 20$$.
  2. Take any point $$P$$ on the line and complete the triangle $$PQA$$ as shown in the figure. Let us find the ratio of the perpendicular $$PQ$$ to the base $$AQ$$ of this triangle: $$\frac{PQ}{AQ} = \frac{15}{3} = 5$$.
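The triangle construction can be written out in a few lines of Python. This is a sketch under an assumption: we take $$A = (0, 20)$$ and $$P = (3, 35)$$, the coordinates implied by $$PQ = 15$$ and $$AQ = 3$$ on the line $$Y = 20 + 5X$$; the variable names simply mirror the letters in the figure.

```python
# Assumed coordinates: A is where the line meets the Y-axis, P is the point at X = 3.
A = (0, 20)
P = (3, 35)

# Q sits directly below P and level with A, completing the triangle PQA.
Q = (P[0], A[1])

PQ = P[1] - Q[1]  # perpendicular (vertical side): 35 - 20 = 15
AQ = Q[0] - A[0]  # base (horizontal side): 3 - 0 = 3

a = A[1]          # intercept: distance from the origin O = (0, 0) up to A
b = PQ / AQ       # slope: 15 / 3
print(f"a = {a}, b = {b}")  # a = 20, b = 5.0
```

The result matches the ratio $$\frac{PQ}{AQ} = 5$$ found above.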

This ratio is the slope, denoted by “$$b$$” in the equation of a straight line. Thus the straight line $$Y = 20 + 5X$$ has intercept $$a = 20$$ and slope $$b = 5$$. In general, when the values of the intercept and slope are not known, we write the equation of a straight line as $$Y = a + bX$$. This is called a linear equation between $$X$$ and $$Y$$, and the relationship between $$X$$ and $$Y$$ is called linear. The equation $$Y = a + bX$$ may also be called an exact linear model, or simply a linear model, between $$X$$ and $$Y$$. Because the value of $$Y$$ is determined completely once $$X$$ is given, $$Y = a + bX$$ is called the deterministic linear model between $$X$$ and $$Y$$. In statistics, however, the term linear model does not mean an exact mathematical model of the kind described above.
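As a sketch of the deterministic model $$Y = a + bX$$ described above, the Python class below (a hypothetical helper, not from the text) stores $$a$$ and $$b$$ and recovers them from any two exact points with different $$X$$ values, using $$b = \frac{Y_2 - Y_1}{X_2 - X_1}$$ and $$a = Y_1 - bX_1$$. Because the model is deterministic, `predict` always returns the same $$Y$$ for a given $$X$$.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class LinearModel:
    """Deterministic linear model Y = a + bX (illustrative helper)."""

    a: Fraction  # intercept: value of Y when X = 0
    b: Fraction  # slope: change in Y for a unit change in X

    def predict(self, x):
        """Return the exact value of Y for a given X."""
        return self.a + self.b * x

    @classmethod
    def through(cls, p, q):
        """Build the line through points p and q; their X values must differ."""
        (x1, y1), (x2, y2) = p, q
        b = Fraction(y2 - y1, x2 - x1)  # raises ZeroDivisionError if x1 == x2
        a = y1 - b * x1
        return cls(a, b)


# Pencil example: any two rows of the table give S = 0 + 2X.
pencils = LinearModel.through((1, 2), (4, 8))
# Toy example: A = (0, 20) and P = (3, 35) give Y = 20 + 5X.
toys = LinearModel.through((0, 20), (3, 35))

print(pencils.a, pencils.b)  # 0 2
print(toys.a, toys.b)        # 20 5
print(toys.predict(5))       # 45
```

Exact fractions are used so that $$a$$ and $$b$$ come out as the same whole numbers the text reports, without floating-point rounding.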