Linear Model
Regression involves the study of equations. First we talk about some simple equations or linear models. The simplest mathematical model or equation is the equation of a straight line.
Example:
Suppose a shopkeeper is selling pencils, and he sells one pencil for 2 cents. The table below gives the number of pencils sold and the sale price of the pencils.
| Number of pencils sold | $$0$$ | $$1$$ | $$2$$ | $$3$$ | $$4$$ | $$5$$ |
|---|---|---|---|---|---|---|
| Sale price (cents) | $$0$$ | $$2$$ | $$4$$ | $$6$$ | $$8$$ | $$10$$ |
Let us examine the two variables in the table. For convenience, we give them names: let $$X$$ denote the number of pencils sold and $$S$$ ($$S$$ for sale) denote the amount realized by selling $$X$$ pencils. Thus,
| $$X$$ | $$0$$ | $$1$$ | $$2$$ | $$3$$ | $$4$$ | $$5$$ |
|---|---|---|---|---|---|---|
| $$S$$ | $$0$$ | $$2$$ | $$4$$ | $$6$$ | $$8$$ | $$10$$ |
The same information can be presented in other forms as well. For example, we can write an equation describing the relation between $$X$$ and $$S$$. Since each pencil sells for 2 cents, the algebraic equation connecting $$X$$ and $$S$$ is $$S = 2X$$.
This is called a mathematical equation or mathematical model in which $$S$$ depends upon $$X$$. Here $$X$$ is called the independent variable and $$S$$ is called the dependent variable. For example, when $$X = 2$$, the sale amount is exactly $$4$$ cents, neither less nor more.
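As a minimal sketch, the relation $$S = 2X$$ can be written as a small Python function and checked against the table. The function name `sale_amount` is chosen here for illustration only.

```python
def sale_amount(pencils_sold):
    """Sale amount in cents when each pencil sells for 2 cents (S = 2X)."""
    return 2 * pencils_sold

# Values of X taken from the table.
pencils = [0, 1, 2, 3, 4, 5]

# Compute S for every X; the result matches the second row of the table.
sales = [sale_amount(x) for x in pencils]
print(sales)  # [0, 2, 4, 6, 8, 10]
```

Every value of $$X$$ produces exactly one value of $$S$$, with no error term.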
The above model is called a deterministic mathematical model because we can determine the value of $$S$$ without any error by putting the value of $$X$$ in the equation. The sale amount $$S$$ is said to be a function of $$X$$. This statement in symbolic form is written as: $$S = f\left( X \right)$$.
It is read as “$$S$$ is a function of $$X$$”. It means that $$S$$ depends upon $$X$$ alone and on no other element. The data in the table can be presented in the form of a graph as shown in the figure below.

The main features of the graph in the figure are:
- The graph lies in the first quadrant because none of the values of $$X$$ and $$S$$ is negative.
- It is an exact straight line. However, not all graphs are in the form of a straight line; there could also be a curve.
- All the points (pairs of $$X$$ and $$S$$) lie on the straight line.
- The line passes through the origin.
- Take any point $$P$$ on the line and draw the perpendicular $$PQ$$ from $$P$$ to the X-axis, with $$O$$ denoting the origin. Let us find the ratio $$\frac{{PQ}}{{OQ}}$$. Here $$PQ = 6$$ units and $$OQ = 3$$ units. Thus $$\frac{{PQ}}{{OQ}} = \frac{6}{3} = 2$$. This ratio is called the slope of the line and in general it is denoted by “$$b$$”. The slope of the line is the same at all points on the line. The slope “$$b$$” is equal to the change in the dependent variable (here $$S$$) for a unit change in $$X$$; in this example it is 2 cents per pencil. The relation $$S = 2X$$ is also called the linear equation between $$X$$ and $$S$$.
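The claim that the slope is the same everywhere on the line can be checked numerically. The sketch below, with the illustrative helper `slope`, computes the ratio of the change in $$S$$ to the change in $$X$$ for every pair of points in the table.

```python
from itertools import combinations

# (X, S) pairs from the pencil table.
points = [(0, 0), (1, 2), (2, 4), (3, 6), (4, 8), (5, 10)]

def slope(p, q):
    """Change in S divided by change in X between two points."""
    (x1, s1), (x2, s2) = p, q
    return (s2 - s1) / (x2 - x1)

# Collect the distinct slopes over all pairs of points.
slopes = {slope(p, q) for p, q in combinations(points, 2)}
print(slopes)  # {2.0}
```

A single value in the set means every pair of points gives the same slope, which is what makes the graph a straight line.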
Example:
Suppose a carpenter wants to make some wooden toys for small children. He has purchased some wood and some other materials for $$\$20$$. The cost of making each toy is $$\$5$$. The table below gives the number of toys made and the total cost of the toys.
| Number of toys | $$0$$ | $$1$$ | $$2$$ | $$3$$ | $$4$$ | $$5$$ |
|---|---|---|---|---|---|---|
| Cost of toys (dollars) | $$20$$ | $$25$$ | $$30$$ | $$35$$ | $$40$$ | $$45$$ |
Let $$X$$ denote the number of toys and $$Y$$ denote the cost of the toys. What is the algebraic relationship between $$X$$ and $$Y$$? When $$X = 0$$, $$Y = 20$$. This is called a fixed or starting cost and it may be denoted by “$$a$$”. For each additional toy, the cost increases by $$5$$ dollars. Thus $$Y$$ and $$X$$ are connected through the following equation: $$Y = 20 + 5X$$.
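The equation $$Y = 20 + 5X$$ can be sketched in Python in the same way. The names `FIXED_COST`, `COST_PER_TOY`, and `total_cost` are illustrative choices, not part of the original example.

```python
FIXED_COST = 20    # dollars spent on wood and materials (the value of Y at X = 0)
COST_PER_TOY = 5   # dollars added for each toy made

def total_cost(toys):
    """Total cost in dollars for a given number of toys (Y = 20 + 5X)."""
    return FIXED_COST + COST_PER_TOY * toys

# Compute Y for X = 0, 1, ..., 5; the result matches the table.
costs = [total_cost(x) for x in range(6)]
print(costs)  # [20, 25, 30, 35, 40, 45]
```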
This is called the equation of a straight line. It is also a mathematical model of deterministic nature. Let us make a graph of the data in the given table. The figure below is the graph of the data in the table. We also note some important features of the graph.

- The line $$AB$$ does not pass through the origin; it passes through the point $$A$$ on the Y-axis. The distance between $$A$$ and the origin $$O$$ is called the intercept and is usually denoted by “$$a$$”.
- Take any point $$P$$ on the line and complete a triangle $$PQA$$ as shown in the figure. Let us find the ratio between the perpendicular $$PQ$$ and the base $$AQ$$ of this triangle. The ratio is $$\frac{{PQ}}{{AQ}} = \frac{{15}}{3} = 5$$.
This ratio is denoted by “$$b$$” in the equation of a straight line. Thus the equation of a straight line $$Y = 20 + 5X$$ has the intercept $$a = 20$$ and slope $$b = 5$$. In general, when the values of the intercept and slope are not known, we write the equation of a straight line as $$Y = a + bX$$. It is also called a linear equation between $$X$$ and $$Y$$, and the relationship between $$X$$ and $$Y$$ is called linear. The equation $$Y = a + bX$$ may also be called an exact linear model between $$X$$ and $$Y$$ or simply a linear model between $$X$$ and $$Y$$. The value of $$Y$$ can be determined completely when $$X$$ is given. The relationship $$Y = a + bX$$ is therefore called the deterministic linear model between $$X$$ and $$Y$$. In statistics, when we use the term linear model, we do not mean a mathematical model as described above.
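To make the roles of $$a$$ and $$b$$ in the deterministic model $$Y = a + bX$$ concrete, the sketch below recovers both values from two points on the line and then confirms that every point in the toy table satisfies the equation. The helper `line_through` is a hypothetical name used only for this illustration.

```python
def line_through(p, q):
    """Return (a, b) for the line Y = a + bX through two points with different X."""
    (x1, y1), (x2, y2) = p, q
    b = (y2 - y1) / (x2 - x1)   # slope: change in Y per unit change in X
    a = y1 - b * x1             # intercept: value of Y when X = 0
    return a, b

# Point A (X = 0, Y = 20) and a point P (X = 3, Y = 35) from the toy table.
a, b = line_through((0, 20), (3, 35))
print(a, b)  # 20.0 5.0

# Every row of the table lies exactly on the line.
table = [(0, 20), (1, 25), (2, 30), (3, 35), (4, 40), (5, 45)]
on_line = all(a + b * x == y for x, y in table)
print(on_line)  # True
```

Because the model is deterministic, two points are enough to fix $$a$$ and $$b$$, and all remaining points fall on the line with no error.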