# Scatter Diagram

A scatter diagram is a graphical picture of sample data. Suppose a random sample of $n$ pairs of observations has the values $(X_1, Y_1), (X_2, Y_2), (X_3, Y_3), \ldots, (X_n, Y_n)$. These points are plotted on a rectangular coordinate system, with the independent variable on the $X$-axis and the dependent variable on the $Y$-axis. Whatever the independent variable is called, it goes on the $X$-axis. Suppose the plotted points are as shown in figure (a). Such a diagram is called a scatter diagram. In this figure, when $X$ has a small value, $Y$ is also small, and when $X$ takes a large value, $Y$ is also large. This is called a direct or positive relationship between $X$ and $Y$.

The plotted points cluster around a straight line, so a straight line drawn through the points should be a good approximation of the original data. Suppose we draw a line $AB$ to represent the scattered points. The line $AB$ rises from left to right and has a positive slope. It can be used to establish an approximate relation between the random variable $Y$ and the independent variable $X$. This method is non-mathematical in the sense that different people may draw different lines. A line drawn this way is called the regression line obtained by inspection or judgment.
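The remark that different people may draw different lines can be made concrete with a small sketch. The data and the two candidate lines below are made up for illustration. The comparison uses the sum of squared vertical deviations, which is one common yardstick and an assumption here, not part of the inspection method itself.

```python
# Hypothetical sample with a direct (positive) relationship, as in figure (a).
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]


def sum_squared_deviations(intercept, slope, xs, ys):
    """Sum of squared vertical distances from the points to y = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Two lines that two different people might draw by eye (both are invented guesses).
line_person_1 = (1.0, 1.0)  # (intercept, slope)
line_person_2 = (0.5, 1.2)  # (intercept, slope)

ssd_1 = sum_squared_deviations(*line_person_1, xs, ys)
ssd_2 = sum_squared_deviations(*line_person_2, xs, ys)

print(f"person 1: sum of squared deviations = {ssd_1:.3f}")
print(f"person 2: sum of squared deviations = {ssd_2:.3f}")
```

Both lines rise from left to right, yet they sit at different distances from the points, which is why a line drawn by judgment is only a rough first approximation.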

Making a scatter diagram and drawing a line or curve through it is the first step in assessing the type of relationship between the variables. The knowledge gained from the scatter diagram can be used for further analysis of the data. In most cases the diagrams are not as simple as figure (a). Some are quite complicated, and it is difficult to choose a proper mathematical model to represent the original data. The scatter diagram indicates which model is appropriate for further analysis by the method of least squares. Figure (b) shows points that fall from the top left corner toward the bottom right: as $X$ increases, $Y$ decreases. This is called an inverse or indirect relationship. The points lie in the neighborhood of a certain line, called the regression line.
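As a minimal sketch of the follow-up analysis, the code below fits a straight line by the method of least squares to a made-up sample with an inverse relationship. The data values and the function name `least_squares_line` are assumptions for illustration. The formulas are the standard ones for a simple linear fit: the slope is $S_{xy}/S_{xx}$ and the intercept is $\bar{Y} - \text{slope} \cdot \bar{X}$.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-products and sum of squares of deviations from the means.
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Hypothetical sample with an inverse relationship, as in figure (b).
xs = [1, 2, 3, 4, 5]
ys = [9.8, 8.1, 6.2, 3.9, 2.0]

intercept, slope = least_squares_line(xs, ys)
print(f"Y = {intercept:.2f} + ({slope:.2f}) X")
```

The fitted slope is negative, which matches the falling pattern seen in the diagram.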

As long as the scattered points lie close to a straight line in some direction, we draw a straight line to represent the sample data. When the points do not lie around a straight line, a straight regression line is not drawn. Figure (c) shows plotted points that tend to fall from left to right along a curve. This is called a non-linear or curvilinear relationship. Figure (d) shows points that apparently do not follow any pattern: when $X$ takes a small value, $Y$ may take a small or a large value. There seems to be no association between $X$ and $Y$. Such a diagram suggests that there is no relationship between the two variables.
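The four patterns can also be given a rough numerical companion. The sketch below computes the sample correlation coefficient $r$ for four made-up data sets that imitate figures (a) to (d). The correlation coefficient is not introduced in the text above, so it is an added assumption. It measures only straight-line association, so it does not replace looking at the diagram.

```python
import math


def correlation(xs, ys):
    """Sample (Pearson) correlation coefficient between xs and ys."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    s_yy = sum((y - y_mean) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical samples, one per pattern described in the text.
samples = {
    "direct (a)": ([1, 2, 3, 4, 5, 6], [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]),
    "inverse (b)": ([1, 2, 3, 4, 5], [9.8, 8.1, 6.2, 3.9, 2.0]),
    "curvilinear (c)": ([1, 2, 3, 4, 5, 6], [36.0, 18.0, 12.0, 9.0, 7.2, 6.0]),
    "no pattern (d)": ([1, 2, 3, 4, 5, 6], [5, 1, 4, 2, 6, 3]),
}

results = {name: correlation(xs, ys) for name, (xs, ys) in samples.items()}
for name, r in results.items():
    print(f"{name:16s} r = {r:+.3f}")
```

In this made-up data, $r$ is close to $+1$ for the direct case, close to $-1$ for the inverse case, and near zero for the patternless case. The curved sample still gives a fairly large negative $r$ even though a straight line describes it poorly, so the shape of the relationship has to be read from the scatter diagram.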