Smoother for Scatterplot
In a scatterplot, a smoother (or a smooth) is a trend line that shows how
the two variables (X and Y) are related to one another. A smooth is not a
statistical test of the relationship between X and Y, although in most
cases it is possible to infer the practical significance of the
relationship by examining the smoothed data.
Smoothers are lines plotted onto a scatterplot to show the general trend
of how the variables plotted on the X and Y axes of the graph are related
to one another. There are different kinds of smoothers. "Regression
smoothers" are either straight lines (the standard option) or smooth
curves such as quadratic or logarithmic functions. Current best practice
in data analysis is to use a more flexible set of modeling methods,
sometimes termed "local regressions," which estimate the trend at each
point from only the parts of the data closest to that point. The
relationship can then be shown to lie along a general curve, rather than
being forced onto a straight line or simple polynomial function that does
not fit the data as well.
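
The limitation of a straight-line regression smoother can be sketched with
a small example. The data below are hypothetical and chosen only for
illustration: Y is exactly the square of X, so the least-squares line has
to miss the curvature, and the pattern of its residuals shows how.

```python
def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data with a curved (quadratic) relationship.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [x ** 2 for x in xs]

intercept, slope = fit_line(xs, ys)

# Residuals: observed Y minus the value on the fitted line.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
print(intercept, slope)
print(residuals)
```

The residuals are positive at both ends and negative in the middle. That
systematic pattern is the kind of misfit a local regression is meant to
avoid.
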
For most analyses presented here, smoothers are computed by local linear
regression using a normal kernel, a bandwidth multiplier of 1.0, and the
same bandwidth for all subgroups. If different parameters are used, they
are noted in the specific Knowledge Item. In effect, this kind of smoother
gives disproportionately heavy weight to the approximately 2/3 of the
points that are nearest to the point at which the trend is being
estimated. Smoothing was done in the SPSS 10.0 program for most examples
presented.
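
How a normal kernel concentrates weight on nearby points can be sketched
as follows. This is an illustration under stated assumptions, not the
SPSS implementation: the names `normal_kernel_weights` and
`base_bandwidth` are hypothetical, and the base bandwidth that the
multiplier scales is simply set to 1.0 here because the source does not
give the package default.

```python
import math


def normal_kernel_weights(xs, x0, bandwidth):
    """Normal (Gaussian) kernel weights for target x0, scaled to sum to 1."""
    raw = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    total = sum(raw)
    return [w / total for w in raw]


# Hypothetical setup: 101 evenly spaced X values from 0 to 10.
xs = [i / 10 for i in range(101)]
base_bandwidth = 1.0  # assumed value; not taken from the source
multiplier = 1.0      # the bandwidth multiplier quoted in the text
bandwidth = multiplier * base_bandwidth

x0 = 5.0  # point at which the trend is being estimated
weights = normal_kernel_weights(xs, x0, bandwidth)

# Share of the total weight carried by points within one bandwidth of x0.
near_share = sum(w for x, w in zip(xs, weights) if abs(x - x0) <= bandwidth)
print(round(near_share, 2))
```

On evenly spaced data like this, the points within one bandwidth of the
target carry most of the weight, while points far from it contribute
almost nothing to the local fit.
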
In some cases, where noted, a bandwidth of 3.0 or some other value is
used. Increasing the bandwidth weights the data in a way much more like
fitting a straight line through all of the data, rather than through a
subset near the point.
In most cases, the smoother we use with a bandwidth of 1.0 tends to yield
a "slightly jagged curve" relating two variables. When we increase the
bandwidth to 3.0 (or higher at times), the smoother approximates a line,
or "almost a line." The advantage of the 1.0 bandwidth is that it will
show a curvilinear relationship between two variables if the relationship
in fact exists. The disadvantage is that sometimes the departure from
linearity that it shows is neither statistically nor practically
significant.
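
The effect of the bandwidth can be sketched with a minimal local linear
regression. This is an illustration, not the SPSS routine: the data are
hypothetical (a sine curve), the helper names are invented for the
example, and the bandwidths 1.0 and 3.0 are applied directly in X units,
which is an assumption about how the multiplier scales.

```python
import math


def local_linear(xs, ys, x0, bandwidth):
    """Fit a normal-kernel-weighted straight line around x0; return its value at x0."""
    w = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    return my + (sxy / sxx) * (x0 - mx)


def roughness(values):
    """Largest absolute second difference; zero when evenly spaced values lie on a line."""
    return max(abs(a - 2 * b + c)
               for a, b, c in zip(values, values[1:], values[2:]))


# Hypothetical curved data on an evenly spaced grid from 0 to 10.
xs = [i / 2 for i in range(21)]
ys = [math.sin(x) for x in xs]

smooth_1 = [local_linear(xs, ys, x0, 1.0) for x0 in xs]
smooth_3 = [local_linear(xs, ys, x0, 3.0) for x0 in xs]

print(roughness(smooth_1), roughness(smooth_3))
```

The wider bandwidth gives a smaller roughness value, meaning the smooth
bends less and sits closer to a straight line, while the narrower
bandwidth keeps more of the curvature in the data.
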
If more than one subgroup (for example, males and females) is plotted in a
single plot, then smoothers with identical bandwidths are presented for
each subgroup unless otherwise noted.
Note that the smoother is not a statistical test of a significant
relationship between the X and Y variables. In most cases, however, if the
data lie close to the smoother along its entire length and the smoother
generally rises or generally falls, there will be a significant linear or
nonlinear correlation between the two variables.
In a few cases, the smoothing was done in the SYSTAT 9.0 program, which
employs some alternate smoothing methods. In those cases, the LOWESS
algorithm was used. Those examples are noted in the Knowledge Base. For
all practical purposes, the smooths produced by either SPSS 10.0 or
SYSTAT 9.0 are equivalent, although the algorithms used to produce the
smooths differ slightly from one another.
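
For comparison, the core step of a LOWESS-style smoother can be sketched
as below. This is a simplified illustration, not SYSTAT's code: it uses
tricube weights over the nearest fraction of the points and omits the
robustness iterations of the full algorithm. The function name
`lowess_point` and the default `frac` of 2/3 are assumptions made for the
example, since the source does not state the settings used.

```python
import math


def lowess_point(xs, ys, x0, frac=2 / 3):
    """Tricube-weighted straight-line fit over the nearest points, evaluated at x0."""
    n = len(xs)
    k = max(3, math.ceil(frac * n))  # number of neighbours in the window
    span = sorted(abs(x - x0) for x in xs)[k - 1]  # distance to k-th nearest point
    w = []
    for x in xs:
        u = abs(x - x0) / span
        # Tricube weight: positive inside the window, zero at its edge and beyond.
        w.append((1 - u ** 3) ** 3 if u < 1 else 0.0)
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    return my + (sxy / sxx) * (x0 - mx)


# Hypothetical data: eleven evenly spaced points on a curve.
xs = [float(i) for i in range(11)]
ys = [x ** 2 for x in xs]

smooth = [lowess_point(xs, ys, x0) for x0 in xs]
print(smooth)
```

The difference from the normal-kernel approach is in how the
neighbourhood is defined: here the window always contains a fixed share
of the points, whereas a kernel bandwidth fixes its width on the X axis.
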