Smoother for Scatterplot

In a scatterplot, a smoother (or a smooth) is a trend line that shows how the two plotted variables (X and Y) are related to one another. A smooth is not a statistical test of the relationship between X and Y, although in most cases the practical significance of the relationship can be inferred by examining the smoothed data.


Smoothers are lines plotted onto a scatterplot to show the general trend of how the variables on the X and Y axes are related to one another. There are different kinds of smoothers. "Regression smoothers" are either straight lines (the standard option) or simple curves such as quadratic or logarithmic functions. Current best practice in data analysis is to use more flexible methods, sometimes termed "local regression." These estimate the trend at each point using only the data closest to that point, so the relationship can be shown to follow a general curve rather than a straight line or simple polynomial that fits the data less well.

For most analyses presented here, smoothers use a normal kernel, a bandwidth multiplier of 1.0, and the same bandwidth for all subgroups. The smoothing method is local linear regression. If different parameters are used, they are noted in the specific Knowledge Item. In effect, at each position along the X axis this smoother gives disproportionately heavy weight to the approximately 2/3 of the points that lie nearest to that position. Smoothing for most examples presented was done in the SPSS 10.0 program.
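The normal-kernel local linear smoother described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the SPSS 10.0 implementation: the function name `local_linear_smooth` is hypothetical, and we assume the bandwidth multiplier scales a reference bandwidth equal to the standard deviation of X. SPSS's actual reference rule may differ.

```python
import math
from statistics import pstdev


def local_linear_smooth(xs, ys, grid, multiplier=1.0):
    """Normal-kernel local linear regression evaluated at each value in grid.

    ASSUMPTION (hypothetical rule): bandwidth = multiplier * population SD of xs.
    SPSS 10.0 may define its bandwidth multiplier differently.
    """
    h = multiplier * pstdev(xs)
    fitted = []
    for x0 in grid:
        # Normal (Gaussian) kernel: points near x0 receive the most weight
        w = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in xs]
        sw = sum(w)
        # Weighted means of X and Y around x0
        xbar = sum(wi * x for wi, x in zip(w, xs)) / sw
        ybar = sum(wi * y for wi, y in zip(w, ys)) / sw
        # Weighted least-squares slope of the local straight line
        sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        # The smooth at x0 is the local line evaluated at x0
        fitted.append(ybar + slope * (x0 - xbar))
    return fitted
```

For example, `local_linear_smooth(xs, ys, sorted(xs))` returns one smoothed Y value per X value, ready to be drawn as a line over the scatterplot. Because each local fit is a weighted straight line, points that lie exactly on a line are reproduced exactly; the smooth bends only where nearby points bend away from a line.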

In some cases, where noted, a bandwidth of 3.0 or some other value is used. Increasing the bandwidth weights the data much more like fitting a single linear smoother through all of the data, rather than through a subset near each point.

In most cases, the smoother we use with a bandwidth of 1.0 yields a "slightly jagged curve" relating two variables. When we increase the bandwidth to 3.0 (or sometimes higher), the smooth approximates a line, or "almost a line." The advantage of the 1.0 bandwidth is that it will show a curvilinear relationship between two variables if one in fact exists. The disadvantage is that the departure from linearity it shows is sometimes neither statistically nor practically significant.
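The trade-off between bandwidths can be illustrated with a small, hypothetical example: smooth clearly curved data (Y = X², X = 0 to 10) at the center of the X range with multipliers of 1.0, 3.0, and a very large value, then compare each result with an ordinary least-squares line through all of the data. As in any sketch here, the bandwidth rule (multiplier times the SD of X) is an assumption, and the function names are illustrative.

```python
import math
from statistics import pstdev


def local_linear_at(xs, ys, x0, h):
    """Normal-kernel local linear fit with bandwidth h, evaluated at x0."""
    w = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in xs]
    sw = sum(w)
    xbar = sum(wi * x for wi, x in zip(w, xs)) / sw
    ybar = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, xs, ys))
    return ybar + (sxy / sxx) * (x0 - xbar)


def ols_at(xs, ys, x0):
    """Ordinary least-squares line through all points, evaluated at x0."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
             / sum((x - xbar) ** 2 for x in xs))
    return ybar + slope * (x0 - xbar)


xs = [float(x) for x in range(11)]
ys = [x * x for x in xs]   # hypothetical, clearly curved data
sd = pstdev(xs)            # ASSUMPTION: reference bandwidth = SD of X
center = 5.0
for multiplier in (1.0, 3.0, 1e6):
    smooth = local_linear_at(xs, ys, center, multiplier * sd)
    gap = abs(smooth - ols_at(xs, ys, center))
    print(f"multiplier={multiplier:g}: smooth={smooth:.3f}, gap from line={gap:.3f}")
```

The gap between the smooth and the straight line shrinks as the multiplier grows, and with an extremely large bandwidth every point gets nearly equal weight, so the smooth becomes the least-squares line itself.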

If more than one subgroup (for example, males and females) is plotted in a single plot, the smoother for each subgroup uses the same bandwidth unless otherwise noted.
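Using one bandwidth for every subgroup can be sketched by computing the bandwidth once from the pooled X values and reusing it for each group's curve. The record layout `(group, x, y)`, the function names, and the pooled-SD bandwidth rule are all hypothetical choices for illustration.

```python
import math
from statistics import pstdev


def local_linear_at(xs, ys, x0, h):
    """Normal-kernel local linear fit with bandwidth h, evaluated at x0."""
    w = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in xs]
    sw = sum(w)
    xbar = sum(wi * x for wi, x in zip(w, xs)) / sw
    ybar = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return ybar + slope * (x0 - xbar)


def smooth_subgroups(records, multiplier=1.0):
    """Smooth each subgroup separately, all with one shared bandwidth.

    records: iterable of (group, x, y) tuples.
    ASSUMPTION: shared bandwidth = multiplier * SD of the pooled X values.
    """
    records = list(records)
    # Computed once and reused for every subgroup
    h = multiplier * pstdev([x for _, x, _ in records])
    groups = {}
    for g, x, y in records:
        xs, ys = groups.setdefault(g, ([], []))
        xs.append(x)
        ys.append(y)
    curves = {}
    for g, (xs, ys) in groups.items():
        grid = sorted(set(xs))
        curves[g] = [(x0, local_linear_at(xs, ys, x0, h)) for x0 in grid]
    return h, curves
```

One reason for a shared bandwidth is that every curve is then smoothed to the same degree, so visible differences between, say, the male and female curves reflect the data rather than different amounts of smoothing.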

Note that the smoother is not a statistical test of a significant relationship between the X and Y variables. In most cases, however, if the data lie close to the smoother along its entire length and the smoother rises or falls consistently, there will be a significant linear or nonlinear correlation between the two variables.
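Because the smooth itself is not a test, significance has to be checked separately. One simple option, shown here as an illustration and not necessarily the procedure behind the Knowledge Items, is a permutation test of Pearson's correlation coefficient. It addresses only the linear component of the relationship; the function names, the 999-permutation default, and the fixed random seed are illustrative choices.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient (assumes neither variable is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(xs, ys, n_perm=999, seed=0):
    """Two-sided permutation p-value for Pearson's r.

    Shuffling Y breaks any X-Y link, so the shuffled correlations show how
    large |r| tends to be by chance alone.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            extreme += 1
    # Count the observed arrangement itself, so p is never exactly zero
    return (extreme + 1) / (n_perm + 1)
```

A small p-value supports a real linear association; a smooth that looks steadily increasing or decreasing and hugs the data usually goes along with one, as the text notes, but the test is what establishes it.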

In a few cases, smoothing was done in the SYSTAT 9.0 program, which offers some alternative smoothing methods; in those cases, the LOWESS algorithm was used. These examples are noted in the Knowledge Base. For all practical purposes, the smooths produced by SPSS 10.0 and SYSTAT 9.0 are equivalent, although the algorithms that produce them differ slightly from one another.
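LOWESS (Cleveland's locally weighted scatterplot smoothing) differs from the kernel smoother mainly in how it weights points: each local line uses only a fixed fraction of the nearest points, weighted with a tricube function, and later passes downweight points with large residuals so that outliers pull the curve less. The sketch below is a plain-Python illustration, not SYSTAT 9.0's code; the defaults `frac=2/3` and three robustness iterations match R's `lowess()` defaults and are assumptions here, as is the fallback used when every weight is zero.

```python
import math
from statistics import median


def _weighted_line_at(x0, xs, ys, w):
    """Weighted least-squares line through (xs, ys), evaluated at x0."""
    sw = sum(w)
    if sw == 0:
        return sum(ys) / len(ys)  # hypothetical fallback: no usable weights
    xbar = sum(wi * x for wi, x in zip(w, xs)) / sw
    ybar = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return ybar + slope * (x0 - xbar)


def lowess(xs, ys, frac=2 / 3, iterations=3):
    """Robust LOWESS smooth; returns one fitted value per input point."""
    n = len(xs)
    k = min(n, max(2, math.ceil(frac * n)))  # points in each neighborhood
    robustness = [1.0] * n
    fitted = [0.0] * n
    for _ in range(iterations + 1):  # initial fit plus robustness passes
        for i, x0 in enumerate(xs):
            # Neighborhood radius: distance to the k-th nearest X value
            h = sorted(abs(x - x0) for x in xs)[k - 1]
            weights = []
            for x, r in zip(xs, robustness):
                u = abs(x - x0) / h if h > 0 else (0.0 if x == x0 else 1.0)
                tricube = (1 - u ** 3) ** 3 if u < 1 else 0.0
                weights.append(tricube * r)
            fitted[i] = _weighted_line_at(x0, xs, ys, weights)
        residuals = [y - f for y, f in zip(ys, fitted)]
        s = median([abs(e) for e in residuals])
        if s < 1e-12:
            break  # essentially perfect fit; nothing left to downweight
        # Bisquare robustness weights: large residuals count less next pass
        robustness = [(1 - (e / (6 * s)) ** 2) ** 2 if abs(e) < 6 * s else 0.0
                      for e in residuals]
    return fitted
```

The robustness passes are the main practical difference from the kernel smoother sketched for SPSS: on well-behaved data the two curves tend to look alike, while a single wild point moves the LOWESS curve much less.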

Copyright © 1999-2005 by The Measurement Group LLC. All rights reserved. This may not be current and will not be updated.