stats[statplots, scatterplot]  Scatter Plot

Calling Sequence


stats[statplots, scatterplot](data, arg=value, ...)
statplots[scatterplot](data, arg=value, ...)
scatterplot(data, arg=value, ...)


Description


•

The function scatterplot of the subpackage stats[statplots] gives a scatter plot of the data.

•

There are several formats available for scatter plots: format=agglomerated[n, l], format=excised[p], format=quantile, and format=sunflower[l].

•

In addition to the above, some formats are valid only for one-dimensional scatter plots: format=projected, format=jittered, format=stacked, and format=symmetry.

•

The default, when no format parameter is specified, is to plot the points and classes in the given statistical list(s). In one dimension (scatterplot(data1)), the points are plotted at their x-value, with an assumed y-value of 1. In two or three dimensions (scatterplot(data1, data2) or scatterplot(data1, data2, data3)), the lists are combined to form points or regions: data1 supplies the x-values, data2 the y-values, and so on. The points are paired according to the order of the lists and the weights of the data items. Each statistical list must have the same total weight. Classes are plotted as lines, rectangles, or boxes, depending on how they are paired.

•

The format=agglomerated[n, l] option groups closely spaced points into boxes that represent clusters of points. When there are n points within a cube of side length l, those points are replaced by a box of side length l/2. When there are n^2 points within the cube, a larger box, of side length l, is plotted in place of the points. Class data are replaced by their classmarks.

•

The format=excised[p] option deletes the fraction p of the least densely packed points from the plot. Conversely, when p is negative, the fraction -p of the most densely packed points is excised. Class data are replaced by their classmarks.

•

The format=quantile option plots quantile values of the data. In one dimension, the quantile values are plotted as the x-component against the data values as the y-component. In more than one dimension, the statistical lists are sorted according to their quantiles, then paired together and plotted, so the r-th quantile of data1 is plotted against the r-th quantile of data2. Note that the number of observations in each data list can be different.

•

The format=sunflower[l] option replaces points by "sunflowers". Each sunflower has one radial arm for every point of weight one (that is, a point with weight three produces three radial arms). The plot area is divided into cubes of side length l, and one sunflower is plotted inside each cube to represent the points within it. The number of arms in a sunflower corresponds to the total weight of the points within the cube, so fractionally weighted points may cause one arm of the sunflower to be shorter than the rest. Class data are replaced by their classmarks.

•

The format=projected option is the default for one-dimensional plots. The points in data1 are plotted at their x-values along the line y=1. This gives an idea of the concentration of the data, but does not reveal the presence of repeated data. Classes are plotted as lines. Missing data are ignored.

•

The format=jittered option for one-dimensional plots causes the points corresponding to a particular x-value to be scattered along the vertical line at that x-value. This gives a visual idea of the density of the points.

•

The format=stacked option for one-dimensional plots produces a histogram-style plot. The points in data1 are plotted at their x-values and stacked on top of each other, starting at the line y=1. The height of each stack is proportional to the total weight of its points. Class data appear as lines, but are not stacked with the points.

•

The format=symmetry option for one-dimensional plots produces a symmetry plot of the data. In this type of plot, the first half of the sorted data minus the median is plotted against the median minus the second half of the sorted data. Therefore, if the data are symmetric about the median, the points lie on the straight line y=x. Departure from this line indicates deviation from symmetry.

•

One-dimensional quantile plots are closely related to percentage ogives. See cumulative frequency for more information.

•

Multidimensional quantile plots, or quantile-quantile plots, are useful for comparing multiple data sets. Consider data sets of the maximum daily temperatures in two cities. A scatter plot of one set against the other facilitates comparison of the temperatures in the two cities on each given day. The quantile-quantile plot instead answers questions such as: do the lowest third of the daily temperatures in this city span a greater range than the lowest third in the other city?

•

When there are "too many" points in a scatter plot, it can be difficult to see important trends. The agglomerated, excised, and sunflower formats group the data so that patterns are more readily visible.

•

The command with(stats[statplots]) allows the use of the abbreviated form of this command.
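The order-and-weight pairing used by the default format can be illustrated outside Maple. The Python sketch below is one plausible reading of that rule, not the stats implementation: each list is a hypothetical sequence of (value, weight) tuples, the two lists are walked in step, and a weight is split whenever one item's remaining weight runs out before the other's. The function name pair_weighted is hypothetical, and classes and missing values are not handled.

```python
from fractions import Fraction


def pair_weighted(xs, ys):
    """Pair two weighted lists in order, splitting weights where needed.

    xs, ys: lists of (value, weight) tuples with equal total weight.
    Returns a list of ((x, y), weight) tuples.
    """
    if sum(w for _, w in xs) != sum(w for _, w in ys):
        raise ValueError("lists must have the same total weight")
    pairs = []
    i = j = 0
    # Unpaired weight still left on the current item of each list.
    rx = xs[0][1] if xs else 0
    ry = ys[0][1] if ys else 0
    while i < len(xs) and j < len(ys):
        w = min(rx, ry)
        if w > 0:
            pairs.append(((xs[i][0], ys[j][0]), w))
        rx -= w
        ry -= w
        # Advance whichever item (or both) has been fully paired.
        if rx == 0:
            i += 1
            rx = xs[i][1] if i < len(xs) else 0
        if ry == 0:
            j += 1
            ry = ys[j][1] if j < len(ys) else 0
    return pairs


# Exact Fraction weights avoid floating-point mismatches in the totals.
example = pair_weighted([(1, 2), (3, 1)],
                        [(10, 1), (20, Fraction(3, 2)), (30, Fraction(1, 2))])
```

Here the value 1 with weight 2 is paired first with 10 and then with part of 20, which is why every list must carry the same total weight.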
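The agglomerated[n, l] and sunflower[l] formats both start by grouping points into cubes of side length l. The Python sketch below works in two dimensions and assumes grid-aligned cells [i*l, (i+1)*l) x [j*l, (j+1)*l) and "at least n" thresholds; how stats actually places its cubes is not stated here. The functions bin_points, agglomerate and sunflowers and the (x, y, weight) tuple layout are hypothetical.

```python
import math
from collections import defaultdict


def bin_points(points, l):
    """Group (x, y, weight) points into grid-aligned square cells of side l.

    Cell (i, j) covers [i*l, (i+1)*l) x [j*l, (j+1)*l).
    """
    cells = defaultdict(list)
    for x, y, w in points:
        cells[(math.floor(x / l), math.floor(y / l))].append((x, y, w))
    return cells


def agglomerate(points, n, l):
    """Sketch of format=agglomerated[n, l].

    Cells holding at least n**2 points become a box of side l, cells holding
    at least n points become a box of side l/2, other points are kept.
    Boxes are returned as (centre_x, centre_y, side).
    """
    kept, boxes = [], []
    for (i, j), members in bin_points(points, l).items():
        cx, cy = (i + 0.5) * l, (j + 0.5) * l  # centre of the cell
        if len(members) >= n * n:
            boxes.append((cx, cy, l))
        elif len(members) >= n:
            boxes.append((cx, cy, l / 2))
        else:
            kept.extend(members)
    return kept, boxes


def sunflowers(points, l):
    """Sketch of format=sunflower[l]: one sunflower per occupied cell.

    Returns (centre_x, centre_y, arms), where arms is the total weight of the
    points in the cell (a fractional total means one shorter arm).
    """
    return [((i + 0.5) * l, (j + 0.5) * l, sum(w for _, _, w in members))
            for (i, j), members in bin_points(points, l).items()]
```

Summing weights rather than counting points in sunflowers mirrors the rule that a point of weight three contributes three arms.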
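This help page does not say how stats measures how densely packed a point is. The Python sketch below therefore substitutes a common proxy, the distance to the k-th nearest neighbour (a larger distance means a sparser point), purely to illustrate the sign convention of excised[p]. The function excise and its parameter k are hypothetical.

```python
import math


def excise(points, p, k=3):
    """Sketch of format=excised[p] for 2-D points given as (x, y) tuples.

    Density proxy (an assumption, not the stats rule): the distance from a
    point to its k-th nearest neighbour; larger means less densely packed.
    p > 0 drops the sparsest fraction p, p < 0 drops the densest fraction -p.
    """
    if not -1 <= p <= 1:
        raise ValueError("p must lie in [-1, 1]")

    def kth_distance(idx):
        others = sorted(math.dist(points[idx], q)
                        for j, q in enumerate(points) if j != idx)
        return others[min(k, len(others)) - 1] if others else 0.0

    # Indices ordered from sparsest to densest.
    order = sorted(range(len(points)), key=kth_distance, reverse=True)
    m = round(abs(p) * len(points))
    drop = set(order[:m]) if p > 0 else set(order[len(order) - m:])
    return [pt for i, pt in enumerate(points) if i not in drop]
```

With a tight cluster plus one distant outlier, a small positive p removes the outlier, while the same negative p removes a point from the cluster instead.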



Examples


Important: The stats package has been deprecated. Use the superseding package Statistics instead.
> with(stats):

> data := [Weight(1, 5), 2, Weight(3, 7), Weight(4..5, 3), missing, 6, 9, 10, 11, 14, 15, 20]:

> statplots[scatterplot](data);

One can contrast the three styles of 1D scatter plots as follows:
> with(stats[statplots]):

> plots[display](
{scatterplot(data, format=jittered),
yshift(10, scatterplot(data, format=stacked)),
yshift(20, scatterplot(data, format=projected)),
plot(proc() 10 end proc, 0..20),
plot(proc() 20 end proc, 0..20)
}, view = [0..20, 0..30]
);
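The projected and stacked layouts in the comparison above can be sketched as plain coordinate computations. This Python sketch assumes integer weights, uses None as a stand-in for Maple's missing, omits class data such as Weight(4..5, 3), and introduces a hypothetical vertical spacing dy between stacked markers; it is not how stats draws the plot.

```python
MISSING = None  # stand-in for Maple's `missing` value (an assumption)


def projected(data):
    """Sketch of format=projected: each non-missing value x is drawn at (x, 1)."""
    return [(x, 1) for x, _ in data if x is not MISSING]


def stacked(data, dy=0.5):
    """Sketch of format=stacked for (value, integer weight) pairs.

    Each unit of weight adds one marker above the previous one at the same
    x-value, starting at y = 1, so a stack's height grows with its weight.
    """
    placed = {}  # number of markers already stacked at each x-value
    out = []
    for x, w in data:
        if x is MISSING:
            continue
        for _ in range(w):
            k = placed.get(x, 0)
            out.append((x, 1 + k * dy))
            placed[x] = k + 1
    return out


# Point data from the example above (classes such as Weight(4..5, 3) omitted).
points = [(1, 5), (2, 1), (3, 7), (MISSING, 1), (6, 1), (9, 1),
          (10, 1), (11, 1), (14, 1), (15, 1), (20, 1)]
```

In projected(points) the weight-5 value at x=1 collapses to a single marker, while stacked(points) gives it five, which is the "repeated data" that the projected format hides.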

The remaining 1D format, symmetry, gives more of a summary:
> scatterplot(data, format=symmetry);
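The coordinates of a symmetry plot can be sketched in a few lines of Python. The sketch assumes that the i-th smallest value is paired with the i-th largest, which is the usual construction but is not spelled out on this page; the function symmetry_points is hypothetical and handles plain numbers only, not weighted or class data.

```python
import statistics


def symmetry_points(values):
    """Sketch of format=symmetry.

    Pairs the i-th smallest value with the i-th largest (an assumed pairing)
    and returns (x_(i) - M, M - x_(n+1-i)) for the lower half of the sorted
    data, where M is the median. Symmetric data give points on y = x.
    """
    xs = sorted(values)
    m = statistics.median(xs)
    n = len(xs)
    return [(xs[i] - m, m - xs[n - 1 - i]) for i in range(n // 2)]
```

For evenly spaced values every pair is equidistant from the median, so each point has equal coordinates; a long upper tail makes the second coordinate more negative than the first, pulling points below y = x.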

The following is a 2D scatter plot with 1D summaries along the sides:
> data1 := [2.93, 2.58, 2.85, 4.26, 2.94, 4.33, 1.71, 4.42, 3.59, 4.35, 2.07, 1.16, 2.36, 1.16, 4.72]:

> data2 := [2.46, 4.34, 0.182, 3.22, 5.37, 10.5, 3.11, 1.99, 0.865, 2.56, 10.6, 10.9, 6.56, 7.22, 4.84]:

> plots[display]({scatterplot(data1, data2), yshift(15, scatterplot(data1)), xshift(8, xyexchange(scatterplot(data2)))}, view=[0..10, 0..20], axes=frame);
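Unlike the plot above, which pairs data1 and data2 day by day, format=quantile pairs them by rank. The Python sketch below shows that pairing; for lists of different lengths it interpolates the longer list linearly at the plotting positions of the shorter one. That interpolation rule is an assumption, since this page does not say which quantile definition stats uses, and the functions quantile and qq_points are hypothetical.

```python
def quantile(sorted_vals, q):
    """Linearly interpolated q-quantile (0 <= q <= 1) of an already-sorted list.

    This interpolation rule is an assumption; stats may define quantiles differently.
    """
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + frac * (sorted_vals[hi] - sorted_vals[lo])


def qq_points(a, b):
    """Pair matching quantiles of a and b.

    Equal lengths: the r-th smallest of a is paired with the r-th smallest of b.
    Different lengths: the longer list is interpolated at the plotting
    positions of the shorter one.
    """
    sa, sb = sorted(a), sorted(b)
    if len(sa) == len(sb):
        return list(zip(sa, sb))
    # swap records whether a is the longer list, so output stays (a, b) ordered.
    short, long_, swap = (sa, sb, False) if len(sa) < len(sb) else (sb, sa, True)
    n = len(short)
    pts = []
    for r, v in enumerate(short):
        q = r / (n - 1) if n > 1 else 0.5
        w = quantile(long_, q)
        pts.append((w, v) if swap else (v, w))
    return pts
```

Since data1 and data2 above both have 15 values, qq_points(data1, data2) simply pairs the two sorted lists.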

The command to create the plot from the Plotting Guide, using the data above, is:
> scatterplot(data, format=jittered);
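The jittered layout can be sketched by giving each value a random height above y = 1. In this Python sketch the band height spread, the uniform noise, and the seed parameter are all assumptions made for illustration; the page states only that points sharing an x-value are scattered along the vertical line at that x-value. The function jittered is hypothetical and treats every value as having weight one.

```python
import random


def jittered(values, spread=0.5, seed=None):
    """Sketch of format=jittered for unit-weight values (None = missing).

    Each value x is drawn at (x, 1 + u), with u drawn uniformly from
    [0, spread], so repeated x-values spread out along the vertical line x.
    """
    rng = random.Random(seed)
    return [(x, 1 + rng.uniform(0, spread)) for x in values if x is not None]
```

Passing a fixed seed makes the layout reproducible, which is convenient when comparing jittered plots of the same data.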



See Also


plots[display], Statistics[ScatterPlot], statplots(deprecated)[agglomerated], statplots(deprecated)[excised], statplots(deprecated)[quantile], statplots(deprecated)[sunflower], statplots(deprecated)[symmetry], statplots(deprecated)[xshift], statplots(deprecated)[xyexchange], stats(deprecated)[data], stats(deprecated)[statplots]

