Matrix Data Sets in Statistics - Maple Help

Description

 • This help page describes how to use Statistics commands on spreadsheet-type data in matrices.
 • Many of the data sets you might encounter are two-dimensional in nature. They will have information about a number of items or events; for each item or event, the same properties are known. Such data sets can easily be represented in a Matrix by having each row of the Matrix correspond to an item and each column to one property of all these items. This is how you would typically store such data in a spreadsheet.
 • Many commands in the Statistics package can be used with this type of data:
 – Statistics[SplitByColumn] and Statistics[Join] split Matrices into submatrices and join them back together.
 – Commands such as Mean, StandardError, Percentile, AbsoluteDeviation, and Moment (all used in the examples on this page) can be run on Matrix data sets; they are computed per column and the results are returned in a row Vector.

Examples

 > $\mathrm{with}\left(\mathrm{Statistics}\right):$

We construct a Matrix with housing data. The first column holds the number of bedrooms, the second the area in square feet, and the third the sale price.

 > $\mathrm{HouseSalesData}:=\mathrm{Matrix}\left(15,3,\left\{\left(1,1\right)=3,\left(1,2\right)=1130,\left(1,3\right)=114694,\left(2,1\right)=4,\left(2,2\right)=1123,\left(2,3\right)=125236,\left(3,1\right)=2,\left(3,2\right)=1049,\left(3,3\right)=81647,\left(4,1\right)=4,\left(4,2\right)=1527,\left(4,3\right)=127368,\left(5,1\right)=3,\left(5,2\right)=907,\left(5,3\right)=88464,\left(6,1\right)=2,\left(6,2\right)=580,\left(6,3\right)=59481,\left(7,1\right)=2,\left(7,2\right)=878,\left(7,3\right)=96484,\left(8,1\right)=3,\left(8,2\right)=1075,\left(8,3\right)=113341,\left(9,1\right)=4,\left(9,2\right)=1040,\left(9,3\right)=104385,\left(10,1\right)=4,\left(10,2\right)=1295,\left(10,3\right)=136603,\left(11,1\right)=2,\left(11,2\right)=1100,\left(11,3\right)=80134,\left(12,1\right)=4,\left(12,2\right)=995,\left(12,3\right)=128007,\left(13,1\right)=4,\left(13,2\right)=908,\left(13,3\right)=115707,\left(14,1\right)=3,\left(14,2\right)=853,\left(14,3\right)=94666,\left(15,1\right)=3,\left(15,2\right)=856,\left(15,3\right)=89412\right\}\right)$
 ${\mathrm{HouseSalesData}}{:=}\left[\begin{array}{c}\text{15 x 3 Matrix}\\ \text{Data Type: anything}\\ \text{Storage: rectangular}\\ \text{Order: Fortran\_order}\end{array}\right]$ (1)

We can split the sales into subgroups defined by the number of bedrooms (the first column), for example as a first step toward creating box plots of the price for each subgroup.

 > $\mathrm{ByRooms}:=\mathrm{SplitByColumn}\left(\mathrm{HouseSalesData},1\right)$
 ${\mathrm{ByRooms}}{:=}\left[\left[\begin{array}{rrr}{2}& {1049}& {81647}\\ {2}& {580}& {59481}\\ {2}& {878}& {96484}\\ {2}& {1100}& {80134}\end{array}\right]{,}\left[\begin{array}{rrr}{3}& {1130}& {114694}\\ {3}& {907}& {88464}\\ {3}& {1075}& {113341}\\ {3}& {853}& {94666}\\ {3}& {856}& {89412}\end{array}\right]{,}\left[\begin{array}{rrr}{4}& {1123}& {125236}\\ {4}& {1527}& {127368}\\ {4}& {1040}& {104385}\\ {4}& {1295}& {136603}\\ {4}& {995}& {128007}\\ {4}& {908}& {115707}\end{array}\right]\right]$ (2)

We can determine the average area and price for the whole data set and for the sets with $2$, $3$, and $4$ bedrooms. For the latter, we use the elementwise version of Mean by appending a tilde to the command.

 > $\mathrm{Mean}\left(\mathrm{HouseSalesData}\right)$
 $\left[\begin{array}{ccc}{3.13333333333333}& {1021.06666666667}& {1.03708600000000}\times {10}^{5}\end{array}\right]$ (3)
 > $\mathrm{Mean}{\sim}\left(\mathrm{ByRooms}\right)$
 $\left[\left[\begin{array}{ccc}{2.}& {901.750000000000}& {79436.5000000000}\end{array}\right]{,}\left[\begin{array}{ccc}{3.}& {964.200000000000}& {1.00115400000000}\times {10}^{5}\end{array}\right]{,}\left[\begin{array}{ccc}{4.}& {1148.}& {1.22884333333333}\times {10}^{5}\end{array}\right]\right]$ (4)
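
As a check on what the elementwise call computes, the first entry of (4) can be reproduced by hand from the two-bedroom submatrix in (2). This is only a worked illustration; the symbols $\bar{x}_{\mathrm{area}}$ and $\bar{x}_{\mathrm{price}}$ are notation introduced here and do not appear in the Maple output.

 $\bar{x}_{\mathrm{area}}=\frac{1049+580+878+1100}{4}=\frac{3607}{4}=901.75,\qquad \bar{x}_{\mathrm{price}}=\frac{81647+59481+96484+80134}{4}=\frac{317746}{4}=79436.5$

Both values agree with the second and third entries of the first row Vector in (4).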

We can also determine the standard error of the mean for each column.

 > $\mathrm{StandardError}\left(\mathrm{Mean},\mathrm{HouseSalesData}\right)$
 $\left[\begin{array}{ccc}{0.215288658199187}& {56.0832261373064}& {5615.81179205727}\end{array}\right]$ (5)
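
The formula behind this result is not stated on this page. Assuming it is the usual $s/\sqrt{n}$, with $s$ the sample standard deviation, the first entry of (5) can be checked by hand. In the bedroom column there are $n=15$ sales (four with 2 bedrooms, five with 3, six with 4), so $\sum x_i=47$ and $\sum x_i^2=157$:

 $s^2=\frac{157-47^2/15}{14}=\frac{73}{105}\approx 0.695238,\qquad \mathrm{SE}=\sqrt{\frac{s^2}{n}}=\sqrt{\frac{73}{1575}}\approx 0.215289$

This agrees with the first entry of (5), which supports the assumption but does not by itself confirm how Maple defines the quantity in general.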

Similarly, we can compute the 30th percentile of each column.

 > $\mathrm{Percentile}\left(\mathrm{HouseSalesData},30\right)$
 $\left[\begin{array}{ccc}{2.93333333333333}& {905.066666666667}& {89348.8000000000}\end{array}\right]$ (6)
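
The interpolation rule is not stated on this page. The values in (6) are numerically consistent with linear interpolation between order statistics at position $h=\left(n+\tfrac{1}{3}\right)p+\tfrac{1}{3}$; this is inferred from the output, so treat it as an assumption about the default method. With $n=15$ and $p=0.3$, and the area column sorted so that $x_{(4)}=878$ and $x_{(5)}=907$:

 $h=\left(15+\tfrac{1}{3}\right)(0.3)+\tfrac{1}{3}=\tfrac{74}{15}\approx 4.9333,\qquad x_{(4)}+\tfrac{14}{15}\left(x_{(5)}-x_{(4)}\right)=878+\tfrac{14}{15}\cdot 29\approx 905.0667$

The same position applied to the sorted prices, $88464+\tfrac{14}{15}\cdot 948=89348.8$, reproduces the third entry of (6).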

Some commands have calling sequences where one of the arguments is compared to the data; this is the case for the second argument of AbsoluteDeviation and for the origin parameter of Moment. In these cases, it typically doesn't make much sense to use the same value for each column, so Maple supports using a list or Vector of values instead.

 > $\mathrm{AbsoluteDeviation}\left(\mathrm{HouseSalesData},\left[3,1000,100000\right]\right)$
 $\left[\begin{array}{ccc}{0.666666666666667}& {157.466666666667}& {18336.8666666667}\end{array}\right]$ (7)
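
Reading the second argument as the point each column is compared against, the first entry of (7) can be checked by hand as the mean absolute difference from 3. This definition is assumed here because it matches the output; it is not spelled out on this page. In the bedroom column, five sales have 3 bedrooms (difference 0) and ten have 2 or 4 bedrooms (difference 1):

 $\frac{1}{15}\sum_{i=1}^{15}\left|x_i-3\right|=\frac{5\cdot 0+10\cdot 1}{15}=\frac{2}{3}\approx 0.666667$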
 > $\mathrm{StandardError}\left(\mathrm{Moment},\mathrm{HouseSalesData},1,\mathrm{origin}=\left[3,1000,100000\right]\right)$
 $\left[\begin{array}{ccc}{0.207988603676402}& {54.1815439398298}& {5425.38962762635}\end{array}\right]$ (8)
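
For orientation, the quantity whose standard error is reported in (8) is the first moment about the given origin. For the bedroom column with origin 3, a hand computation gives the moment itself, and the reported standard error is numerically consistent with $\sqrt{\sigma^2/n}$, where $\sigma^2$ is the variance with divisor $n$. The latter is an observation about the output, not a formula taken from this page.

 $m_1=\frac{1}{15}\sum_{i=1}^{15}\left(x_i-3\right)=\frac{47-45}{15}=\frac{2}{15}\approx 0.1333,\qquad \sigma^2=\frac{1}{15}\sum_{i=1}^{15}\left(x_i-\bar{x}\right)^2=\frac{146}{225},\qquad \sqrt{\frac{\sigma^2}{15}}=\sqrt{\frac{146}{3375}}\approx 0.207989$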