DataFrames in Statistics - Maple Help


Description

• 

This help page describes how to use Statistics commands on DataFrame objects and on other spreadsheet-type data stored in Matrices, sometimes called Matrix data sets.

• 

Many data sets are two-dimensional: they contain information about a number of items or events, and the same properties are recorded for each item or event. Such a data set is easily represented as a DataFrame, with each row corresponding to an item and each column to one property of all the items. This is how you would typically store such data in a spreadsheet. You can also store the data in a Matrix, provided you keep track of the row and column labels yourself.

• 

Many commands in the Statistics package can be used with this type of data:

– 

The following computational commands accept DataFrame objects as well as Matrices. For a DataFrame, each quantity is computed per column, and the results are returned in a DataSeries whose labels are the column labels of the DataFrame. For a Matrix, each quantity is likewise computed per column, and the results are returned in a row Vector.

AbsoluteDeviation

CentralMoment

Count

CountMissing

Cumulant

DataSummary

Decile

ExpectedValue

FivePointSummary

GeometricMean

HarmonicMean

HodgesLehmann

InterquartileRange

Kurtosis

Mean

MeanDeviation

Median

MedianDeviation

Mode

Moment

Percentile

QuadraticMean

Quantile

Quartile

Range

RousseeuwCrouxQn

RousseeuwCrouxSn

Scale

Skewness

StandardDeviation

StandardError

StandardizedMoment

TrimmedMean

Variance

Variation

WinsorizedMean

  

 

– 

The following visualization commands, listed on the Statistics Visualization help page, also accept DataFrame objects. Where appropriate, the row labels are used to label data points and the column labels are used to label data sets.

AgglomeratedPlot

AreaChart

BarChart

Biplot

BoxPlot

BubblePlot

ColumnGraph

CumulativeSumChart

ErrorPlot

FrequencyPlot

GridPlot

LineChart

PointPlot

ScatterPlot

ScreePlot

  

 

– 

Statistics[SplitByColumn] and Statistics[Join] split Matrices into submatrices and join them back together.

– 

DataFrame/Aggregate provides related functionality for DataFrame objects: it groups rows by the values in a column and summarizes each group.

• 

Additional examples are found in the Statistics with DataFrames example worksheet.
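The per-column behavior described above can be illustrated with a minimal sketch. It uses a small, made-up 3 x 2 data set. The values and the column labels x and y are hypothetical and chosen only for illustration. The same Mean call returns a row Vector for the Matrix and a DataSeries labeled by column for the DataFrame.

```maple
with(Statistics):

# Hypothetical 3 x 2 data set: 3 observations (rows) of 2 properties (columns)
M := Matrix([[1, 10], [2, 20], [3, 60]]):

# On a Matrix, Mean works column by column and returns a row Vector
colMeans := Mean(M);

# The same data wrapped in a DataFrame with hypothetical column labels x and y
df := DataFrame(M, columns = [x, y]):

# On a DataFrame, the per-column results come back as a DataSeries labeled x, y
dsMeans := Mean(df);
```

Both results hold the column means 2 and 30. Only the container differs.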

Examples

with(Statistics):

We construct a DataFrame with housing data. The first column holds the number of bedrooms, the second the area in square feet, and the third the price.

bedrooms := <3, 4, 2, 4, 3, 2, 2, 3, 4, 4, 2, 4, 4, 3, 3>;

bedrooms := [1 .. 15 Vector[column], Data Type: anything, Storage: rectangular, Order: Fortran_order]    (1)

area := <1130, 1123, 1049, 1527, 907, 580, 878, 1075, 1040, 1295, 1100, 995, 908, 853, 856>;

area := [1 .. 15 Vector[column], Data Type: anything, Storage: rectangular, Order: Fortran_order]    (2)

price := <114700, 125200, 81600, 127400, 88500, 59500, 96500, 113300, 104400, 136600, 80100, 128000, 115700, 94700, 89400>;

price := [1 .. 15 Vector[column], Data Type: anything, Storage: rectangular, Order: Fortran_order]    (3)

HouseSalesData := DataFrame(<bedrooms | area | price>, columns = [Bedrooms, Area, Price]);

HouseSalesData :=
          Bedrooms    Area    Price
    1     3           1130    114700
    2     4           1123    125200
    3     2           1049    81600
    4     4           1527    127400
    5     3           907     88500
    6     2           580     59500
    7     2           878     96500
    8     3           1075    113300
    ...   ...         ...     ...                (4)

We can determine the average number of bedrooms, average area, and average price with just the Mean command.

Mean(HouseSalesData);

Bedrooms    3.13333333333333
Area        1021.06666666667
Price       1.03706666666667 × 10^5            (5)

We can also determine the standard error for this mean.

StandardError(Mean, HouseSalesData);

Bedrooms    0.215288658199187
Area        56.0832261373064
Price       5615.39946140175                   (6)
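As a cross-check, the sketch below recomputes the standard error of the mean for the Bedrooms column. It divides the sample standard deviation by the square root of the number of observations. Treating this as the formula StandardError(Mean, ...) uses is an assumption on our part, but it reproduces the value 0.215288658199187 shown above.

```maple
with(Statistics):

# Bedrooms column from the example above
bedrooms := <3, 4, 2, 4, 3, 2, 2, 3, 4, 4, 2, 4, 4, 3, 3>:

n := numelems(bedrooms);             # number of observations (15)
s := StandardDeviation(bedrooms);    # sample standard deviation
seMean := evalf(s / sqrt(n));        # assumed formula: s / sqrt(n)
```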

We can also compute the 30th percentile of each column.

Percentile(HouseSalesData, 30);

Bedrooms    2.93333333333333
Area        905.066666666667
Price       89340.                             (7)
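A percentile of a data sample is generally not one of the observed values. Percentile interpolates between neighboring order statistics, which is why the Bedrooms result above is not an integer. The following sketch makes this visible for the Price column by placing the 30th percentile between the sorted observations around it.

```maple
with(Statistics):

# Price column from the example above
price := <114700, 125200, 81600, 127400, 88500, 59500, 96500, 113300,
          104400, 136600, 80100, 128000, 115700, 94700, 89400>:

sortedPrice := sort(price);       # prices in ascending order
p30 := Percentile(price, 30);     # 30th percentile, as in the example

# Show the 30th percentile between the 4th and 5th smallest prices
sortedPrice[4], p30, sortedPrice[5];
```

With these data the 4th and 5th smallest prices are 88500 and 89400. The value 89340. reported above falls between them.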

The GridPlot command can display scatter plots of pairs of columns.

GridPlot(HouseSalesData);

[Plot: grid of pairwise scatter plots of the Bedrooms, Area, and Price columns]    (8)

 

We can use the entries below the diagonal to display the correlation between each pair of columns.

GridPlot(HouseSalesData, lower = Correlation);

[Plot grid: column names on the diagonal, correlation coefficients below it]    (9)

Bedrooms
0.48859738022358107    Area
0.8357896496949664     0.7043989377949643    Price
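The entries below the diagonal are pairwise correlation coefficients. As a sketch, they can also be computed directly from the original column Vectors with Correlation. We expect the results to agree with the values displayed in the grid: Bedrooms/Area in the second row, and Bedrooms/Price and Area/Price in the third.

```maple
with(Statistics):

# Columns of the housing data from the example above
bedrooms := <3, 4, 2, 4, 3, 2, 2, 3, 4, 4, 2, 4, 4, 3, 3>:
area := <1130, 1123, 1049, 1527, 907, 580, 878, 1075, 1040, 1295,
         1100, 995, 908, 853, 856>:
price := <114700, 125200, 81600, 127400, 88500, 59500, 96500, 113300,
          104400, 136600, 80100, 128000, 115700, 94700, 89400>:

# Pairwise sample correlation coefficients (the entries below the diagonal)
rBedroomsArea  := Correlation(bedrooms, area);
rBedroomsPrice := Correlation(bedrooms, price);
rAreaPrice     := Correlation(area, price);
```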

 

We can determine the average area and price for groups of sales with the same number of bedrooms. (Aggregate is a command for DataFrame objects, not part of the Statistics package, so it is not available for Matrices.)

Aggregate(HouseSalesData, Bedrooms);

        Bedrooms    Area                Price
    1   2           901.750000000000    79425.
    2   3           964.200000000000    1.00120 × 10^5
    3   4           1148.               1.22883333333333 × 10^5        (10)

Creating a box plot of prices for each number of bedrooms requires a little more effort.

split := SplitByColumn(HouseSalesData, Bedrooms);

split := [
           Bedrooms    Area    Price
     3     2           1049    81600
     6     2           580     59500
     7     2           878     96500
     11    2           1100    80100
  ,
           Bedrooms    Area    Price
     1     3           1130    114700
     5     3           907     88500
     8     3           1075    113300
     14    3           853     94700
     15    3           856     89400
  ,
           Bedrooms    Area    Price
     2     4           1123    125200
     4     4           1527    127400
     9     4           1040    104400
     10    4           1295    136600
     12    4           995     128000
     13    4           908     115700
  ]                                            (11)

price_split := map(df -> convert(df[Price], Vector), split);

price_split := [<81600, 59500, 96500, 80100>, <114700, 88500, 113300, 94700, 89400>, <125200, 127400, 104400, 136600, 128000, 115700>]    (12)

BoxPlot(price_split, datasetlabels = [2, 3, 4]);

Most of the operations above can also be performed on a Matrix, as the following examples show.

HSD_Matrix := convert(HouseSalesData, Matrix);

HSD_Matrix := [15 x 3 Matrix, Data Type: anything, Storage: rectangular, Order: Fortran_order]    (13)

Mean(HSD_Matrix);

[3.13333333333333, 1021.06666666667, 1.03706666666667 × 10^5]    (14)

Percentile(HSD_Matrix, 30);

[2.93333333333333, 905.066666666667, 89340.]    (15)
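Aggregate is not available for Matrices. A similar grouped summary can be sketched by combining SplitByColumn, which the description above says splits Matrices into submatrices, with Mean. Here the Matrix is built directly from the three column Vectors, which we assume is equivalent to convert(HouseSalesData, Matrix). The sketch does not rely on the order in which SplitByColumn returns the groups.

```maple
with(Statistics):

bedrooms := <3, 4, 2, 4, 3, 2, 2, 3, 4, 4, 2, 4, 4, 3, 3>:
area := <1130, 1123, 1049, 1527, 907, 580, 878, 1075, 1040, 1295,
         1100, 995, 908, 853, 856>:
price := <114700, 125200, 81600, 127400, 88500, 59500, 96500, 113300,
          104400, 136600, 80100, 128000, 115700, 94700, 89400>:

# 15 x 3 Matrix with columns Bedrooms, Area, Price
# (assumed equivalent to convert(HouseSalesData, Matrix))
HSD_Matrix := <bedrooms | area | price>:

# One submatrix per distinct value in column 1 (number of bedrooms)
groups := SplitByColumn(HSD_Matrix, 1):

# Per-column means of each submatrix: one row Vector per group
groupMeans := map(Mean, groups);
```

Each row Vector should match one row of the Aggregate output above. For example, the 2-bedroom group has mean area 901.75 and mean price 79425.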

Some commands take an argument that is compared with the data. Examples are the second argument of AbsoluteDeviation and the origin parameter of Moment. Because a single value rarely makes sense for every column, Maple also accepts a list or Vector containing one value per column. These calling sequences do not yet work directly with DataFrame objects, so the following examples use the Matrix.

AbsoluteDeviation(HSD_Matrix, [3, 1000, 100000]);

[0.666666666666667, 157.466666666667, 18333.3333333333]    (16)

StandardError(Moment, HSD_Matrix, 1, origin = [3, 1000, 100000]);

[0.207988603676402, 54.1815439398298, 5424.99127836814]    (17)
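The list of reference values pairs one value with each column. The following sketch spells this out by applying AbsoluteDeviation to each column separately with its own reference value. We expect the three results to agree with AbsoluteDeviation(HSD_Matrix, [3, 1000, 100000]) above. As in the previous sketch, the Matrix is built directly from the column Vectors, which we assume matches the converted DataFrame.

```maple
with(Statistics):

bedrooms := <3, 4, 2, 4, 3, 2, 2, 3, 4, 4, 2, 4, 4, 3, 3>:
area := <1130, 1123, 1049, 1527, 907, 580, 878, 1075, 1040, 1295,
         1100, 995, 908, 853, 856>:
price := <114700, 125200, 81600, 127400, 88500, 59500, 96500, 113300,
          104400, 136600, 80100, 128000, 115700, 94700, 89400>:
HSD_Matrix := <bedrooms | area | price>:

# One reference value per column, as in the example above
origins := [3, 1000, 100000]:

# Mean absolute deviation of column j from origins[j]
perColumn := [seq(AbsoluteDeviation(HSD_Matrix[.., j], origins[j]), j = 1 .. 3)];
```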

See Also

examples/DataFrame/Statistics

 

