 DataFrame/Aggregate - Maple Programming Help

Home : Support : Online Help : Statistics and Data Analysis : DataFrames and DataSeries : DataFrame/Aggregate

DataFrame/Aggregate

compute aggregate statistics on a DataFrame

 Calling Sequence Aggregate( data, bycolumn, options )

Parameters

 data - DataFrame bycolumn - name, string, or integer; gives the column of the DataFrame used to split the data set into subsets options - (optional) equation(s) of the form option=value where option can be any of bounds, columns, function or tally.

Options

 • bounds : By default, Aggregate uses the SplitByColumn command to creates a separate sub-DataFrame for every discrete level in the column given by bycolumn. If the data in the bycolumn column is more or less continuous, the bounds option can be used to specify boundaries of bins into which values are collected. There are several different ways to use this option:
 – bounds = none is the default and leads to a separate bin for every value in the key column.
 – bounds = [b0, b1, ..., bn] means values from the key column will be divided into n bins: from $\mathrm{b0}$ to $\mathrm{b1}$, from $\mathrm{b1}$ to $\mathrm{b2}$, etc. Values less than $\mathrm{b0}$ or greater than $\mathrm{bn}$, and the rows containing them, are discarded.
 – bounds = quantiles(q0, q1, ..., qn) or bounds = deciles(d0, d1, ..., dn) or bounds = percentiles(p0, p1, ..., pn) or bounds = quartiles(q0, q1, ..., qn) are equivalent ways of specifying bins similar to specifying a list, but instead of specifying absolute numbers for the bounds, you specify which fraction of the data should fall in each of the bins. For example, specifying bounds = percentiles(0, 25, 60, 90) specifies three bins, the first of which contains the lowest 25% of data, then then next 60% - 25% = 35% of data, and finally the following 90% - 60% = 30% of data. That is, the bounds are at the 0th, 25th, 60th, and 90th percentile of the data in the key column. This could also be specified as bounds = quantiles(0, 0.25, 0.6, 0.9) using the notion of Quantile instead of Percentile, or bounds = deciles(0, 2.5, 6, 9) using Deciles, or bounds = quartile(0, 1, 2.4, 3.6) using Quartiles.
 • columns : list; list of numeric values corresponding to the columns or column names. The default is to show all columns.
 • function : procedure or list(procedure,options); controls the summary statistic or procedure that is applied to the subsets of data. The default is Statistics:-Mean. If function is given as a list, the first element corresponds to the command used to compute the aggregate statistic, and any additional arguments are passed to the given command.
 • tally : truefalse; controls the display of an additional column in the aggregate DataFrame containing tallies for each level for the given option ByColumn. The default is false.

Description

 • The Aggregate command returns aggregate statistics on a DataFrame for each distinct level (factor) in a given column.
 • Aggregate statistics are calculated by splitting the rows of a DataFrame by each factor in a given column into subsets and computing summary statistics for each of the subsets of data.
 • The output from the Aggregate command is a DataFrame that contains only one row per factor in the given column along with the requested summary statistic for each respective other column.

Examples

 > $\mathrm{DF}≔\mathrm{DataFrame}\left(⟨⟨1,1,2,2,2,3⟩|⟨5,3,6,2,1,4⟩|⟨6,5,1,4,9,9⟩|⟨4,2,1,7,8,3⟩⟩,\mathrm{columns}=\left[\mathrm{ID},\mathrm{X1},\mathrm{X2},\mathrm{X3}\right]\right)$
 ${\mathrm{DF}}{≔}\left[\begin{array}{ccccc}{}& {\mathrm{ID}}& {\mathrm{X1}}& {\mathrm{X2}}& {\mathrm{X3}}\\ {1}& {1}& {5}& {6}& {4}\\ {2}& {1}& {3}& {5}& {2}\\ {3}& {2}& {6}& {1}& {1}\\ {4}& {2}& {2}& {4}& {7}\\ {5}& {2}& {1}& {9}& {8}\\ {6}& {3}& {4}& {9}& {3}\end{array}\right]$ (1)

The following returns the mean of all other columns for each distinct level in the column, ID.

 > $\mathrm{Aggregate}\left(\mathrm{DF},\mathrm{ID}\right)$
 $\left[\begin{array}{ccccc}{}& {\mathrm{ID}}& {\mathrm{X1}}& {\mathrm{X2}}& {\mathrm{X3}}\\ {1}& {1}& {4.}& {5.50000000000000}& {3.}\\ {2}& {2}& {3.}& {4.66666666666667}& {5.33333333333333}\\ {3}& {3}& {4.}& {9.}& {3.}\end{array}\right]$ (2)

Adding the columns option controls which columns are returned. Additionally, the tally option returns a tally for each of the factors.

 > $\mathrm{Aggregate}\left(\mathrm{DF},\mathrm{ID},\mathrm{columns}=\left[\mathrm{X1},\mathrm{X2}\right]\right)$
 $\left[\begin{array}{cccc}{}& {\mathrm{ID}}& {\mathrm{X1}}& {\mathrm{X2}}\\ {1}& {1}& {4.}& {5.50000000000000}\\ {2}& {2}& {3.}& {4.66666666666667}\\ {3}& {3}& {4.}& {9.}\end{array}\right]$ (3)
 > $\mathrm{Aggregate}\left(\mathrm{DF},\mathrm{ID},\mathrm{columns}=\left[2,3\right],\mathrm{tally}\right)$
 $\left[\begin{array}{ccccc}{}& {\mathrm{ID}}& {\mathrm{X1}}& {\mathrm{X2}}& {\mathrm{Tally}}\\ {1}& {1}& {4.}& {5.50000000000000}& {2}\\ {2}& {2}& {3.}& {4.66666666666667}& {3}\\ {3}& {3}& {4.}& {9.}& {1}\end{array}\right]$ (4)

The function option allows for the specification of any command that can be applied to a DataSeries. For example, the Statistics:-Median command computes the median for each of the factors of ID.

 > $\mathrm{Aggregate}\left(\mathrm{DF},\mathrm{ID},\mathrm{columns}=\left[\mathrm{X1}\right],\mathrm{function}=\mathrm{Statistics}:-\mathrm{Median}\right)$
 $\left[\begin{array}{ccc}{}& {\mathrm{ID}}& {\mathrm{X1}}\\ {1}& {1}& {4.}\\ {2}& {2}& {2.}\\ {3}& {3}& {4.}\end{array}\right]$ (5)

The Append command appends a column on the right side of the original DataFrame.

 > $\mathrm{Append}\left(\mathrm{DF},⟨3,5,7,9,\mathrm{undefined},13⟩,'\mathrm{label}'='\mathrm{X4}'\right)$
 $\left[\begin{array}{cccccc}{}& {\mathrm{ID}}& {\mathrm{X1}}& {\mathrm{X2}}& {\mathrm{X3}}& {\mathrm{X4}}\\ {1}& {1}& {5}& {6}& {4}& {3}\\ {2}& {1}& {3}& {5}& {2}& {5}\\ {3}& {2}& {6}& {1}& {1}& {7}\\ {4}& {2}& {2}& {4}& {7}& {9}\\ {5}& {2}& {1}& {9}& {8}& {\mathrm{undefined}}\\ {6}& {3}& {4}& {9}& {3}& {13}\end{array}\right]$ (6)

If the function option is given as a list, supplementary arguments can be passed to the applied command.

 > $\mathrm{Aggregate}\left(\mathrm{DF},\mathrm{ID},\mathrm{function}=\left[\mathrm{Statistics}:-\mathrm{Quartile},2,\mathrm{ignore}=\mathrm{true}\right]\right)$
 $\left[\begin{array}{ccccc}{}& {\mathrm{ID}}& {\mathrm{X1}}& {\mathrm{X2}}& {\mathrm{X3}}\\ {1}& {1}& {4.}& {5.50000000000000}& {3.}\\ {2}& {2}& {2.}& {4.}& {7.}\\ {3}& {3}& {4.}& {9.}& {3.}\end{array}\right]$ (7)

The bounds option makes it possible to customize the bins used for the aggregate statistics. In the following, values for the ID column between 1 and 2 are put into the first bin and values 2 and above are put into another.

 > $\mathrm{Aggregate}\left(\mathrm{DF},\mathrm{ID},\mathrm{bounds}=\left[1,2,3\right],\mathrm{tally}\right)$
 $\left[\begin{array}{cccccc}{}& {\mathrm{ID}}& {\mathrm{X1}}& {\mathrm{X2}}& {\mathrm{X3}}& {\mathrm{Tally}}\\ {1}& {1}& {4.}& {5.50000000000000}& {3.}& {2}\\ {2}& {2}& {3.25000000000000}& {5.75000000000000}& {4.75000000000000}& {4}\end{array}\right]$ (8)

Compatibility

 • The DataFrame/Aggregate command was introduced in Maple 2016.