Statistics Enhancements in Maple 16
Maple 16 includes significant enhancements to the Statistics package. These enhancements include:
$\mathrm{with}\left(\mathrm{Statistics}\right):$

Discrete distributions with non-integer values


Starting with version 16, Maple supports discrete distributions that can have non-integer values.

At the market


At a market stall, a vendor charges $5 for taking part in a game of chance. If you participate, you will receive the contents of one of four envelopes filled with small change, each with probability 1/4. The values of the envelopes are $8.24, $4.12, $3.91, and $0.16.
$R := \mathrm{RandomVariable}\left(\mathrm{EmpiricalDistribution}\left(\left[8.24, 4.12, 3.91, 0.16\right]\right)\right):$
We can find the expected amount of money we will receive as $\mathrm{Mean}\left(R\right)$ = ${4.10750000000000}$. This sounds like a fairly unappealing deal.
Suppose furthermore that we are interested in buying peanuts with this money. Because of overhead, peanuts cost more per peanut when you buy a few than when you buy a lot. Our peanut supplier will sell us $10{x}^{2}$ grams of peanuts for $x$ dollars (for $x \le 10$), so for $5 we can get 250 g of peanuts. The expected weight of peanuts we will get if we participate in the game is:
$\mathrm{ExpectedValue}\left(10{R}^{2}\right)$ = $\frac{{1001857}}{{4000}}$, or $\mathrm{ExpectedValue}\left(10{R}^{2}, '\mathrm{numeric}'\right)$ = ${250.4642500}$.
Since squaring so strongly benefits the highest outcome, $8.24, the expected payout in peanuts if we take part in the game is essentially equal to the payout without taking part in the game.
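Because each envelope is equally likely, both expectations can be cross-checked by hand. The following is a minimal sketch in Python rather than Maple, using exact rational arithmetic. The names `values` and `expected` are illustrative and are not part of the Statistics package.

```python
from fractions import Fraction

# Envelope contents in dollars, as passed to EmpiricalDistribution above.
values = [Fraction("8.24"), Fraction("4.12"), Fraction("3.91"), Fraction("0.16")]

def expected(f, outcomes):
    """Expectation of f(X) when every outcome is equally likely."""
    return sum(f(v) for v in outcomes) / len(outcomes)

mean_payout = expected(lambda v: v, values)            # in dollars
mean_peanuts = expected(lambda v: 10 * v**2, values)   # in grams

print(float(mean_payout))   # 4.1075
print(mean_peanuts)         # 1001857/4000
```

The second result is about 250.46 g, only marginally above the 250 g that $5 buys directly.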
When the probabilities for different outcomes are not all the same (or all small multiples of a single value), we can use the new probabilities option to EmpiricalDistribution. In a more refined model, the weight or volume of the envelopes might influence how likely each one is to be picked. For example, suppose the probabilities are as follows:
where the top row gives the values in dollars as before and the bottom row gives the probabilities. (This information is tied to the variable $\mathrm{Probabilities}$ using the data table feature.) Doing the same computations as above, we now see:
$\mathrm{R2} := \mathrm{RandomVariable}\left(\mathrm{EmpiricalDistribution}\left(\mathrm{Probabilities}\left[1\right], '\mathrm{probabilities}' = \mathrm{Probabilities}\left[2\right]\right)\right):$
$\mathrm{Mean}\left(\mathrm{R2}\right)$ = ${4.68400000000000}$
The expected outcome is higher, but still falls short of the price. However, the expected weight of peanuts is now $\mathrm{ExpectedValue}\left(10{\mathrm{R2}}^{2}\right)$ = $\frac{{814979}}{{2500}}$, or $\mathrm{ExpectedValue}\left(10{\mathrm{R2}}^{2}, '\mathrm{numeric}'\right)$ = ${325.9916000}$.
We now see that the payout in peanuts is better if we take part in the game!


Custom distributions


EmpiricalDistribution can be used for all discrete distributions that can assume only finitely many values. All the discrete distributions that can assume infinitely many values that are built into Maple only support integer values. Therefore, to use a discrete distribution that can assume infinitely many values, some or all of which are not integers, we need to define this distribution ourselves, using the custom distribution feature of the Statistics package.
Consider the distribution that can assume all negative powers $p$ of 2, each with probability $p$. That is, the corresponding random variable is $\frac{1}{2}$ with probability $\frac{1}{2}$, $\frac{1}{4}$ with probability $\frac{1}{4}$, and so on. This distribution is represented as follows:
$d := \mathrm{Distribution}\left('\mathrm{ProbabilityFunction}' = \left(p \to p\right), '\mathrm{DiscreteValueMap}' = \left(n \to {2}^{-n}\right), '\mathrm{Support}' = 1..\infty, '\mathrm{Type}' = '\mathrm{discrete}'\right):$
$R := \mathrm{RandomVariable}\left(d\right):$
The DiscreteValueMap and Support properties determine what values the probability distribution can assume; the ProbabilityFunction determines the probability that that value is assumed. For more details, see the DiscreteValueMap help page. We can now compute that
$\mathrm{Mean}\left(R\right)$ = $\frac{{1}}{{3}}$, for example, or $\mathrm{StandardDeviation}\left(R\right)$ = $\frac{{1}}{{21}}\sqrt{{14}}$, or $\mathrm{Cumulant}\left(R, 3\right)$ = $-\frac{{2}}{{945}}$.
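These values can be checked by hand, because every moment of this distribution is a geometric series. As a sketch, using only the definition above (value $2^{-n}$ with probability $2^{-n}$ for $n \ge 1$) and the standard formula for the third cumulant in terms of raw moments:

$$
\begin{aligned}
E\left[R^{k}\right] &= \sum_{n=1}^{\infty} 2^{-n}\left(2^{-n}\right)^{k} = \sum_{n=1}^{\infty} 2^{-(k+1)n} = \frac{1}{2^{k+1}-1},\\
E[R] &= \tfrac{1}{3},\qquad E\left[R^{2}\right] = \tfrac{1}{7},\qquad E\left[R^{3}\right] = \tfrac{1}{15},\\
\mathrm{Var}(R) &= \tfrac{1}{7}-\tfrac{1}{9} = \tfrac{2}{63},\qquad \sqrt{\tfrac{2}{63}} = \tfrac{1}{21}\sqrt{14},\\
\kappa_{3} &= E\left[R^{3}\right] - 3\,E[R]\,E\left[R^{2}\right] + 2\,E[R]^{3} = \tfrac{1}{15}-\tfrac{1}{7}+\tfrac{2}{27} = -\tfrac{2}{945}.
\end{aligned}
$$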
Another distribution can be obtained by taking the probabilities $\frac{8\cdot n\cdot \left(n+1\right)\cdot \left(n+4\right)}{117\cdot {3}^{n}}$, for $n = 0..\infty$ (a modified negative binomial distribution), and associating the value ${\mathrm{e}}^{n}$ with the $n$th probability. This is defined as follows:
$\mathrm{d2} := \mathrm{Distribution}\left('\mathrm{DiscreteValueMap}' = \left(n \to {\mathrm{e}}^{n}\right), '\mathrm{Support}' = 0..\infty, '\mathrm{ProbabilityFunction}' = \left(x \to \frac{8\cdot \mathrm{ln}\left(x\right)\cdot \left(\mathrm{ln}\left(x\right)+1\right)\cdot \left(\mathrm{ln}\left(x\right)+4\right)}{117\cdot {x}^{\mathrm{ln}\left(3\right)}}\right), '\mathrm{Type}' = '\mathrm{discrete}'\right):$
$\mathrm{R2} := \mathrm{RandomVariable}\left(\mathrm{d2}\right):$
Again, we can compute such quantities as
$\mathrm{Mean}\left(\mathrm{R2}\right)$ = $-\frac{{16}}{{13}}\frac{\left({2}{\mathrm{e}}-{15}\right){\mathrm{e}}}{{\left({\mathrm{e}}-{3}\right)}^{{4}}}$ and $\mathrm{Variance}\left(\mathrm{R2}\right)$ = $\infty$.
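A numerical cross-check of this second distribution is straightforward. The sketch below, in Python rather than Maple, truncates the infinite sums at a hypothetical cut-off `N_TERMS`. It checks that the probabilities sum to 1 and that the partial sums for the mean approach the closed form $-\frac{16}{13}\frac{(2\mathrm{e}-15)\,\mathrm{e}}{(\mathrm{e}-3)^{4}}$. The names `weight`, `total_probability`, and `partial_mean` are illustrative.

```python
import math

N_TERMS = 600  # truncation point; the neglected tail is far below float precision

def weight(n):
    """Polynomial part of the n-th probability: 8 n (n+1) (n+4) / 117."""
    return 8 * n * (n + 1) * (n + 4) / 117

# The probabilities weight(n) / 3^n should sum to 1.
total_probability = sum(weight(n) * (1 / 3) ** n for n in range(N_TERMS))

# Mean: the value e^n times its probability is weight(n) * (e/3)^n.
partial_mean = sum(weight(n) * (math.e / 3) ** n for n in range(N_TERMS))

# Closed form for the mean, with e < 3 so that the series converges.
closed_form = -(16 / 13) * (2 * math.e - 15) * math.e / (math.e - 3) ** 4

# Second moment: its terms grow like (e^2/3)^n, and e^2/3 > 1, so it diverges.
second_moment_ratio = math.e ** 2 / 3

print(total_probability, partial_mean, closed_form, second_moment_ratio)
```

The mean is finite because $\mathrm{e}/3 < 1$, while the variance is infinite because $\mathrm{e}^{2}/3 > 1$.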


Sampling custom discrete distributions


In previous versions, Maple did not support sampling of custom discrete distributions. This feature was added to Maple 16.
${\mathrm{evalf}}_{5}{\sim}\left(\mathrm{Sample}\left(R, 10\right)\right)$
$\left[\begin{array}{cccccccccc}{0.50000}& {0.12500}& {0.50000}& {0.25000}& {0.50000}& {0.25000}& {0.25000}& {0.50000}& {0.50000}& {0.031250}\end{array}\right]$
 (1.3.1) 
${\mathrm{evalf}}_{5}{\sim}\left(\mathrm{Sample}\left(\mathrm{R2}, 10\right)\right)$
$\left[\begin{array}{cccccccccc}{20.086}& {54.598}& {54.598}& {148.41}& {54.598}& {20.086}& {2.7183}& {2.7183}& {20.086}& {20.086}\end{array}\right]$
 (1.3.2) 
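One standard way to sample a custom discrete distribution is inverse-transform sampling: draw a uniform number and walk the cumulative probabilities until it is exceeded. The sketch below, in Python rather than Maple, applies this to the first distribution (value $2^{-n}$ with probability $2^{-n}$). It illustrates the idea only and is not a description of how Maple's Sample command is implemented. The function `sample_custom` and its parameters are hypothetical, and here the probability is given as a function of the index $n$ rather than of the value.

```python
import random

def sample_custom(rng, prob, value, start=1, max_index=1000):
    """Draw one value by inverse-transform sampling.

    prob(n) is the probability of the n-th support point and value(n)
    is the value attached to it; indices run upward from `start`.
    """
    u = rng.random()
    cumulative = 0.0
    for n in range(start, max_index + 1):
        cumulative += prob(n)
        if u < cumulative:
            return value(n)
    # Only reached through rounding in the far tail.
    return value(max_index)

rng = random.Random(16)  # fixed seed so the run is repeatable
draws = [
    sample_custom(rng, lambda n: 2.0 ** -n, lambda n: 2.0 ** -n)
    for _ in range(10)
]
print(draws)
```

Every draw is a negative power of 2, and about half of the draws are expected to equal 0.5.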



Parameter estimation is more efficient and handles more cases


Maple 16 has much more efficient and robust routines for doing maximum likelihood parameter estimation for many distributions. The following example was sped up by a factor of about 10.
$s := \mathrm{Sample}\left(\mathrm{Normal}\left(1, \mathrm{exp}\left(1\right)\right), {10}^{4}\right):$
$\mathrm{CodeTools}{:-}\mathrm{Usage}\left(\mathrm{MaximumLikelihoodEstimate}\left(\mathrm{Normal}\left(1, \sigma\right), s\right)\right);$
${2.70664452698181}$
 (2.1) 
Maple can now also estimate multiple parameters at the same time using maximum likelihood estimation.
$\mathrm{CodeTools}{:-}\mathrm{Usage}\left(\mathrm{MaximumLikelihoodEstimate}\left(\mathrm{Normal}\left(\mu, \sigma\right), s\right)\right);$
$\left[\mu = 1.09246434540031, \sigma = 2.70506468306018\right]$
 (2.2) 
For more information, see the MaximumLikelihoodEstimate help page.
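For the normal distribution, the maximum likelihood estimates have well-known closed forms, which makes the two results above easy to sanity-check. The following sketch, in Python rather than Maple, draws its own sample, so the numbers will differ from those shown above. It uses the textbook formulas and is not a description of Maple's internal routines; all variable names are illustrative.

```python
import math
import random

rng = random.Random(2012)            # fixed seed so the run is repeatable
true_mu, true_sigma = 1.0, math.e    # same parameters as Normal(1, exp(1))
data = [rng.gauss(true_mu, true_sigma) for _ in range(10**4)]
n = len(data)

# Case 1: the mean is known to be 1 and only sigma is estimated.
sigma_known_mean = math.sqrt(sum((x - true_mu) ** 2 for x in data) / n)

# Case 2: mu and sigma are estimated together.
mu_hat = sum(data) / n
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / n)

print(sigma_known_mean)
print(mu_hat, sigma_hat)
```

The joint estimate of $\sigma$ can never exceed the estimate with the mean fixed, because the sample mean minimizes the sum of squared deviations. This matches the pattern in the two Maple results (2.70506 versus 2.70664).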


Matrix data sets


The Statistics package in Maple 16 has been updated to better handle Matrix data sets. In previous releases, some commands in the Statistics package did not accept Matrix data. In Maple 16, these commands accept a Matrix and operate on each of its columns separately. In addition, you can now split your data into submatrices based on the value of one column, which lets you organize and present your data in different configurations to better observe particular trends. For more details, see the SplitByColumn help page.
As an example, assume the data below represents some housing data. The first column has the number of bedrooms, the second column has the number of square feet, the third has the price in dollars. This data table corresponds to the variable $\mathrm{HouseSalesData}$.
Using the SplitByColumn command we can easily rearrange the data in terms of the number of bedrooms:
$\mathrm{PerBedroom} := \mathrm{SplitByColumn}\left(\mathrm{HouseSalesData}, 1\right)$
$\left[\left[\begin{array}{rrr}{2}& {1049}& {81647}\\ {2}& {580}& {59481}\\ {2}& {878}& {96484}\\ {2}& {1100}& {80134}\end{array}\right], \left[\begin{array}{rrr}{3}& {1130}& {114694}\\ {3}& {907}& {88464}\\ {3}& {1075}& {113341}\\ {3}& {853}& {94666}\\ {3}& {856}& {89412}\end{array}\right], \left[\begin{array}{rrr}{4}& {1123}& {125236}\\ {4}& {1527}& {127368}\\ {4}& {1040}& {104385}\\ {4}& {1295}& {136603}\\ {4}& {995}& {128007}\\ {4}& {908}& {115707}\end{array}\right]\right]$
 (3.1) 
If we want to know the average area and price for the three-bedroom houses, we can find that as follows:
$\mathrm{ThreeBedrooms} := {\mathrm{PerBedroom}}_{2};$
$\left[\begin{array}{rrr}{3}& {1130}& {114694}\\ {3}& {907}& {88464}\\ {3}& {1075}& {113341}\\ {3}& {853}& {94666}\\ {3}& {856}& {89412}\end{array}\right]$
 (3.2) 
$\mathrm{Mean}\left(\mathrm{ThreeBedrooms}\right);$
$\left[\begin{array}{ccc}{3.}& {964.200000000000}& {1.00115400000000}\times{{10}}^{{5}}\end{array}\right]$
 (3.3) 
We see that the average three-bedroom house has an area of about 960 square feet and costs just over $100,100.
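The split-then-average workflow can be mirrored outside Maple to confirm the figures. The sketch below, in Python, copies the rows from output (3.1), groups them by the first column, and averages each column of the three-bedroom group. The helper names `split_by_column` and `column_means` are hypothetical stand-ins, not Maple commands, and the row order of the original data table is assumed to match the output.

```python
from collections import defaultdict

# Rows are (bedrooms, area in square feet, price in dollars), from output (3.1).
house_sales = [
    (2, 1049, 81647), (2, 580, 59481), (2, 878, 96484), (2, 1100, 80134),
    (3, 1130, 114694), (3, 907, 88464), (3, 1075, 113341),
    (3, 853, 94666), (3, 856, 89412),
    (4, 1123, 125236), (4, 1527, 127368), (4, 1040, 104385),
    (4, 1295, 136603), (4, 995, 128007), (4, 908, 115707),
]

def split_by_column(rows, column):
    """Group rows by the value in `column`, ordered by that value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row)
    return [groups[key] for key in sorted(groups)]

def column_means(rows):
    """Mean of each column, treating each column separately."""
    return [sum(col) / len(rows) for col in zip(*rows)]

per_bedroom = split_by_column(house_sales, 0)   # Python indexes columns from 0
three_bedrooms = per_bedroom[1]                 # second group: 3 bedrooms
print(column_means(three_bedrooms))             # [3.0, 964.2, 100115.4]
```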


Live data plots


A new palette in Maple 16 makes it easy to create and customize statistical plots, including area charts, histograms, pie charts, and scatter plots.
From the Live Data Plots palette, click a plot type to insert this palette item into your document. To display your dataset, replace the placeholder with your dataset. You can customize the plot by clicking on options. For more information, see Live Data Plots in Maple 16.


Variable-width histograms


Sometimes a phenomenon that lends itself to display using a histogram changes rapidly in one region and slowly in another. In the region where the phenomenon changes rapidly, you would like to show a very fine-grained histogram, but elsewhere that would be overkill and distracting. In this case, you can use the variable-width bins feature for histograms, new in Maple 16.
For example, suppose you have data that can come from one of two processes: either from a Beta distribution with parameters 3 and 2.5, or from a normal distribution with mean 2 and standard deviation 3. We have 100 elements from each.
$\mathrm{s1} := \mathrm{Sample}\left(\mathrm{BetaDistribution}\left(3, 2.5\right), {10}^{2}\right):$
$\mathrm{s2} := \mathrm{Sample}\left(\mathrm{NormalDistribution}\left(2, 3\right), {10}^{2}\right):$
$s := \mathrm{Join}\left(\left[\mathrm{s1}, \mathrm{s2}\right]\right):$
The default histogram is too coarse:
$\mathrm{Histogram}\left(s\right);$
But a histogram with a much finer bin width shows too many empty bins:
$\mathrm{Histogram}\left(s, \mathrm{binwidth} = 0.15\right);$
The following command uses wider bins where there are fewer points. (The heights of the rectangles are proportional to the density of points in the given bin, and the total area of all rectangles is 1.)
$\mathrm{Histogram}\left(s, \mathrm{binbounds} = \mathrm{proportional}\right)$
For more information, see Histogram.
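One common way to get wider bins where points are sparse is to place the bin edges at evenly spaced ranks of the sorted data, so that each bin holds roughly the same number of points. The sketch below, in Python rather than Maple, does this and scales the heights as densities so that the total area is 1. It is an illustration under that equal-count assumption and may differ from the rule Maple applies for the proportional setting. The function `equal_count_histogram` is hypothetical.

```python
import random

def equal_count_histogram(data, bins):
    """Return (edges, heights) with roughly equal numbers of points per bin.

    Heights are densities, count / (n * width), so the total area is 1.
    Assumes the data contain no ties at the chosen edges.
    """
    xs = sorted(data)
    n = len(xs)
    # Bin edges at evenly spaced ranks of the sorted data.
    edges = [xs[round(k * (n - 1) / bins)] for k in range(bins + 1)]
    heights = []
    for k in range(bins):
        lo, hi = edges[k], edges[k + 1]
        last = k == bins - 1
        # Half-open bins, except that the last bin includes its right edge.
        count = sum(1 for x in xs if lo <= x < hi or (last and x == hi))
        heights.append(count / (n * (hi - lo)))
    return edges, heights

rng = random.Random(16)                              # fixed seed for repeatability
s1 = [rng.betavariate(3, 2.5) for _ in range(100)]   # same sizes as the Sample calls
s2 = [rng.gauss(2, 3) for _ in range(100)]
s = s1 + s2

edges, heights = equal_count_histogram(s, 10)
widths = [b - a for a, b in zip(edges, edges[1:])]
area = sum(h * w for h, w in zip(heights, widths))
print(widths)
print(area)
```

The bins are narrow inside the interval from 0 to 1, where the Beta sample is concentrated, and wide in the tails of the normal sample.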


ScatterPlot3D


• The ScatterPlot3D command plots a surface from an $m \times 3$ Array or Matrix representing points in three-dimensional space. The surface is a smoothed approximation, generated using the lowess algorithm. If each row of the data Matrix is considered as a point in $x$-$y$-$z$ space, then the first two entries of each row represent a point in the $x$-$y$ plane (independent data), while the third entry of each row represents the $z$-coordinate (dependent data).

• The data in the first two entries of each row does not need to form a regular grid in the $x$-$y$ plane.

• The following example constructs data by adding noise to a function ($z$-value) of the first two ($x$ and $y$) dimensions.

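> X := Sample(Uniform(-50, 50), 175):        # x-coordinates of the points

> Y := Sample(Uniform(-50, 50), 175):        # y-coordinates of the points

> Zerror := Sample(Normal(0, 100), 175):     # noise added to the z-values

> Z := Array(1..175, (i) -> (sin(Y[i]/20)*(X[i]-6)^2 + (Y[i]-7)^2 + Zerror[i])):

> XYZ := Matrix([[X], [Y], [Z]], datatype=float[8])^%T;   # one point per row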

${\mathrm{XYZ}}{:=}\left[\begin{array}{c}{\mathrm{175\; x\; 3}}{\mathrm{Matrix}}\\ {\mathrm{Data\; Type:}}{{\mathrm{float}}}_{{8}}\\ {\mathrm{Storage:}}{\mathrm{rectangular}}\\ {\mathrm{Order:}}{\mathrm{Fortran\_order}}\end{array}\right]$
 (6.1) 
> ScatterPlot3D(XYZ, axes=box, orientation=[20,0,0]);

> ScatterPlot3D(XYZ, lowess, grid=[25,25], axes=box, orientation=[20,70,0]);



Pie chart improvements


In Maple 16, you can now create three-dimensional pie charts and annular pie charts.
Additional improvements include new default coloring of pie charts. When you specify ranges for pie chart colors, the two range endpoint colors are plotted opposite each other in the pie chart. The color gradient for the pie chart changes clockwise. The colors in the pie chart are kept within the same hue.
In addition, labels of pie slices are automatically colored for better contrast.
> dataset := ["A" = 5, "B" = 4, "C" = 3, "D" = 2, "E" = 3, "F" = 4, "G" = 5]:

> Statistics:-PieChart(dataset, color = "CornflowerBlue" .. "DarkBlue", annular = true, render3d = true);






