
Statistics Enhancements in Maple 16


 

Statistical computations in Maple combine the ease of working in a high-level, interactive environment with a large and powerful set of algorithms. Large data sets can be handled efficiently, with 35 built-in statistical distributions and algorithms for sampling, estimation, data smoothing, hypothesis testing, and visualization. In addition, integration with the Maple symbolic engine means that you can easily specify custom distributions, either by combining existing distributions or simply by giving a formula for the probability or cumulative distribution function.

Maple 16 includes significant enhancements to the Statistics package. These enhancements include:

• 

    Discrete distributions, which are important in many areas from game theory to algorithm analysis, are significantly enhanced, with support for non-integer values as well as sampling of custom discrete distributions.

• 

    Maximum likelihood estimation now allows for multiple parameters and is significantly faster.

• 

    Improved support for matrix data sets makes it easier to split data into subsets based on particular criteria, enhancing your ability to analyze data and identify patterns.

• 

    Statistical visualization is easier than ever before. In addition to the new Live Data Plots, enhancements like variable bin-width histograms and new options for pie charts provide you with extra control over how data is presented.


with(Statistics)

Discrete distributions with noninteger values

 

New in Maple 16, discrete distributions can take noninteger values.

Example: At the market

 

At a market stall, a vendor charges $5 for taking part in a game of chance. If you participate, you receive the contents of one of four envelopes filled with small change, each with probability 1/4. The values of the envelopes are $8.24, $4.12, $3.91, and $0.16.

R := RandomVariable(EmpiricalDistribution([8.24, 4.12, 3.91, .16]))

The expected amount of money we receive is Mean(R) = 4.1075, that is, about $4.11, which is less than the $5 fee. This sounds like a fairly unappealing deal.

Suppose, furthermore, that we want to spend this money on peanuts. Because of overhead, buying a few peanuts costs more per peanut than buying a lot of them. Our peanut supplier will sell us 10*x^2 grams of peanuts for $x (for x < 10), so for $5 we can get 10*5^2 = 250 g of peanuts. The expected weight of peanuts we will get if we participate in the game is:

ExpectedValue(10*R^2) = 1001857/4000, or ExpectedValue(10*R^2, 'numeric') = 250.4642500.

Because squaring strongly favors the highest outcome, $8.24, the expected payout in peanuts from taking part in the game (about 250.46 g) is essentially equal to the 250 g we would get by not playing and spending the $5 on peanuts directly.
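Because every envelope has probability 1/4, both expectations above are plain averages and can be checked by hand. The following is a minimal sketch in plain Maple, without the Statistics package, using exact rational dollar amounts so that the arithmetic stays exact; the names amounts, meanPayout, and peanuts are introduced here for illustration only.

```maple
# Envelope values in dollars, as exact rationals so the arithmetic stays exact.
amounts := [824/100, 412/100, 391/100, 16/100]:
# Each envelope is drawn with probability 1/4.
meanPayout := add(v/4, v in amounts);      # 1643/400, i.e. 4.1075
# Expected weight in grams when $x buys 10*x^2 grams of peanuts.
peanuts := add(10*v^2/4, v in amounts);    # 1001857/4000
evalf(peanuts);                            # 250.4642500
```

The squared term shows where the advantage of the top envelope comes from: 8.24^2 = 67.8976 accounts for about two thirds of the sum 100.1857 of the squared values.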

When the outcomes are not all equally likely (or, more generally, when their probabilities are not all small multiples of a single value, so that simply repeating values in the list cannot express them), we can use the new probabilities option to EmpiricalDistribution. In a more refined model, the weight or volume of the envelopes might influence how likely each one is to be picked. For example, suppose the probabilities are as follows:

where the top row gives the values in dollars as before and the bottom row gives the probabilities. (This information is tied to the variable Probabilities using the data table feature.) Doing the same computations as above, we now see:

R2 := RandomVariable(EmpiricalDistribution(Probabilities[1], 'probabilities' = Probabilities[2]))

Mean(R2) = 4.684

The expected payout is higher, but still falls short of the $5 price. However, the expected weight of peanuts is now ExpectedValue(10*R2^2) = 814979/2500, or ExpectedValue(10*R2^2, 'numeric') = 325.9916000.

We now see that the expected payout in peanuts (about 326 g, versus 250 g for $5) is better if we take part in the game!
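As a quick consistency check of the probabilities option, giving every envelope probability 1/4 explicitly should reproduce the results of the first, uniform model. The following is a minimal sketch, assuming that a row Vector is accepted for the probabilities (as Probabilities[2] is above); the name R3 is introduced for illustration only.

```maple
with(Statistics):
# Equal probabilities given explicitly; this should match the uniform model R.
R3 := RandomVariable(EmpiricalDistribution([8.24, 4.12, 3.91, .16],
        'probabilities' = Vector[row]([1/4, 1/4, 1/4, 1/4]))):
Mean(R3);                            # should equal 4.1075, as for R
ExpectedValue(10*R3^2, 'numeric');   # should equal 250.46425, as for R
```

If the two results differ from those for R, the probabilities are not being interpreted as intended.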

Custom distributions

 

EmpiricalDistribution can represent any discrete distribution that takes only finitely many values. The built-in discrete distributions that take infinitely many values, however, support only integer values. Therefore, to use a discrete distribution that takes infinitely many values, some or all of which are not integers, we need to define the distribution ourselves, using the custom distribution feature of the Statistics package.

Consider the distribution that can assume all negative powers p of 2, each with probability p. That is, the corresponding random variable is 1/2 with probability 1/2, 1/4 with probability 1/4, and so on. This distribution is represented as follows:

d := Distribution('ProbabilityFunction' = (p -> p), 'DiscreteValueMap' = (n -> 2^(-n)), 'Support' = 1 .. infinity, 'Type' = 'discrete')

R := RandomVariable(d)

The DiscreteValueMap and Support properties determine which values the distribution can assume; the ProbabilityFunction gives the probability with which each value is assumed. For more details, see the DiscreteValueMap help page. We can now compute that Mean(R) = 1/3, for example, or StandardDeviation(R) = (1/21)*14^(1/2), or Cumulant(R, 3) = -2/945.
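These values can be checked directly from the definition. The value 2^(-n) has probability 2^(-n), so the k-th raw moment is the geometric series sum over n >= 1 of 2^(-(k+1)*n) = 1/(2^(k+1) - 1), giving 1/3, 1/7, and 1/15 for k = 1, 2, 3. The following is a minimal sketch in plain Maple using sum rather than the Statistics commands; the names m1, m2, m3, sd, and k3 are illustrative.

```maple
# The value (1/2)^n occurs with probability (1/2)^n, n = 1, 2, 3, ...
m1 := sum((1/2)^n * (1/2)^n, n = 1 .. infinity);     # E[X]   = 1/3
m2 := sum((1/2)^(2*n) * (1/2)^n, n = 1 .. infinity); # E[X^2] = 1/7
m3 := sum((1/2)^(3*n) * (1/2)^n, n = 1 .. infinity); # E[X^3] = 1/15
# Standard deviation: sqrt(1/7 - 1/9) = sqrt(2/63), which equals sqrt(14)/21.
sd := sqrt(m2 - m1^2);
# The third cumulant equals the third central moment: 1/15 - 1/7 + 2/27 = -2/945.
k3 := m3 - 3*m1*m2 + 2*m1^3;
```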

Another distribution can be obtained by taking the probabilities 8*n*(n+1)*(n+4)/(117*3^n), for n = 0 .. infinity (a modified negative binomial distribution) and associating the value exp(n) with the nth probability. This is defined as follows:

d2 := Distribution('DiscreteValueMap' = (n -> exp(n)), 'Support' = 0 .. infinity, 'ProbabilityFunction' = (x -> (8/117)*ln(x)*(ln(x)+1)*(ln(x)+4)/x^ln(3)), 'Type' = 'discrete')

R2 := RandomVariable(d2)

Again, we can compute such quantities as Mean(R2) = -(16/13)*(2*exp(1)-15)*exp(1)/(exp(1)-3)^4 and Variance(R2) = infinity.
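Both results follow from the power series identity sum over n >= 0 of n*(n+1)*(n+4)*x^n = x*(10 - 4*x)/(1 - x)^4 for |x| < 1. At x = 1/3 it gives 117/8, so the probabilities sum to 1; at x = exp(1)/3 it gives the mean; and the second moment corresponds to x = exp(2)/3 > 1, where the series diverges, hence Variance(R2) = infinity. The following is a minimal sketch that checks the first two facts in plain Maple (the mean numerically); the names p, total, meanNum, and meanExact are illustrative.

```maple
# Probability attached to the value exp(n), for n = 0, 1, 2, ...
p := n -> 8*n*(n+1)*(n+4)/(117*3^n):
total := sum(p(n), n = 0 .. infinity);              # 1, so p is a probability function
# Numerical value of the mean; the series converges because exp(1)/3 < 1.
meanNum := evalf(Sum(exp(n)*p(n), n = 0 .. infinity));
# The closed form quoted above, evaluated numerically for comparison.
meanExact := -(16/13)*(2*exp(1)-15)*exp(1)/(exp(1)-3)^4:
evalf(meanExact);                                   # should agree with meanNum
```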

Sampling custom discrete distributions

 

In previous versions, Maple did not support sampling of custom discrete distributions. This feature was added to Maple 16.

`~`[evalf[5]](Sample(R, 10))

[0.50000, 0.12500, 0.50000, 0.25000, 0.50000, 0.25000, 0.25000, 0.50000, 0.50000, 0.031250]

`~`[evalf[5]](Sample(R2, 10))

[20.086, 54.598, 54.598, 148.41, 54.598, 20.086, 2.7183, 2.7183, 20.086, 20.086]
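For the first distribution there is also an independent way to sample: the index n, which has probability 2^(-n), is 1 more than a geometric random variable with success probability 1/2 that counts failures before the first success. The sketch below assumes that Maple's Geometric(p) uses this failure-counting parametrization, P(G = k) = p*(1 - p)^k for k = 0, 1, ...; the names G and V are illustrative. The result should look like the sample of R above, although the particular values will differ.

```maple
with(Statistics):
# Assumption: G counts failures before the first success, so P(G = k) = (1/2)^(k+1).
G := Sample(Geometric(1/2), 10):
# Then n = G + 1 has P(n) = 2^(-n), and the sampled value is 2^(-n).
V := map(g -> 2^(-(g + 1)), G);
```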

Parameter estimation is more efficient and handles more cases

 

Maple 16 has much more efficient and robust routines for doing maximum likelihood parameter estimation for many distributions. The following example was sped up by a factor of about 10.

s := Sample(Normal(-1, exp(1)), 10^4)

CodeTools:-Usage(MaximumLikelihoodEstimate(Normal(-1, sigma), s))

2.706644526981814

Maple can now also estimate multiple parameters at the same time using maximum likelihood estimation.

CodeTools:-Usage(MaximumLikelihoodEstimate(Normal(mu, sigma), s))

[mu = -1.0924643454003085, sigma = 2.7050646830601797]
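For the normal distribution the maximum likelihood estimates have a closed form: mu is estimated by the sample mean, and sigma by the root mean square deviation from it (dividing by N, not N - 1). When mu is fixed at -1, sigma is instead estimated by the root mean square deviation from -1, and the two estimates satisfy sigmaFixed^2 = sigmaHat^2 + (muHat + 1)^2. This accounts for the slightly larger sigma in the first estimate above, since the estimated mu is not exactly -1. The following is a minimal sketch in plain Maple; it draws a fresh sample, so its numbers will differ slightly from those shown, and the names N, muHat, sigmaHat, and sigmaFixed are illustrative.

```maple
with(Statistics):
# A fresh sample drawn like s above.
s := Sample(Normal(-1, exp(1)), 10^4):
N := 10^4:
muHat := add(t, t in s)/N;                        # MLE of mu: the sample mean
sigmaHat := sqrt(add((t - muHat)^2, t in s)/N);   # MLE of sigma, dividing by N, not N - 1
sigmaFixed := sqrt(add((t + 1)^2, t in s)/N);     # MLE of sigma when mu is fixed at -1
```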

Matrix data sets

 

The Statistics package in Maple 16 has been updated to better handle Matrix data sets. In previous releases, some Statistics commands did not accept Matrix data. In Maple 16, the commands in the Statistics package work with Matrix data sets, operating on each column of the input Matrix separately. In addition, Maple 16 allows you to split your data into submatrices based on the values in one column. As a result, you can organize and present your data in different configurations to better observe particular trends.

 

As an example, assume the data table below, which corresponds to the variable HouseSalesData, holds some housing data: the first column gives the number of bedrooms, the second the area in square feet, and the third the price in dollars.

 

Using the SplitByColumn command we can easily rearrange the data in terms of the number of bedrooms:

PerBedroom := SplitByColumn(HouseSalesData, 1)

[Matrix([[2, 1049, 81647], [2, 580, 59481], [2, 878, 96484], [2, 1100, 80134]]), Matrix([[3, 1130, 114694], [3, 907, 88464], [3, 1075, 113341], [3, 853, 94666], [3, 856, 89412]]), Matrix([[4, 1123, 125236], [4, 1527, 127368], [4, 1040, 104385], [4, 1295, 136603], [4, 995, 128007], [4, 908, 115707]])]
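Conceptually, splitting by a column means grouping the rows by the distinct values in that column. The following is a minimal sketch of that idea in plain Maple; MySplitByColumn is a hypothetical helper written for illustration, not the Statistics implementation, and it assumes the groups should be ordered by increasing key, as in the output above.

```maple
# Hypothetical helper: group the rows of M by the value in column c.
MySplitByColumn := proc(M::Matrix, c::posint)
    local m, keys, k, i;
    m := LinearAlgebra:-RowDimension(M);
    # Distinct values of column c, in increasing order.
    keys := sort(convert({seq(M[i, c], i = 1 .. m)}, list));
    # For each key, keep the rows whose column-c entry equals it,
    # preserving their original order.
    [seq(M[select(r -> evalb(M[r, c] = k), [seq(i, i = 1 .. m)]), ..], k in keys)];
end proc:
```

For example, MySplitByColumn(Matrix([[3, 10], [2, 20], [3, 30], [2, 40]]), 1) groups the rows with key 2 first and the rows with key 3 second.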

If we want to know the average area and price for the three bedroom houses, we can find that as follows:

ThreeBedrooms := PerBedroom[2]

Matrix([[3, 1130, 114694], [3, 907, 88464], [3, 1075, 113341], [3, 853, 94666], [3, 856, 89412]])

Mean(ThreeBedrooms)

[3., 964.200000000000, 100115.4000]

We see that the average three-bedroom house has an area of about 964 square feet and costs just over $100,100.
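Because Mean operates on each column separately, the two interesting entries of the result are ordinary column averages. The following is a minimal sketch that recomputes them exactly in plain Maple from the three-bedroom rows shown above; the names n, meanArea, and meanPrice are illustrative.

```maple
# The three-bedroom rows from the output above: bedrooms, area (sq ft), price ($).
ThreeBedrooms := Matrix([[3, 1130, 114694], [3, 907, 88464], [3, 1075, 113341],
                         [3, 853, 94666], [3, 856, 89412]]):
n := LinearAlgebra:-RowDimension(ThreeBedrooms):
meanArea := add(ThreeBedrooms[i, 2], i = 1 .. n)/n;    # 4821/5 = 964.2
meanPrice := add(ThreeBedrooms[i, 3], i = 1 .. n)/n;   # 500577/5 = 100115.4
```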

Statistical Visualization

 

Live data plots

 

A new palette in Maple 16 makes it easy to create and customize statistical plots, including area charts, histograms, pie charts, and scatter plots. For more information, see Live Data Plots in Maple 16.

 

 

 

Variable-width histograms

 

Sometimes a phenomenon displayed in a histogram changes rapidly in one region and slowly in another. Where it changes rapidly, you would like a fine-grained histogram, but elsewhere such fine bins would be overkill and distracting. In this case, you can use the variable-width bins feature for histograms, new in Maple 16.

For example, suppose you have data that can come from one of two processes: either from a Beta distribution with parameters 3 and 2.5, or from a normal distribution with mean -2 and standard deviation 3. We take 100 samples from each.

s1 := Sample(BetaDistribution(3, 2.5), 10^2)

s2 := Sample(NormalDistribution(-2, 3), 10^2)

s := Join([s1, s2])

The default histogram is too coarse:

Histogram(s)

But a histogram with a much finer bin width shows too many empty bins:

Histogram(s, binwidth = .15)

The following command uses wider bins where there are fewer points. (The heights of the rectangles are proportional to the density of points in the given bin, and the total area of all rectangles is 1.)

Histogram(s, binbounds = proportional)
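The normalization described in parentheses works for any choice of bins: with N data points and bin bounds b[0] < b[1] < ... < b[k], a bin holding c points gets height c/(N*(b[i] - b[i-1])), so each rectangle's area is c/N and the areas sum to 1. The following is a minimal sketch of that computation in plain Maple; BinDensities is a hypothetical helper for illustration, and how Maple chooses the proportional bin bounds themselves is not shown here.

```maple
# Hypothetical helper: density heights for the bins [bounds[i], bounds[i+1]).
# The last bin also includes its right endpoint, so every point in range is counted once.
BinDensities := proc(data::list, bounds::list)
    local N, k, i;
    N := nops(data);
    k := nops(bounds) - 1;
    [seq(nops(select(x -> bounds[i] <= x and (x < bounds[i+1] or (i = k and x = bounds[i+1])), data))
         / (N*(bounds[i+1] - bounds[i])), i = 1 .. k)];
end proc:

h := BinDensities([1, 2, 3, 4], [0, 2, 6]);   # [1/8, 3/16]
# Total area: 2*(1/8) + 4*(3/16) = 1/4 + 3/4 = 1
```

The wide second bin holds three points but gets a lower height than its count alone would suggest, which is exactly what keeps sparse regions from looking artificially dense.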

ScatterPlot3D

 
• 

The ScatterPlot3D command can plot a surface from an m × 3 Array or Matrix whose rows represent points in three-dimensional space. The surface is a smoothed approximation generated with the lowess algorithm. Each row of the data Matrix is treated as a point in x-y-z space: the first two entries give a point in the x-y plane (independent data), and the third entry gives the z-coordinate (dependent data).

• 

The data in the first two entries of each row does not need to form a regular grid in the (x-y) plane.

• 

The following example constructs data by adding noise to a function z of the first two dimensions, x and y.

X := Sample(Uniform(-50,50),175):

Y := Sample(Uniform(-50,50),175):

Zerror := Sample(Normal(0,100),175):

Z := Array(1..175,(i)->-(sin(Y[i]/20)*(X[i]-6)^2+(Y[i]-7)^2+Zerror[i])):

XYZ := Matrix([[X],[Y],[Z]],datatype=float[8])^%T;

XYZ := 175 x 3 Matrix (Data Type: float[8], Storage: rectangular, Order: Fortran_order)

ScatterPlot3D(XYZ, axes=box, orientation=[20,0,0]);

ScatterPlot3D(XYZ, lowess, grid=[25,25], axes=box, orientation=[20,70,0]);
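The idea behind a lowess surface is a weighted linear fit around each point where the surface is evaluated, with nearby data points weighted more heavily. The following is a minimal sketch of one such local fit in plain Maple, using tricube weights and no robustness iterations; LocalLinearAt and its bandwidth fraction f are hypothetical and illustrate the general technique only, not ScatterPlot3D's implementation.

```maple
# Hypothetical helper: one local linear fit at (x0, y0), lowess-style.
# f is the fraction of data points inside the neighborhood.
LocalLinearAt := proc(XYZ::Matrix, x0, y0, f)
    local m, d, k, h, w, A, W, beta, i;
    m := LinearAlgebra:-RowDimension(XYZ);
    # Distances from (x0, y0) to each data point in the x-y plane.
    d := [seq(evalf(sqrt((XYZ[i, 1] - x0)^2 + (XYZ[i, 2] - y0)^2)), i = 1 .. m)];
    # Bandwidth: distance to the k-th nearest point, k = ceil(f*m) (at least 3).
    k := min(m, max(3, ceil(f*m)));
    h := sort(d)[k];
    # Tricube weights; points at distance h or more get weight 0.
    w := [seq(`if`(d[i] < h, (1 - (d[i]/h)^3)^3, 0), i = 1 .. m)];
    # Design matrix with columns 1, x, y.
    A := Matrix(m, 3, (r, j) -> `if`(j = 1, 1, XYZ[r, j - 1]));
    W := LinearAlgebra:-DiagonalMatrix(w);
    # Weighted least squares: solve (A^T W A) beta = A^T W z.
    beta := LinearAlgebra:-LinearSolve(A^%T . W . A, A^%T . W . XYZ[.., 3]);
    beta[1] + beta[2]*x0 + beta[3]*y0;
end proc:
```

Evaluating LocalLinearAt(XYZ, x0, y0, 0.3) over a grid of (x0, y0) values traces out a smoothed surface. Because each local fit is linear, data lying exactly on a plane is reproduced exactly, which makes a convenient sanity check.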

Pie chart improvements

 

In Maple 16, you can now create three-dimensional pie charts and annular pie charts.

Additional improvements include new default coloring of pie charts. When you specify a range of colors for a pie chart, the two endpoint colors of the range are placed opposite each other in the chart, the color gradient changes clockwise, and the colors are kept within the same hue.

In addition, labels of pie slices are automatically colored for better contrast.

dataset := ["A" = 5, "B" = 4, "C" = 3, "D" = 2, "E" = 3, "F" = 4, "G" = 5]:

Statistics:-PieChart(dataset, color = "CornflowerBlue" .. "DarkBlue", annular = true, render3d = true);

 

Legal Notice: © Maplesoft, a division of Waterloo Maple Inc. 2012. Maplesoft and Maple are trademarks of Waterloo Maple Inc. This application may contain errors and Maplesoft is not liable for any damages resulting from the use of this material. This application is intended for non-commercial, non-profit use only. Contact Maplesoft for permission if you wish to use this application in for-profit activities.