Update to the stats Package
|
Description
|
|
•
|
The statistics package has been completely redesigned for Maple V Release 3. This help page is intended to help you convert code written for Maple V Release 2 or earlier to Maple V Release 3. The examples appearing here are conversions of examples from the Maple V Library Reference Manual of 1991. If you were not using the previous stats package, this document can still provide you with useful examples.
|
•
|
The following enhancements have been made to the stats package. The data structure has been changed from the statistical matrix to Maple lists. Missing data, classes (data in ranges), and weighted data are now supported. Statistical distributions are objects in their own right, and the notation for them is consistent throughout the package; in particular, the random and statevalf subpackages use the same notation. Many new distributions, descriptive statistics functions, and data transformation functions have been added. A single function now performs least-squares fitting of equations (there used to be three), and the new notation is more intuitive. There are also new functions for creating statistical graphics.
|
•
|
The following describes how to translate programs that used the previous version of the stats package to the current notation.
|
•
|
Note that the old package tended to apply operations to the columns of a matrix of data. In particular, given a list of lists, the old behavior was to apply the requested operation (such as variance) to the list made of the first item of each inner list, then to the list made of the second items, and so on. With the new package, the natural approach is to map() the operation over the list of lists, so the computation now runs across the rows. To operate on columns, transpose the data first, as several of the examples below do.
|
•
|
The old function stats[addrecord]() is now obsolete, since the data structure has been changed from a statistical matrix to Maple lists. To add a record, append to each list with op.
|
> x:=[1,2]; y:=[2,3];
|
x := [1, 2]
|
y := [2, 3]
|
> x:=[op(x),3]; y:=[op(y), 4];
|
x := [1, 2, 3]
|
y := [2, 3, 4]
|
> x:='x': y:='y':
|
|
|
•
|
The function stats[average] is replaced by stats[describe,mean]. Data are now supplied as a list.
|
> k:=array([[1,2,2],[5,5,5],[11,7,8]]);
|
[ 1 2 2 ]
|
[ ]
|
k := [ 5 5 5 ]
|
[ ]
|
[ 11 7 8 ]
|
> map(stats[describe,mean],convert(linalg[transpose](k),listlist));
|
[17/3, 14/3, 5]
|
> stats[describe,mean](");
|
46/9
|
|
|
•
|
stats[ChiSquare](F, v) is replaced by stats[statevalf, icdf, chisquare[v]](F).
|
> stats[statevalf,icdf,chisquare[5]](0.1);
|
1.610307987
|
|
|
•
|
The function stats[correlation](list1, list2) is replaced by stats[describe, linearcorrelation](list1, list2).
|
> stats[describe,linearcorrelation]([1,2,3,4,5,6,7],[1,2,5,8,10,13,16]);
|
6/109 327^(1/2)
|
> evalf(");
|
.9954022745
|
|
|
•
|
The function stats[covariance](list1, list2) is replaced by stats[describe,covariance](list1, list2). A bug in the definition of covariance has been fixed.
|
> stats[describe,covariance]([1,2,3],[1.2, 2.3, 3.7]);
|
.8333333333
|
> stats[describe,mean]([1.2, 2.3, 3.7]);
|
2.400000000
|
> ( (1-2)*(1.2-2.4) + (2-2)*(2.3-2.4) + (3-2)*(3.7-2.4) )/3;
|
.8333333333
|
Old behavior: INCORRECT:
|
> stats[describe,mean]([1,2,3]),stats[describe,mean]([1.2, 2.3, 3.7]);
|
2, 2.400000000
|
> ((1*1.2 + 2*2.3 + 3*3.7) - ( 3* 2*2.4))/2;
|
1.250000000
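|
Reading the two computations above as formulas may make the difference clearer. This is a sketch inferred from the arithmetic shown, not the package's documented definition; the symbols n (number of pairs) and \bar{x}, \bar{y} (the two means) are introduced here for illustration.

```latex
% New behavior (n = 3, \bar{x} = 2, \bar{y} = 2.4)
\operatorname{cov}_{\mathrm{new}}(x, y)
  = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
  = \frac{1.2 + 0 + 1.3}{3} \approx 0.8333

% Old behavior, as computed in the last command above
\operatorname{cov}_{\mathrm{old}}(x, y)
  = \frac{\sum_{i=1}^{n} x_i y_i - n\,\bar{x}\,\bar{y}}{n - 1}
  = \frac{16.9 - 14.4}{2} = 1.25
```

Both numerators equal 2.5; in this example the two results differ only in the divisor, n versus n - 1.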
|
|
|
•
|
The data representation via the stats/statistical matrix is now obsolete; represent data as Maple lists instead. To import data, use the readline and sscanf facilities. The function stats[importdata] can also be useful.
|
•
|
The stats/statistical distributions have been revised and the interface has changed. Because distributions can be considered Maple functions in their own right, they remain unevaluated when applied to symbolic values. To get numerical values, use the stats[statevalf] subpackage. The definitions of the various distributions are given in the documentation for stats[distributions].
|
> stats[statevalf,pdf,exponential[5]](6);
|
.4678811485*10^(-12)
|
> evalf(5*exp(-30));
|
.4678811485*10^(-12)
|
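The agreement between the two results above reflects the density being evaluated. As a sketch, assuming exponential[lambda] uses the rate parameterization that the evalf check implies (the symbols \lambda and x are introduced here for illustration):

```latex
f(x) = \lambda\, e^{-\lambda x}, \qquad x \ge 0

f(6)\,\big|_{\lambda = 5} = 5\, e^{-30} \approx 0.4678811485 \times 10^{-12}
```
|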
|
|
•
|
The function stats[evalstat] is obsolete, since the data structure changed. A similar facility is given by stats[transform, multiapply].
|
> x:=[1,3];
|
x := [1, 3]
|
> y:=[2,4];
|
y := [2, 4]
|
> k:=stats[transform,multiapply[(x,y)->5*x+y^2]]([ x, y ] );
|
k := [9, 31]
|
> statlists:=[x1=[1,2,3],x2=[4,5,6]];
|
statlists := [x1 = [1, 2, 3], x2 = [4, 5, 6]]
|
> stats[describe,mean](subs(statlists,x1));
|
2
|
> stats[describe,linearcorrelation](op(subs(statlists,[x1,x2])));
|
1
|
|
|
•
|
The function stats[Exponential](lambda, bound) is replaced by stats[statevalf, pdf, exponential[lambda]](bound). Note that symbolic values are no longer expressed in terms of the exp function.
|
> temp:=stats[statevalf,pdf,exponential[5]]:
|
> temp(4);
|
.1030576811*10^(-7)
|
> evalf(5*exp(-20));
|
.1030576811*10^(-7)
|
> stats[statevalf,pdf,exponential[5]](4);
|
.1030576811*10^(-7)
|
|
|
•
|
The function stats[Fdist](F,n,m) is replaced by stats[statevalf, icdf, fratio[n,m]](F).
|
> stats[statevalf, icdf, fratio[2, 5]](0.9);
|
3.779716079
|
|
|
•
|
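The function stats[Ftest](x, n, m) is replaced by 1 - stats[statevalf, cdf, fratio[n,m]](x), the upper-tail probability. (The value shown below is the upper tail; the cdf itself at 32 is close to 1.)
|
> 1 - stats[statevalf, cdf, fratio[3,5]](32);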
|
.0010910074
|
|
|
•
|
The function stats[getkey] is obsolete since the data structure has changed. No replacement is required.
|
•
|
The function stats[linregress](yvals, xvals) is replaced by stats[fit,leastsquare[[x,y]]]([xvals,yvals]). Note that the function now returns an equation, and that the coefficients are not necessarily floating-point numbers.
|
> x:='x': y:='y':
|
> stats[fit,leastsquare[[x,y]]]([[10,15,17,19],[3,4,5,6]]);
|
y = - 79/179 + 58/179 x
|
> evalf(");
|
y = - .4413407821 + .3240223464 x
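|
The exact coefficients can be checked by hand with the usual simple linear regression formulas. This is a worked check under the assumption that leastsquare fits y = a + b x by ordinary least squares; the symbols a, b, S_{xy}, and S_{xx} are introduced here for illustration.

```latex
% x = [10, 15, 17, 19], y = [3, 4, 5, 6], n = 4
\bar{x} = \tfrac{61}{4}, \qquad \bar{y} = \tfrac{9}{2}

S_{xy} = \sum x_i y_i - n\,\bar{x}\,\bar{y} = 289 - \tfrac{549}{2} = \tfrac{29}{2}

S_{xx} = \sum x_i^2 - n\,\bar{x}^2 = 975 - \tfrac{3721}{4} = \tfrac{179}{4}

b = \frac{S_{xy}}{S_{xx}} = \frac{58}{179}, \qquad
a = \bar{y} - b\,\bar{x} = \frac{9}{2} - \frac{58}{179} \cdot \frac{61}{4} = -\frac{79}{179}
```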
|
|
|
•
|
The function stats[median](data) is replaced by stats[describe,median](data) when data is a list. Note that the default behavior returns the th element for an even number of elements.
|
> stats[describe,median]([3,4,5,6,7]);
|
5
|
> dat:=array([[1,2,3,4],[5,6,7,8],[9,2,4,1]]);
|
[ 1 2 3 4 ]
|
[ ]
|
dat := [ 5 6 7 8 ]
|
[ ]
|
[ 9 2 4 1 ]
|
> map(stats[describe,median],convert(linalg[transpose](dat), listlist));
|
[5, 2, 4, 4]
|
|
|
•
|
The function stats[mode](data) is replaced by stats[describe,mode](data) when data is a list. Note that the replacement function works differently for ranges (the behavior is that of the Schaum's outline: Statistics 2nd edition).
|
> dat:=[1,1,2,2,2,2,5,5,9,9,9,9];
|
dat := [1, 1, 2, 2, 2, 2, 5, 5, 9, 9, 9, 9]
|
> partition:=[1..3,3..5,5..7,7..9,9..11];
|
partition := [1 .. 3, 3 .. 5, 5 .. 7, 7 .. 9, 9 .. 11]
|
> stats[transform,tallyinto](dat,partition);
|
[Weight(3 .. 5, 0), Weight(5 .. 7, 2), Weight(9 .. 11, 4), Weight(1 .. 3, 6),
|
Weight(7 .. 9, 0)]
|
> stats[describe,mode](");
|
2
|
To get the same behavior as the old stats package:
|
> stats[transform,tallyinto](dat,partition);
|
[Weight(3 .. 5, 0), Weight(5 .. 7, 2), Weight(9 .. 11, 4), Weight(1 .. 3, 6),
|
Weight(7 .. 9, 0)]
|
Replace each range by its start:
|
> eval(subs(Weight=proc(r,w) Weight(op(1,r),w) end, "));
|
[Weight(3, 0), Weight(5, 2), Weight(9, 4), Weight(1, 6), Weight(7, 0)]
|
Find the modal class:
|
> stats[describe,mode](");
|
1
|
Recover the full range:
|
> map(proc(d, start) if op(1,op(1,d))=start then op(1,d) else NULL fi end,
|
> """,");
|
[1 .. 3]
|
> dat:=array([[1,2,3,4],[1,2,3,3],[1,2,3,2]]);
|
[ 1 2 3 4 ]
|
[ ]
|
dat := [ 1 2 3 3 ]
|
[ ]
|
[ 1 2 3 2 ]
|
> map(x-> [stats[describe,mode](x)], convert(linalg[transpose](dat),listlist));
|
[[1], [2], [3], [2, 3, 4]]
|
|
|
•
|
The function stats[multregress] is replaced by stats[fit,leastsquare]. Refer to the documentation for stats[fit,leastsquare] on how to set the equation. Note that the answer is given as an equation and is not necessarily in floating point format.
|
> y:='y':
|
> dat:=array([[1,4,2,7,1,6,8],
|
> [6,9,1,12,5,3,7], [9,3,4,7,1,8,6], [4,7,1/2,4/3,6/7,5,1],
|
> [9,1,4/3,1,7,6,2],[2,6,1,8,9,3,4],[11,14,2,7,1,3,8]]);
|
[ 1 4 2 7 1 6 8 ]
|
[ ]
|
[ 6 9 1 12 5 3 7 ]
|
[ ]
|
[ 9 3 4 7 1 8 6 ]
|
[ ]
|
dat := [ 4 7 1/2 4/3 6/7 5 1 ]
|
[ ]
|
[ 9 1 4/3 1 7 6 2 ]
|
[ ]
|
[ 2 6 1 8 9 3 4 ]
|
[ ]
|
[ 11 14 2 7 1 3 8 ]
|
> stats[fit,leastsquare[ [y,x1,x2,x3,x4,x5,x6],
|
> x5=a*x2+b*x4+c*x6+d,{a,b,c,d}]]( convert( linalg[transpose](dat), listlist));
|
x5 = 25505003/17481458 x2 - 13507963/104888748 x4 - 22555349/52444374 x6 + 530730959/104888748
|
> evalf(");
|
x5 = 1.458974589 x2 - .1287837185 x4 - .4300813849 x6 + 5.059941787
|
|
|
•
|
The function stats[N](x,m,v) is replaced by stats[statevalf,cdf,normald[m,sqrt(v)]](x). Note that the second parameter is now the standard deviation, not the variance.
|
> stats[statevalf,cdf,normald](2.44);
|
.9926563691
|
> stats[statevalf,cdf,normald[2,sqrt(5)]](2.44);
|
.5779977938
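|
The second result can be checked by standardizing. This is a sketch assuming normald[m, s] denotes a normal distribution with mean m and standard deviation s, as the note above states; \Phi (the standard normal cdf) and \sigma are notation introduced here.

```latex
% old call: m = 2, v = 5  ->  new parameters: m = 2, \sigma = \sqrt{v} = \sqrt{5}
P(X \le 2.44) = \Phi\!\left( \frac{2.44 - 2}{\sqrt{5}} \right)
  \approx \Phi(0.1968) \approx 0.5780
```

Passing 5 instead of sqrt(5) as the second parameter would silently compute a different probability, so this is the conversion step most worth double-checking.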
|
|
|
•
|
The function stats[projection] has been removed.
|
•
|
The function stats[putkey] is obsolete since the data structure has changed. Use the symbolic facilities of Maple.
|
•
|
The function stats[Q](x), the upper-tail probability of the standard normal distribution, is replaced by 1 - stats[statevalf, cdf, normald](x).
|
> 1 - stats[statevalf, cdf, normald](0);
|
.5000000000
|
> 1 - stats[statevalf, cdf, normald](1);
|
.1586552540
|
|
|
•
|
The interface for obtaining random numbers with prescribed statistical distributions has changed. The replacement functions no longer return generators, which avoids the former two-step process of first getting the generator and then generating the numbers. An option is available to request that a generator be returned, and the number of digits in the numbers returned by such a generator can also be specified. Many random numbers can be obtained at once by specifying the count as the first parameter to the function. The number of digits can be varied by changing the Digits environment variable (see the information about Digits and about environment variables).
Note that for values of Digits higher than 12, it will probably be necessary to specify the underlying uniform number generator (see the information for stats[random] for more details).
For most applications, the old functions are replaced as follows:
stats[RandBeta](a,b) is replaced by stats[random, beta[a,b]]().
stats[RandExponential](u) is replaced by stats[random, exponential[u]]().
stats[RandFdist](v1, v2) is replaced by stats[random, fratio[v1,v2]]().
stats[RandGamma](a) is replaced by stats[random, gamma[a]]().
stats[RandNormal](u,s) is replaced by stats[random, normald[u,s]]().
stats[RandPoisson](lambda) is replaced by stats[random, poisson[lambda]]().
stats[RandStudentsT](v) is replaced by stats[random, studentst[v]]().
stats[RandUniform](a..b) is replaced by stats[random, uniform[a,b]]().
stats[RandChiSquare](v) is replaced by stats[random, chisquare[v]]().
For all the preceding distributions, if a generator is needed, give the keyword generator, optionally indexed by the number of digits, as the count parameter.
> stats[random,beta[1,2]](5);
|
.4665221875, .4727415895, .3724472020, .4696548594, .4676474494
|
> stats[random,exponential[3]](5);
|
.5149787523, .06848383666, .06644078099, .02050161526, .1548430514
|
> stats[random,fratio[1,3]](5);
|
1.077319107, 1.648879054, .06197577061, .1918622691, .5091468750
|
> stats[random,gamma[1]](5);
|
.8528857837, .9925261897, .7168040779, .4107219862, .5627456161
|
> stats[random,normald[0,1]](5);
|
.6783591266, -2.931033996, -.7845544037, .9920137252, -1.453766348
|
> stats[random,poisson[.9]](5);
|
1.0, 0, 1.0, 1.0, 1.0
|
> stats[random,studentst[2]](5);
|
-3.029231801, 1.110079936, .8578918497, 9.662401443, .07595782169
|
> stats[random,uniform[1,2]](5);
|
1.915776997, 1.835809125, 1.729175334, 1.663234020, 1.658477960
|
> stats[random,chisquare[3]](5);
|
2.385907282, .6492710560, .8710682352, 4.224988686, 10.70706216
|
> A:=stats[random,chisquare[3]](generator[8]):'A()'$5;
|
.92996968, 3.1100056, 3.5970376, 3.1406916, 8.6985432
|
|
|
•
|
The function stats[regression] is replaced by stats[fit,leastsquare]. Refer to the documentation of stats[fit,leastsquare] for the required specification format. Note that the result is an equation and that it is not necessarily in floating point format.
|
> dat:=array([[1/2,1/5],[4,1],[6,1],[1,7]]);
|
[ 1/2 1/5 ]
|
[ ]
|
[ 4 1 ]
|
dat := [ ]
|
[ 6 1 ]
|
[ ]
|
[ 1 7 ]
|
> stats[fit,leastsquare[ [y,x], y=a+b*x+c*x^2+d*x^3, {a,b,c,d}] ]
|
> ( convert( linalg[transpose](dat), listlist) );
|
y = - 661/816 - 7/5 d + (229/34 + 43/5 d) x + (- 755/816 - 41/5 d) x^2 + d x^3
|
> evalf(");
|
y = - .8100490196 - 1.400000000 d + (6.735294118 + 8.600000000 d) x + ( - .9252450980 - 8.200000000 d) x^2 + d x^3
|
The presence of the unevaluated parameter d indicates that the least-squares equations do not completely determine all parameters for this set of data.
|
Note: the original example tries to fit a polynomial with four parameters to data that has only three distinct x values.
|
|
|
•
|
The function stats[removekey] is obsolete, since the data structure has changed. Use the symbolic capabilities of Maple.
|
•
|
The function stats[Rsquared](list1, list2) is replaced by (stats[describe,linearcorrelation](list1, list2))^2.
|
> stats[describe,linearcorrelation]([1,2,3,4,5],
|
> [3.5, 9.7, 2.3, 10.4, 11.66])^2;
|
.3936148818
|
> dat:=array([ [2,5,7],[3,6,8], [4,7,9] ]);
|
[ 2 5 7 ]
|
[ ]
|
dat := [ 3 6 8 ]
|
[ ]
|
[ 4 7 9 ]
|
> linalg[transpose](");
|
[ 2 3 4 ]
|
[ ]
|
[ 5 6 7 ]
|
[ ]
|
[ 7 8 9 ]
|
> convert(",listlist);
|
[[2, 3, 4], [5, 6, 7], [7, 8, 9]]
|
> (stats[describe,linearcorrelation](op(2,"),op(3,")))^2;
|
1
|
|
|
•
|
The function stats[sdev](x) is replaced by stats[describe, standarddeviation[1]](x) when x is a list. Note the presence of the number 1. This is for sample standard deviation; for population standard deviation use 0.
|
> stats[describe, standarddeviation[1]]( [3,4,5,6,7,8,100] );
|
2/3 959^(1/2) 3^(1/2)
|
> dat:=array([[3,4],[9.5,6.7],[0.001,0.005]]);
|
[ 3 4 ]
|
[ ]
|
dat := [ 9.5 6.7 ]
|
[ ]
|
[ .001 .005 ]
|
> map( stats[describe,standarddeviation[1]], convert( linalg[transpose](dat),
|
> listlist) );
|
[4.855838445, 3.368309417]
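|
The role of the index can be written out explicitly. This is a sketch consistent with the statement above that 1 gives the sample value and 0 the population value; the symbols c, n, and \bar{x} are introduced here for illustration.

```latex
s_c = \sqrt{ \frac{1}{n - c} \sum_{i=1}^{n} (x_i - \bar{x})^2 },
\qquad c = 1 \ \text{(sample)}, \quad c = 0 \ \text{(population)}

% data [3,4,5,6,7,8,100]: n = 7, \bar{x} = 19, \sum (x_i - \bar{x})^2 = 7672
s_1 = \sqrt{ \frac{7672}{6} } = \frac{2}{3} \sqrt{959}\, \sqrt{3} \approx 35.76
```

This matches the exact result returned for the first example above.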
|
|
|
•
|
The function stats[serr] is no longer available. The old functionality can be obtained as shown in the following example:
|
> serror:=proc(data) stats[describe,standarddeviation[1]](data)/
|
> sqrt(stats[describe,count](data)) end:
|
> serror([3,4,0.001,9.9]);
|
2.073625062
|
> dat:=array([[2,3],[6.7,31],[0.001,0.005]]);
|
[ 2 3 ]
|
[ ]
|
dat := [ 6.7 31 ]
|
[ ]
|
[ .001 .005 ]
|
> map(serror,convert(linalg[transpose](dat),listlist));
|
[1.146351717 3^(1/2), 5.698700386 3^(1/2)]
|
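The serror procedure above computes the standard error of the mean. As a worked check (the symbols \mathrm{SE}, s_1, and n are notation introduced here, with s_1 the sample standard deviation):

```latex
\mathrm{SE} = \frac{s_1}{\sqrt{n}}

% data [3, 4, 0.001, 9.9]: n = 4, s_1 \approx 4.1472
\mathrm{SE} \approx \frac{4.1472}{\sqrt{4}} \approx 2.0736
```

In the column-wise example each column has n = 3, and the result is apparently left as a multiple of 3^(1/2) because the square root of the integer count is kept exact; applying evalf should give plain floating-point values.
|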
|
|
•
|
The function stats[statplot] has been replaced by stats[statplots,scatter2d] and the use of Maple's other plotting functions. Refer to the documentation for plot, plots, and stats[statplots].
|
•
|
The function stats[StudentsT](F, v) is replaced by stats[statevalf, icdf, studentst[v]](F). The 'area' form, stats[StudentsT]( F, v, 'area') is replaced by stats[statevalf, cdf, studentst[v]](F).
|
> stats[statevalf, icdf, studentst[4]](0.75);
|
.7406970841
|
> stats[statevalf, cdf, studentst[4]](0.75);
|
.7525202833
|
|
|
•
|
The function stats[Uniform](a,b) is replaced by stats[statevalf, pdf, uniform[a,b]].
|
> J:=stats[statevalf, pdf, uniform[5,7]]:
|
> J(6);
|
.5000000000
|
> J(10);
|
0
|
|
|
•
|
The function stats[variance](x) is replaced by stats[describe, variance[1]](x) when x is a list. Note the number 1: this is for the sample variance; for the population variance, use 0.
|
> stats[describe,variance[1]]([3,4,5,6]);
|
5/3
|
> dat:=array( [[ 3,5], [6.7, 8.9], [0.001, 0.005] ] );
|
[ 3 5 ]
|
[ ]
|
dat := [ 6.7 8.9 ]
|
[ ]
|
[ .001 .005 ]
|
> map(stats[describe,variance[1]], convert(linalg[transpose](dat),
|
> listlist));
|
[11.26010033, 19.88017500]
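|
For the first example above, the two choices of index can be compared by hand. The variance[0] value is inferred from the description of the index rather than taken from a recorded session; n and \bar{x} are notation introduced here.

```latex
% data [3, 4, 5, 6]: n = 4, \bar{x} = 9/2, \sum (x_i - \bar{x})^2 = 5
\text{variance[1]}: \ \frac{5}{n - 1} = \frac{5}{3},
\qquad
\text{variance[0]}: \ \frac{5}{n} = \frac{5}{4}
```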
|
|
|
|
|