
# Fast Maple algorithms for k-statistics, polykays and their multivariate generalization


This Maple worksheet accompanies the papers:

Di Nardo E., G. Guarino, D. Senato (2007), A new method for fast computing unbiased estimators of cumulants. In press Statistics and Computing.


E. Di Nardo*
elvira.dinardo@unibas.it
http://www.unibas.it/utenti/dinardo/home.html;
Tel: +39 0971205890, Fax: +39 0971205896

G. Guarino**
giuseppe.guarino@asl2.potenza.it

D. Senato*
domenico.senato@unibas.it

* Dipartimento di Matematica e Informatica, Università degli Studi della Basilicata, Viale dell'Ateneo Lucano n.10, 85100 Potenza, Italy

** Medical School, Università del Sacro Cuore (Rome branch), Largo Agostino Gemelli n.8, 00168 Roma, Italy

Introduction

Abstract: We provide four algorithms to generate single and multivariate k-statistics and single and multivariate polykays. The computational times are very fast compared with those of the procedures available in the literature. This speed-up is obtained through a symbolic method arising from the classical umbral calculus, a light syntax for managing sequences of numbers or polynomials that involves only elementary rules. The keystone of the procedures introduced here is the connection, achieved by a symbolic device, between the cumulants of a random variable and a suitable compound Poisson random variable. This connection also holds for multivariate random variables.

Application Areas/Subject: Combinatorics & algebraic methods in statistics.

Keywords: umbral calculus, symmetric polynomials, set partitions, multiset, cumulants, k-statistics, polykays.

Remark: k-statistics, polykays and their multivariate generalizations are commonly defined in terms of power sums, that is, sums of the rth powers of the data points: $S_r = \sum_{i=1}^{n} X_i^r$.

Initialization

[Maple input and output (2.1) not preserved]

k-statistics

The nth k-statistic $k_n$ is the unique symmetric unbiased estimator of the cumulant $\kappa_n$ of a given statistical distribution.

$k_n$ is defined so that $E[k_n] = \kappa_n$.
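For orientation, the first three k-statistics have well-known closed forms in terms of power sums. The block below is a reference sketch rather than output from the worksheet; the notation $S_r$ for the sum of the rth powers of the $n$ data points is an assumption made here.

```latex
\begin{aligned}
k_1 &= \frac{S_1}{n}, \\
k_2 &= \frac{n S_2 - S_1^2}{n(n-1)}, \\
k_3 &= \frac{n^2 S_3 - 3 n S_1 S_2 + 2 S_1^3}{n(n-1)(n-2)}.
\end{aligned}
```

Here $k_1$ is the sample mean and $k_2$ is the usual unbiased sample variance.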

[Maple input not preserved]

Example

[Maple input and output (3.1) to (3.2) not preserved]

Example of k-statistics construction

[Maple input and output (3.1.1) to (3.1.5) not preserved]

Test of the previous result

[Maple input and output (3.1.6) to (3.1.7) not preserved]

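Note on the "fd" function

fd( x, y ): x is the lower factorial and y is the number of factors to delete from the left of the lower factorial expression.

Example: the decreasing factorial $(n)_4$ = n*(n-1)*(n-2)*(n-3)

[Maple input and output (3.2.1) not preserved]

Deleting "n" from $(n)_4$

[Maple input and output (3.2.2) not preserved]

Deleting "n*(n-1)" from $(n)_4$

[Maple input and output (3.2.3) not preserved]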

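Remark: if we want to calculate the following expression:

[expression not preserved]

we have to compute:

[expression not preserved]

where ( [symbol not preserved] ) is obtained from [symbol not preserved] by deleting the first two terms.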
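One possible reading of this note is that a ratio of lower factorials is simplified by dropping the leading factors the two share, instead of expanding both. The Python sketch below illustrates that reading numerically. It is not the worksheet's Maple "fd" procedure: the function name `falling_factorial_tail` and its signature are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction


def falling_factorial_tail(n, order, skip=0):
    """Return (n - skip) * (n - skip - 1) * ... * (n - order + 1).

    With skip = 0 this is the lower factorial (n)_order.
    With skip > 0 the first `skip` factors are deleted from the left.
    """
    result = 1
    for j in range(skip, order):
        result *= n - j
    return result


n = 10
full = falling_factorial_tail(n, 4)            # n(n-1)(n-2)(n-3)
without_n = falling_factorial_tail(n, 4, 1)    # (n-1)(n-2)(n-3)
without_two = falling_factorial_tail(n, 4, 2)  # (n-2)(n-3)

# Assumed use case: (n)_2 / (n)_4 equals 1 / ((n-2)(n-3)),
# so only the tail of (n)_4 has to be formed.
ratio = Fraction(falling_factorial_tail(n, 2), full)
shortcut = Fraction(1, without_two)
```

Both `ratio` and `shortcut` hold the same rational number, but the shortcut never builds the cancelled factors.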

Example: compare the results of the expressions computed with and without the "fd" function.

Without the "fd" function

[Maple input and output (3.2.4) to (3.2.6) not preserved]

With the "fd" function. This method is used in the functions that generate k-statistics and polykays.

[Maple input and output (3.2.7) to (3.2.10) not preserved]

Polykays

The symmetric statistic $k_{r,s,\ldots}$ is defined so that

$E[k_{r,s,\ldots}] = \kappa_r \kappa_s \cdots$

where $\kappa_r$ is a cumulant. These statistics, called polykays, generalize k-statistics.
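As a concrete instance, the simplest polykay that is not a k-statistic is $k_{1,1}$, which estimates $\kappa_1^2$, the squared mean. The Python sketch below is not the worksheet's Maple code. It uses the standard closed form $k_{1,1} = (S_1^2 - S_2)/(n(n-1))$, where $S_r$ denotes the sum of the rth powers of the data, and checks unbiasedness exactly by enumerating every sample of size 3 from a small discrete distribution. The function names and the test distribution are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def polykay_1_1(data):
    """k_{1,1} = (S_1^2 - S_2) / (n (n - 1)), an unbiased estimator of kappa_1^2."""
    n = len(data)
    s1 = sum(Fraction(x) for x in data)
    s2 = sum(Fraction(x) ** 2 for x in data)
    return (s1 * s1 - s2) / (n * (n - 1))


def exact_expectation(statistic, values, probs, n):
    """Exact E[statistic] over i.i.d. samples of size n from a finite distribution."""
    total = Fraction(0)
    for indices in product(range(len(values)), repeat=n):
        weight = Fraction(1)
        for i in indices:
            weight *= probs[i]
        total += weight * statistic([values[i] for i in indices])
    return total


values = [0, 1, 3]
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]

mean = sum(p * v for p, v in zip(probs, values))
expected_k11 = exact_expectation(polykay_1_1, values, probs, n=3)
```

Because the arithmetic uses `Fraction`, the comparison between `expected_k11` and `mean ** 2` is exact rather than approximate.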

[Maple input and output (4.1) to (4.2) not preserved]

Example of polykays construction

[Maple input and output (4.1.1) to (4.1.6) not preserved]

Test of the previous result

[Maple input and output (4.1.7) to (4.1.8) not preserved]

Multiset subdivision

The following function lists all subdivisions of a multiset. The algorithm is fully discussed in [3].

Note: the algorithm is needed for the multivariate case. It is called only once for each input parameter of the multivariate function, which speeds up the procedure.
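To make the notion concrete, the following Python sketch lists the subdivisions of a small multiset by brute force. It enumerates the set partitions of the elements treated as distinguishable and merges those that coincide once the labels are forgotten, counting how many set partitions collapse onto each subdivision. This is not the algorithm of [3], which is designed to be much faster; the function names are hypothetical, and `lru_cache` merely stands in for the idea of computing the subdivisions once per input.

```python
from collections import Counter
from functools import lru_cache


def set_partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` in a block of its own ...
        yield [[first]] + partition
        # ... or add it to one of the existing blocks.
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]


@lru_cache(maxsize=None)
def multiset_subdivisions(multiset):
    """Map each subdivision of `multiset` (a tuple) to its multiplicity.

    The multiplicity is the number of set partitions of the labelled
    elements that give the same subdivision once labels are dropped.
    """
    counts = Counter()
    for partition in set_partitions(list(multiset)):
        key = tuple(sorted(tuple(sorted(block)) for block in partition))
        counts[key] += 1
    return dict(counts)


subdivisions = multiset_subdivisions(("a", "a", "b"))
```

For the multiset {a, a, b} this yields four subdivisions whose multiplicities sum to 5, the number of set partitions of a three-element set.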

[Maple input not preserved]

Multivariate k-statistics

[Maple input and output (6.1) to (6.3) not preserved]

Example of multivariate k-statistics construction

[Maple input and output (6.1.1) to (6.1.5) not preserved]

Test of the previous result

[Maple input and output (6.1.6) to (6.1.7) not preserved]

Multivariate polykays

[Maple input and output (7.1) to (7.3) not preserved]

Example of multivariate polykays construction

M is the maximum order of the elements in { [1,1],[1] }

N is ( 1 + 1 ) + ( 1 )

[Maple input and output (7.1.1) to (7.1.2) not preserved]

If args =

[Maple input and output (7.1.3) to (7.1.10) not preserved]

Note on the function "ricVtab": for example, with parameters [P1P2,P1] the function returns 1/2 (where 1 is in vTab and 2 is in vParts) and computes fd(3-1, 2) (where 3-1 is N-1 and 2 is the order of the [P1P2, P1] block).

[Maple input and output (7.1.11) to (7.1.13) not preserved]

Test of the previous result

[Maple input and output (7.1.14) to (7.1.15) not preserved]

Master function "polyk" for managing all cases

This function calls the functions that generate k-statistics, polykays and their multivariate generalizations.
The input is as follows:

-  to generate k-statistics, the parameter is:  [ r ]

-  to generate polykays, the parameter is:  [ r ], [ s ]

-  to generate multivariate k-statistics, the parameter is:  [ r, s ]

-  to generate multivariate polykays, the parameter is:  [ r, s ],  [ u, v]
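The argument convention above can be summarized by a small dispatcher. The Python sketch below only classifies an argument list according to the four cases; it computes no estimator, and the function name is hypothetical (it is not the worksheet's "polyk").

```python
def classify_polyk_arguments(*indices):
    """Name the estimator family implied by a polyk-style argument list.

    One list selects a k-statistic, several lists select a polykay;
    any list with more than one entry makes the case multivariate.
    """
    if not indices or any(len(block) == 0 for block in indices):
        raise ValueError("each argument must be a non-empty list of orders")
    multivariate = any(len(block) > 1 for block in indices)
    if len(indices) == 1:
        return "multivariate k-statistic" if multivariate else "k-statistic"
    return "multivariate polykay" if multivariate else "polykay"


cases = [
    classify_polyk_arguments([3]),
    classify_polyk_arguments([2], [1]),
    classify_polyk_arguments([2, 1]),
    classify_polyk_arguments([2, 1], [1, 1]),
]
```

Dispatching on the shape of the arguments lets one entry point cover all four estimator families.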

[Maple input not preserved]

Example

[Maple input and output (8.1) to (8.4) not preserved]

Replacing symbols with numerical data

Sums of the rth powers of the data points:

[Maple input not preserved]
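A minimal Python sketch of numerical power sums follows, for univariate data and for multivariate data where each observation is a tuple. It is not the worksheet's Maple procedure; the function names are hypothetical, and exact rational arithmetic via `Fraction` is a choice made here to avoid rounding.

```python
from fractions import Fraction


def power_sum(data, r):
    """S_r: sum of the r-th powers of univariate data points."""
    return sum(Fraction(x) ** r for x in data)


def multivariate_power_sum(rows, exponents):
    """S_{r,s,...}: sum over observations of x^r * y^s * ..."""
    total = Fraction(0)
    for row in rows:
        term = Fraction(1)
        for value, exponent in zip(row, exponents):
            term *= Fraction(value) ** exponent
        total += term
    return total


s2 = power_sum([1, 2, 3, 4], 2)
s11 = multivariate_power_sum([(1, 2), (2, 4), (3, 6)], (1, 1))
```

With exponent 0 the univariate power sum reduces to the sample size n.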

Example

[Maple input and output (9.1) to (9.2) not preserved]

This function evaluates a k-statistic or polykay by replacing the symbols with numerical data.

The parameters are as follows:

-  to generate k-statistics, the parameter is:  [ r ],  [ [ n1, n2, ...] ]

-  to generate polykays, the parameter is:  [ [ r ], [ s ] ],  [ [ n1, n2, ...] ]

-  to generate multivariate k-statistics, the parameter is: [ [ r , s ] ],  [ [ n1a, n2a], [ n1b, n2b] , ... ]

-  to generate multivariate polykays, the parameter is:  [ [ r , s ],  [ u , v] ],  [ [ n1a, n2a], [ n1b, n2b] , ... ]
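As an illustration of the multivariate case with numerical data, the Python sketch below evaluates the bivariate k-statistic $k_{1,1}$, which is the usual unbiased sample covariance. It is not the worksheet's Maple function. It assumes that the data are given as a list of (x, y) pairs, which is one reading of the layout [ [ n1a, n2a], [ n1b, n2b] , ... ], and it uses the standard closed form in the bivariate power sums $S_{r,s} = \sum_i x_i^r y_i^s$. The function name is hypothetical.

```python
from fractions import Fraction


def bivariate_k11(rows):
    """k_{1,1} = (n S_{1,1} - S_{1,0} S_{0,1}) / (n (n - 1))."""
    n = len(rows)
    if n < 2:
        raise ValueError("at least two observations are required")
    s10 = sum(Fraction(x) for x, _ in rows)
    s01 = sum(Fraction(y) for _, y in rows)
    s11 = sum(Fraction(x) * Fraction(y) for x, y in rows)
    return (n * s11 - s10 * s01) / (n * (n - 1))


pairs = [(1, 2), (2, 4), (3, 6)]
covariance_estimate = bivariate_k11(pairs)
```

For these pairs the result agrees with the sample covariance computed from deviations about the means.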

[Maple input not preserved]

Examples: k-statistics and polykays

[Maple input and output (9.3) not preserved]

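The estimator for the mean is given by $k_1$

[Maple input and output (9.4) not preserved]

The estimator for the variance is given by $k_2$

[Maple input and output (9.5) not preserved]

The estimator for the skewness is given by $k_3 / k_2^{3/2}$

[Maple input and output (9.6) not preserved]

The estimator for the kurtosis is given by $k_4 / k_2^{2}$

[Maple input and output (9.7) not preserved]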
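The skewness and kurtosis estimates can be evaluated numerically as follows. This Python sketch is not the worksheet's Maple code: it hard-codes the well-known closed forms of $k_1$ to $k_4$ in the power sums $S_1$ to $S_4$ and assumes the usual ratios $k_3 / k_2^{3/2}$ and $k_4 / k_2^{2}$. The function names are illustrative.

```python
from fractions import Fraction


def k_statistics(data):
    """Return (k1, k2, k3, k4) computed exactly from power sums S_1..S_4."""
    n = len(data)
    if n < 4:
        raise ValueError("k4 needs at least four observations")
    s1, s2, s3, s4 = (sum(Fraction(x) ** r for x in data) for r in (1, 2, 3, 4))
    k1 = s1 / n
    k2 = (n * s2 - s1**2) / (n * (n - 1))
    k3 = (n**2 * s3 - 3 * n * s1 * s2 + 2 * s1**3) / (n * (n - 1) * (n - 2))
    k4_numerator = (
        n**2 * (n + 1) * s4
        - 4 * n * (n + 1) * s1 * s3
        - 3 * n * (n - 1) * s2**2
        + 12 * n * s1**2 * s2
        - 6 * s1**4
    )
    k4 = k4_numerator / (n * (n - 1) * (n - 2) * (n - 3))
    return k1, k2, k3, k4


def skewness_and_kurtosis(data):
    """Estimate skewness as k3 / k2^(3/2) and kurtosis as k4 / k2^2."""
    _, k2, k3, k4 = k_statistics(data)
    return float(k3) / float(k2) ** 1.5, float(k4 / k2**2)


k1, k2, k3, k4 = k_statistics([1, 2, 3, 4])
skewness, kurtosis = skewness_and_kurtosis([1, 2, 3, 4])
```

The k-statistics are kept as exact fractions, and only the final ratios are converted to floating point because of the fractional power in the skewness.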

The estimator for [symbol not preserved] is given by [symbol not preserved]

[Maple input and output (9.8) not preserved]

Examples: multivariate k-statistics and multivariate polykays

[Maple input and output (9.9) not preserved]

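The estimator for [symbol not preserved] is given by [symbol not preserved]

[Maple input and output (9.10) not preserved]

The estimator for [symbol not preserved] is given by [symbol not preserved]

[Maple input and output (9.11) not preserved]

The estimator for [symbol not preserved] is given by [symbol not preserved]

[Maple input and output (9.12) not preserved]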

Conclusions

Tables 1 and 2 show the computational times of three procedures implementing algorithms that express single and multivariate k-statistics and single and multivariate polykays. The first, which we call AS algorithms, has been implemented in Mathematica and refers to the procedures explained in [7], available on the web page http://www.utstat.toronto.edu/david/trans.7.nb. The second refers to the package MathStatica [8]. Note that this package has no procedures devoted to multivariate polykays. The third, named Fast algorithms, has been implemented in Maple 10.x using the results explained in [5]. The procedure to compute subdivisions of multisets has been described in detail in [3] and [4].
In particular, comparing our procedures with the faster ones of MathStatica makes the improvement in computational times evident. Note that, for all the procedures considered, the results are in the same output form and have been evaluated on the same platform.

All tasks have been performed on a PC with an Intel(R) Pentium(R) 4 CPU at 3.00 GHz and 480 MB of RAM.

Table 1

Comparison of computational times for k-statistics and polykays. Missing computational times mean "greater than 20 hours".

Table 2

Comparison of computational times for multivariate k-statistics and multivariate polykays. For AS algorithms, missing computational times mean "greater than 20 hours". For MathStatica, missing computational times mean "procedures not available".

References

[1] Di Nardo E., G. Guarino, D. Senato (2008), A Maple algorithm for polykays and their generalizations. Adv. Appl. Stat., Vol. 8, No. 1, 19-36. http://www.pphmj.com/journals/adas.htm

[2] Di Nardo E., G. Guarino, D. Senato (2008), A unifying framework for k-statistics, polykays and their generalizations. Bernoulli, Vol. 14, No. 2, 440-468. Official Journal of the Bernoulli Society for Mathematical Statistics and Probability. http://isi.cbs.nl/bernoulli/

[3] Di Nardo E., G. Guarino, D. Senato, Multiset Subdivision, source Maple algorithm located at www.maplesoft.com (submitted).

[4] Di Nardo E., G. Guarino, D. Senato (2008), Symbolic computation of moments of sampling distributions. Comp. Stat. Data Analysis, Vol. 52, No. 11, 4909-4922 (download from http://arxiv.org/PS_cache/arxiv/pdf/0806/0806.0129v1.pdf or http://www.unibas.it/utenti/dinardo/lavori.html).

[5] Di Nardo E., G. Guarino, D. Senato (2007), A new method for fast computing unbiased estimators of cumulants. In press, Statistics and Computing. http://www.springer.com/statistics/computational/journal/11222 (download from http://www.unibas.it/utenti/dinardo/lavori.html).

[6] Di Nardo E., G. Guarino, D. Senato, A Maple algorithm for k-statistics, polykays and their multivariate generalization, source Maple algorithm located at www.maplesoft.com (submitted).

[7] D. F. Andrews and J. E. Stafford, Symbolic Computation for Statistical Inference. Oxford Statistical Science Series, 21. Oxford University Press, Oxford, 2000.

[8] C. Rose and M. D. Smith, Mathematical Statistics with Mathematica. Springer-Verlag, New York, 2002.

Legal Notice: The copyright for this application is owned by the author(s). Neither Maplesoft nor the author are responsible for any errors contained within and are not liable for any damages resulting from the use of this material. This application is intended for non-commercial, non-profit use only. Contact the author for permission if you wish to use this application in for-profit activities
