Joint Cumulants of polykays
E. Di Nardo* elvira.dinardo@unito.it https://www.elviradinardo.it Tel: +390116702862 Fax: +390116702878
G. Guarino** giuseppe.guarino@rete.basilicata.it
* Mathematics Department “G. Peano”, University of Turin, Turin, Italy
**Local Health Authority of Potenza, Italy
Introduction
Abstract: Given a multivariate random sample, this set of functions returns the joint cumulants of multivariate polykays in terms of joint cumulants of the multivariate population underlying the sample.
Multivariate polykays are unbiased estimators with minimum variance of joint cumulant products. These estimators are usually indexed by a list of integer vectors: each vector corresponds to a joint cumulant. When this list reduces to a list of single integers, the multivariate polykay reduces to a univariate polykay. When this list reduces to a single integer vector, the multivariate polykay reduces to a multivariate k-statistic. When this list contains a single integer vector with only one element, the multivariate k-statistic reduces to a univariate k-statistic.
Application Areas/Subject: Combinatorics & algebraic methods in statistics.
Keywords: umbral calculus, symmetric polynomial, set partition, multiset, cumulant, k-statistic, polykay.
See Also: Maple algorithm [1] and [2]
Remark: k-statistics, polykays and their multivariate generalizations are commonly given in terms of power sums $S_r = \sum_{i=1}^{n} X_i^r$,
with n the sample size.
Initialization
Background Functions
The polyk function handles four different algorithms. Depending on the input parameters, univariate or multivariate polykays as well as univariate or multivariate k-statistics are expressed in terms of power sums. Specifically, if the input is a list of integer vectors, multivariate polykays are returned; if the input is a list of integer vectors, each containing only one integer, univariate polykays are returned; if the input is a single vector of integers, multivariate k-statistics are returned; if the input is a single integer, or a list containing a single one-dimensional integer vector, univariate k-statistics are returned. All these functions are taken from the Maple worksheet [1] and are included here for user convenience. For the multivariate case, the algorithm listing all subdivisions of a multiset is also included; it is given and discussed in the Maple worksheet [2]. For details on the procedures and mathematical background see [3-6].
K-Statistics
The r-th k-statistic $k_r$ is the unique symmetric unbiased estimator of the r-th cumulant $\kappa_r$ of a given statistical distribution, that is
$E[k_r] = \kappa_r$
where E denotes the expectation. k-statistics are expressed in terms of power sums involving the random variables of a random sample.
Example: The k-statistic of order 3 is
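For reference, the k-statistics of orders 1 to 4 have well-known expressions in terms of the power sums $S_r$; for order 3, $k_3 = (n^2 S_3 - 3 n S_1 S_2 + 2 S_1^3) / (n(n-1)(n-2))$. The following Python sketch evaluates these formulas exactly on a small sample. It is not the worksheet's Maple code: the names power_sum and k_statistic and the sample values are assumptions made for illustration.

```python
from fractions import Fraction

def power_sum(data, r):
    """S_r: sum of the r-th powers of the observations."""
    return sum(Fraction(x) ** r for x in data)

def k_statistic(order, data):
    """k-statistic of order 1..4 written in terms of power sums."""
    n = len(data)
    s1, s2, s3, s4 = (power_sum(data, r) for r in range(1, 5))
    if order == 1:
        return s1 / n
    if order == 2:
        return (n * s2 - s1**2) / (n * (n - 1))
    if order == 3:
        return (n**2 * s3 - 3 * n * s1 * s2 + 2 * s1**3) / (n * (n - 1) * (n - 2))
    if order == 4:
        num = (n**2 * (n + 1) * s4 - 4 * n * (n + 1) * s1 * s3
               - 3 * n * (n - 1) * s2**2 + 12 * n * s1**2 * s2 - 6 * s1**4)
        return num / (n * (n - 1) * (n - 2) * (n - 3))
    raise ValueError("only orders 1 to 4 are implemented in this sketch")

# Hypothetical observations, not the data used in the worksheet.
sample = [2, 4, 4, 5, 7, 9]
k1, k2, k3 = (k_statistic(r, sample) for r in (1, 2, 3))
```

Exact rational arithmetic is used so that the values can be compared with symbolic results without rounding error.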
Polykays
The symmetric statistic $k_{r,\ldots,t}$ is an unbiased estimator of the cumulant product $\kappa_r \cdots \kappa_t$, that is
$E[k_{r,\ldots,t}] = \kappa_r \cdots \kappa_t$
where $\kappa_r$ is the r-th cumulant. These statistics are called polykays or generalized k-statistics.
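As a small illustration, the simplest polykay is $k_{1,1} = (S_1^2 - S_2) / (n(n-1))$, which estimates $\kappa_1^2$ without bias. The Python sketch below checks the unbiasedness exactly by enumerating every sample of size 3 from a small discrete population. The population, the sample size and the function name polykay_11 are assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import product

def polykay_11(data):
    """k_{1,1} = (S_1^2 - S_2) / (n (n - 1)), an unbiased estimator of kappa_1^2."""
    n = len(data)
    s1 = sum(data)
    s2 = sum(x * x for x in data)
    return Fraction(s1 * s1 - s2, n * (n - 1))

# Hypothetical population: values and their probabilities.
values = [0, 1, 3]
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
n = 3

# Exact expectation over all i.i.d. samples of size n.
expectation = Fraction(0)
for idx in product(range(len(values)), repeat=n):
    weight = Fraction(1)
    for i in idx:
        weight *= probs[i]
    expectation += weight * polykay_11([values[i] for i in idx])

kappa_1 = sum(p * v for p, v in zip(probs, values))  # population mean
```

Here expectation equals kappa_1 squared, whereas the naive estimator given by the squared sample mean would be biased.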
Multiset subdivision
The makeTab function lists all subdivisions of a multiset (for a thorough discussion see [2]). With multivariate k-statistics and/or multivariate polykays, this function speeds up the overall procedure (see [7]). For details on the procedure and mathematical background see also [3] and [4].
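One way to picture multiset subdivisions is to label the elements of the multiset, list all set partitions of the labels, and merge the partitions that become identical once the labels are dropped. The Python sketch below does this by brute force. It does not reproduce the makeTab algorithm or its output format, and the names set_partitions and subdivisions are assumptions.

```python
from collections import Counter

def set_partitions(items):
    """Yield every partition of a list of items (treated as distinct) as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + partition

def subdivisions(multiset):
    """Map each subdivision to the number of labelled set partitions collapsing to it."""
    counts = Counter()
    for partition in set_partitions(list(range(len(multiset)))):
        blocks = tuple(sorted(tuple(sorted(multiset[i] for i in block))
                              for block in partition))
        counts[blocks] += 1
    return counts

table = subdivisions(["a", "a", "b"])
```

For the multiset {a, a, b} the five set partitions of the labelled elements collapse to four subdivisions, one of which occurs twice. Brute-force enumeration grows very quickly with the size of the multiset, which is why a dedicated algorithm is preferable in practice.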
Multivariate k-statistics
The multivariate k-statistic $k_{r,s,\ldots}$ is the unique symmetric unbiased estimator of the joint cumulant $\kappa_{r,s,\ldots}$ of order (r,s,...) of a given multivariate statistical distribution, that is
$E[k_{r,s,\ldots}] = \kappa_{r,s,\ldots}$
where E denotes the expectation. Multivariate k-statistics are expressed in terms of multivariate power sums involving the random vectors of a random sample.
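Example: the multivariate k-statistic is
Note that the power sums in (3.4.1) are: $S_{r,s} = \sum_{i=1}^{n} X_i^{r} Y_i^{s}$
where $(X_1, \ldots, X_n)$ refers to the population X and $(Y_1, \ldots, Y_n)$ refers to the population Y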
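As a minimal bivariate illustration, the multivariate k-statistic of order (1,1) is the unbiased sample covariance, which in terms of bivariate power sums $S_{r,s} = \sum_i X_i^r Y_i^s$ reads $k_{1,1} = (n S_{1,1} - S_{1,0} S_{0,1}) / (n(n-1))$. The Python sketch below evaluates it on a small paired sample. The function names and the data are assumptions, not part of the worksheet.

```python
from fractions import Fraction

def power_sum_2(data, r, s):
    """Bivariate power sum S_{r,s} = sum_i x_i^r * y_i^s."""
    return sum(Fraction(x) ** r * Fraction(y) ** s for x, y in data)

def k_11(data):
    """Bivariate k-statistic of order (1,1): unbiased estimator of the covariance."""
    n = len(data)
    s11 = power_sum_2(data, 1, 1)
    s10 = power_sum_2(data, 1, 0)
    s01 = power_sum_2(data, 0, 1)
    return (n * s11 - s10 * s01) / (n * (n - 1))

# Hypothetical paired observations (x_i, y_i).
pairs = [(1, 2), (2, 1), (4, 5), (5, 8)]
cov = k_11(pairs)
```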
Multivariate Polykays
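The multivariate polykay $k_{(r,s,\ldots),(i,j,\ldots)}$ is an unbiased estimator of the cumulant product $\kappa_{r,s,\ldots}\,\kappa_{i,j,\ldots}$, that is
$E[k_{(r,s,\ldots),(i,j,\ldots)}] = \kappa_{r,s,\ldots}\,\kappa_{i,j,\ldots}$, where E denotes the expectation and $\kappa_{r,s,\ldots}$, $\kappa_{i,j,\ldots}$ are joint cumulants of order (r,s,...) and (i,j,...) respectively.
Note that the vectors in the input list must have the same length: if any vector is shorter, the function pads it with zeros until it reaches the correct length.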
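The padding rule can be sketched in a few lines of Python. This is only an illustration: the name pad_indices is hypothetical, and appending the zeros at the end of the shorter vectors is an assumption, since the text does not say where they are added.

```python
def pad_indices(index_vectors):
    """Pad every index vector with zeros (assumed trailing) to a common length."""
    width = max(len(v) for v in index_vectors)
    return [list(v) + [0] * (width - len(v)) for v in index_vectors]

padded = pad_indices([[2, 1], [1]])  # -> [[2, 1], [1, 0]]
```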
Master function "polyk" handling all cases
This function gives access to all the functions that generate k-statistics, polykays and their multivariate generalizations. The input is the following:
- to generate k-statistics the parameter is: [ r ]
- to generate polykays the parameter is: [ r ], [ s ]
- to generate multivariate k-statistics the parameter is: [ r, s ]
- to generate multivariate polykays the parameter is: [ r, s ], [ u, v ]
Example: The k-statistic of order 3 is
Replacing symbols with numerical data
The following functions allow the random variables of a random sample to be replaced with corresponding numerical observations.
The powS function returns the sum of the r-th powers of the observations.
Example
The next function computes the numerical value of a k-statistic or a polykay (including the multivariate cases), replacing the random variables of the random sample with the corresponding numerical observations. The parameters are the following:
- to generate k-statistics the parameters are: [ r ], [ [ n1, n2, ...] ]
- to generate polykays the parameters are: [ [ r ], [ s ] ], [ [ n1, n2, ...] ]
- to generate multivariate k-statistics the parameters are: [ [ r , s ] ], [ [ n1a, n2a], [ n1b, n2b] , ... ]
- to generate multivariate polykays the parameters are: [ [ r , s ], [ u , v] ], [ [ n1a, n2a], [ n1b, n2b] , ... ]
Examples: suppose we have the following (univariate) numerical sample
An estimate of the mean (k-statistic of first order) is
An estimate of the variance (k-statistic of second order) is
An estimate of the skewness, $k_3 / k_2^{3/2}$, is
An estimate of the kurtosis, $k_4 / k_2^{2}$, is
An estimate of the cumulant product is
Examples: suppose we have the following (bivariate) numerical sample
An estimate of the joint cumulant is
An estimate of the product of joint cumulants is
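The univariate estimates listed above can be reproduced numerically with a short Python sketch. It uses the standard expressions of $k_2$, $k_3$ and $k_4$ in terms of the central sample moments $m_r = \frac{1}{n}\sum_i (x_i - \bar{x})^r$ rather than the worksheet's own routines. The function name cumulant_estimates and the observations are assumptions, and at least four observations are required.

```python
def cumulant_estimates(data):
    """Estimate mean, variance, skewness and kurtosis through k-statistics."""
    n = len(data)
    mean = sum(data) / n
    # Central sample moments m_r = (1/n) * sum (x - mean)^r.
    m2, m3, m4 = (sum((x - mean) ** r for x in data) / n for r in (2, 3, 4))
    k2 = n * m2 / (n - 1)
    k3 = n**2 * m3 / ((n - 1) * (n - 2))
    k4 = n**2 * ((n + 1) * m4 - 3 * (n - 1) * m2**2) / ((n - 1) * (n - 2) * (n - 3))
    return {
        "mean": mean,                  # k_1
        "variance": k2,                # k_2
        "skewness": k3 / k2**1.5,      # k_3 / k_2^(3/2)
        "kurtosis": k4 / k2**2,        # k_4 / k_2^2
    }

# Hypothetical observations, not the data used in the worksheet.
sample = [16.3, 18.1, 17.4, 19.0, 15.8, 18.6]
estimates = cumulant_estimates(sample)
```

Floating-point arithmetic is used here because the goal is a numerical estimate rather than a symbolic expression.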
Joint Cumulant of polykays: auxiliary functions
The expression produced by the following set of functions corresponds to formula (4.10), p. 96 in [8]. This formula expresses the cumulants of multivariate polykays in terms of joint cumulants of the population underlying the multivariate sample; this choice reduces the complexity of the final result. Referring to the simplest case of k-statistics, the procedure is as follows: 1) the cumulants of the k-statistics are expressed in terms of their moments using the function ctr; 2) the moments of the k-statistics are expressed in terms of the moments of the population underlying the sample using the functions pSS and E; 3) the moments of the population underlying the sample are expressed in terms of its cumulants using the function rct.
The same procedure works for multivariate polykays.
Details for each step are given below.
Cumulants and Moments
The ctr function (see [1]) expresses joint cumulants of a random vector in terms of its joint moments. Details on the procedure and mathematical background are given in [1] and [3].
Example: The joint cumulant in terms of joint moments is
The rct function expresses joint moments of a random vector in terms of its joint cumulants. Details on the procedure and mathematical background are given in [3].
Example: The joint moment in terms of joint cumulants is
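Both conversions can be illustrated with the classical set-partition formulas: a joint cumulant is the sum over set partitions $\pi$ of $(-1)^{|\pi|-1}(|\pi|-1)!$ times the product of the joint moments of the blocks, and a joint moment is the sum over set partitions of the products of the joint cumulants of the blocks. The Python sketch below applies them to a small discrete distribution. It is a brute-force illustration, not the symbolic method behind ctr and rct, and the distribution and all function names are assumptions.

```python
from fractions import Fraction
from math import factorial

def set_partitions(items):
    """Yield every partition of a list of items (treated as distinct) as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        yield [[first]] + partition

# Hypothetical joint distribution of (X, Y): outcome -> probability.
dist = {(0, 0): Fraction(1, 4), (1, 0): Fraction(1, 4), (1, 2): Fraction(1, 2)}

def joint_moment(indices):
    """E of the product of the listed coordinates, e.g. (0, 0, 1) -> E[X^2 Y]."""
    total = Fraction(0)
    for outcome, p in dist.items():
        term = p
        for i in indices:
            term *= outcome[i]
        total += term
    return total

def joint_cumulant(indices):
    """Joint cumulant from joint moments (the direction handled by ctr)."""
    total = Fraction(0)
    for partition in set_partitions(list(range(len(indices)))):
        b = len(partition)
        term = Fraction((-1) ** (b - 1) * factorial(b - 1))
        for block in partition:
            term *= joint_moment([indices[i] for i in block])
        total += term
    return total

def moment_from_cumulants(indices):
    """Joint moment from joint cumulants (the direction handled by rct)."""
    total = Fraction(0)
    for partition in set_partitions(list(range(len(indices)))):
        term = Fraction(1)
        for block in partition:
            term *= joint_cumulant([indices[i] for i in block])
        total += term
    return total

kappa_21 = joint_cumulant((0, 0, 1))  # joint cumulant of order (2, 1)
```

Applying moment_from_cumulants to the same index tuple recovers the joint moment, which gives a simple round-trip check of the two formulas.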
Product of Augmented Symmetric Functions
Products of multivariate polykays are products of power sums. To express the expectation of such products in terms of moments, a fundamental result of estimation can be used once these products are rewritten in terms of augmented symmetric functions, see [9]. For a univariate sample, an augmented symmetric function is a sum of products of variables with different indexes, that is $[r_1 \ldots r_s] = \sum_{i_1 \neq \cdots \neq i_s} X_{i_1}^{r_1} \cdots X_{i_s}^{r_s}$. The pSS function carries out this task.
Example: product of power sums in terms of augmented symmetric functions:
Example: product of power sums in terms of multivariate augmented symmetric functions:
The following is a test explaining how to transform a power sum product into augmented symmetric functions with n=3:
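The transformation can be sketched through set partitions: a product of power sums $S_{r_1} \cdots S_{r_m}$ equals the sum, over all set partitions of the indexes $\{1,\ldots,m\}$, of the augmented symmetric functions whose entries are the block sums of the $r_j$. The Python sketch below builds this expansion for $S_1^2 S_2$ and checks it numerically on a sample of size n = 3. The function names and the data are assumptions, and this is not the pSS implementation.

```python
from collections import Counter
from itertools import permutations
from math import prod

def set_partitions(items):
    """Yield every partition of a list of items (treated as distinct) as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        yield [[first]] + partition

def power_sum_product_to_augmented(orders):
    """Expand S_{r_1} * ... * S_{r_m} as a combination of augmented symmetric functions."""
    result = Counter()
    for partition in set_partitions(list(orders)):
        result[tuple(sorted(sum(block) for block in partition))] += 1
    return result

def augmented(data, orders):
    """[r_1 ... r_s]: sum over distinct indexes of x_{i_1}^{r_1} ... x_{i_s}^{r_s}."""
    return sum(prod(x ** r for x, r in zip(chosen, orders))
               for chosen in permutations(data, len(orders)))

expansion = power_sum_product_to_augmented([1, 1, 2])  # S_1^2 * S_2

data = [2, 3, 5]  # hypothetical sample with n = 3
lhs = sum(data) ** 2 * sum(x ** 2 for x in data)
rhs = sum(c * augmented(data, orders) for orders, c in expansion.items())
```

The expansion obtained is $S_1^2 S_2 = [4] + 2\,[1\,3] + [2\,2] + [1\,1\,2]$, and lhs and rhs agree on the sample.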
Expected values of Augmented Symmetric Functions
The fundamental result of estimation we refer to expresses the expectation of augmented symmetric functions in terms of moments [10], that is $E\big[[r_1 \ldots r_s]\big] = n(n-1)\cdots(n-s+1)\,\mu_{r_1} \cdots \mu_{r_s}$, where $\mu_r$ denotes the r-th moment of the population and s denotes the number of elements in the list $[r_1 \ldots r_s]$ indexing the augmented symmetric function. The product n(n-1)...(n-s+1) is the lower factorial $(n)_s$. The df function computes the lower factorial. This result can be suitably generalized to the multivariate case.
Example: the decreasing factorial
The E function computes the expectation of augmented symmetric functions.
Example: the expectation of is
The fE function repeatedly executes the E function on an expression.
Example: the expectation of
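The result on expectations can be checked on a toy case. The Python sketch below computes the lower factorial, evaluates the closed form $(n)_s\,\mu_{r_1} \cdots \mu_{r_s}$, and compares it with the exact expectation obtained by enumerating every sample of size 3 from a small discrete population. It is an illustration only, not the df, E or fE routines, and the population and function names are assumptions.

```python
from fractions import Fraction
from itertools import permutations, product
from math import prod

def falling_factorial(n, s):
    """Lower factorial (n)_s = n (n - 1) ... (n - s + 1)."""
    result = 1
    for j in range(s):
        result *= n - j
    return result

# Hypothetical population: values and their probabilities.
values = [0, 1, 3]
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]

def moment(r):
    """r-th moment of the population."""
    return sum(p * Fraction(v) ** r for p, v in zip(probs, values))

def expected_augmented(orders, n):
    """Closed form: E[[r_1 ... r_s]] = (n)_s * mu_{r_1} * ... * mu_{r_s}."""
    return falling_factorial(n, len(orders)) * prod(moment(r) for r in orders)

def augmented(sample, orders):
    """[r_1 ... r_s] evaluated on a sample, summing over distinct indexes."""
    return sum(prod(Fraction(x) ** r for x, r in zip(chosen, orders))
               for chosen in permutations(sample, len(orders)))

n = 3
orders = (2, 1)
# Exact expectation by enumerating all i.i.d. samples of size n.
brute = sum(prod(probs[i] for i in idx) * augmented([values[i] for i in idx], orders)
            for idx in product(range(len(values)), repeat=n))
closed = expected_augmented(orders, n)
```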
Joint Cumulant of polykays: the main function
The jcks function returns the joint cumulants of multivariate polykays according to the procedure described in the previous section.
Examples
is the first cumulant of
is the second cumulant of
is the third cumulant of
The variances of the first few k-statistics are computed below, see [11]. Note that the variance of a statistic is its second cumulant.
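Two classical results of this kind are $\mathrm{var}(k_1) = \kappa_2 / n$ and $\mathrm{var}(k_2) = \kappa_4 / n + 2\kappa_2^2 / (n-1)$. The Python sketch below verifies both exactly on a toy population by enumerating every sample of size 3. It is a brute-force check, not the jcks procedure, and the population and helper names are assumptions.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population: values and their probabilities.
values = [0, 1, 3]
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
n = 3

def k1(sample):
    """First k-statistic: the sample mean."""
    return Fraction(sum(sample), len(sample))

def k2(sample):
    """Second k-statistic: the unbiased sample variance."""
    m = len(sample)
    s1 = sum(sample)
    s2 = sum(x * x for x in sample)
    return Fraction(m * s2 - s1 * s1, m * (m - 1))

def expect(f):
    """Exact expectation of f over all i.i.d. samples of size n."""
    total = Fraction(0)
    for idx in product(range(len(values)), repeat=n):
        weight = Fraction(1)
        for i in idx:
            weight *= probs[i]
        total += weight * f([values[i] for i in idx])
    return total

def variance(f):
    return expect(lambda s: f(s) ** 2) - expect(f) ** 2

# Population cumulants from central moments.
mu = sum(p * v for p, v in zip(probs, values))
kappa_2 = sum(p * (v - mu) ** 2 for p, v in zip(probs, values))
kappa_4 = sum(p * (v - mu) ** 4 for p, v in zip(probs, values)) - 3 * kappa_2 ** 2

var_k1 = variance(k1)  # expected to equal kappa_2 / n
var_k2 = variance(k2)  # expected to equal kappa_4 / n + 2 kappa_2^2 / (n - 1)
```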
More examples are given in [10], p. 265. The joint cumulant of order (2, 1) of () is
The joint cumulant of order (2, 2) of
The joint cumulant of order (1, 1) of
The joint cumulant of order (1,2) of
The joint cumulant of order (1,1) of (see [8], p. 94)
The joint cumulant of order (1,1,1) of (see [8], p. 94)
Estimation of joint cumulants of polykays
Given a random numerical sample, the njcks function estimates the joint cumulants of multivariate polykays. The function invokes the nployk function given in the Background Functions section. The first two parameters are the same as in the jcks function, while the data vector must be entered as the third parameter. The procedure consists of expressing the joint cumulant of a multivariate polykay in terms of joint cumulants of the underlying population and then replacing the occurrences of these joint cumulants and/or their products with the corresponding multivariate numerical polykays.
Example: suppose we have the following (univariate) numerical sample
An estimate of the joint cumulant of order (2,2) of
Example: suppose we have the following (bivariate) numerical sample
An estimate of the joint cumulant of order (1,1) of is
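The plug-in idea can be illustrated on the simplest case. The second cumulant of $k_1$ is $\kappa_2 / n$, so replacing $\kappa_2$ with the k-statistic $k_2$ gives the unbiased estimate $k_2 / n$ of the variance of the sample mean. The Python sketch below computes it exactly. It is not the njcks routine, and the function name and the data are assumptions.

```python
from fractions import Fraction

def estimate_var_of_k1(data):
    """Estimate var(k_1) = kappa_2 / n by replacing kappa_2 with k_2."""
    n = len(data)
    s1 = sum(Fraction(x) for x in data)
    s2 = sum(Fraction(x) ** 2 for x in data)
    k2 = (n * s2 - s1**2) / (n * (n - 1))  # unbiased estimator of kappa_2
    return k2 / n

# Hypothetical observations, not the data used in the worksheet.
sample = [2, 4, 4, 5, 7, 9]
var_k1_hat = estimate_var_of_k1(sample)
```

Higher-order cases work the same way, except that products such as $\kappa_2^2$ must be replaced by the corresponding polykay rather than by the square of a k-statistic, since the latter would be biased.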
Conclusions
The jcks function computes the cumulants of multivariate polykays. These routines are particularly useful for assessing the quality of the estimates obtained through these estimators. For example, the variances of the first- through fourth-order k-statistics provide insight into the quality of the estimates of the mean, variance, skewness, and kurtosis, respectively. Further examples are provided in the last section of this worksheet.
References
[1] (2009) Di Nardo E., Guarino G., Senato D. Fast algorithms for k-statistics, polykays and their multivariate generalizations. (Worksheet Maple Software: available at https://www.maplesoft.com/Applications/Detail.aspx?id=33041)
[2] (2009) Di Nardo E., Guarino G., Senato D. Multiset subdivisions. (Worksheet Maple Software: available at https://www.maplesoft.com/Applications/Detail.aspx?id=33039)
[3] (2015) Di Nardo E. Symbolic calculus in mathematical statistics: a review. Seminaire Lotharingien de Combinatoire Vol. 67 (B67a), pp. 72.
[4] (2009) Di Nardo E., Guarino G., Senato D. A new method for fast computing unbiased estimators of cumulants. Statistics and Computing 19, 155--165.
[5] (2008) Di Nardo E., Guarino G., Senato D. A unifying framework for k-statistics, polykays and their multivariate generalizations. Bernoulli 14, 440--468.
[6] (2006) Di Nardo E., Senato D. A symbolic method for k-statistics. Applied Mathematics Letters 19, 968--975.
[7] (2011) Di Nardo E., Guarino G., Senato D. A new algorithm for computing the multivariate Faa di Bruno's formula. Applied Mathematics and Computation 217, 6286--6295.
[8] P. McCullagh, Tensor Methods in Statistics, Monographs on Statistics and Applied Probability, Taylor & Francis Group, 22 December 2017
[9] (2008) Di Nardo E., Guarino G., Senato D. Symbolic computation of moments of sampling distributions. Computational Statistics and Data Analysis 52, 4909--4922.
[10] Stuart, A., Ord, J.K., 1987. Kendall’s Advanced Theory of Statistics 1. Charles Griffin and Company Limited, London.
[11] Weisstein, Eric W. "k-Statistic." From MathWorld--A Wolfram Web Resource. https://mathworld.wolfram.com/k-Statistic.html
Legal Notice: The copyright for this application is owned by the author(s). Neither Maplesoft nor the author are responsible for any errors contained within and are not liable for any damages resulting from the use of this material. This application is intended for non-commercial, non-profit use only. Contact the author for permission if you wish to use this application in for-profit activities