Statistics - Maple Programming Help


Statistics[RousseeuwCrouxSn] - compute Rousseeuw and Croux' Sn

 


Calling Sequence

RousseeuwCrouxSn(A, ds_options)

RousseeuwCrouxSn(X, rv_options)

Parameters

A - data set or Matrix data set

X - algebraic; random variable or distribution

ds_options - (optional) equation(s) of the form option=value where option is one of correction, ignore, or weights; specify options for computing Rousseeuw and Croux' Sn statistic of a data set

rv_options - (optional) equation of the form numeric=value; specifies options for computing Rousseeuw and Croux' Sn statistic of a random variable

Description

• The RousseeuwCrouxSn function computes a robust measure of the dispersion of the specified data set or random variable, as introduced by Rousseeuw and Croux in [2].

• This statistic, referred to as Sn in the remainder of this help page, is defined for a data set $A_1, A_2, \ldots, A_n$ as:

$$S_n = \operatorname{LowMedian}_{i=1..n} \, \operatorname{HighMedian}_{j=1..n} \, |A_i - A_j|$$

  where the LowMedian of n values is its $\lfloor \frac{1}{2} n + \frac{1}{2} \rfloor$th OrderStatistic and the HighMedian is its $\lceil \frac{1}{2} n + \frac{1}{2} \rceil$th OrderStatistic. (HighMedian and LowMedian are not Maple functions; they are only used here to define Sn.)

• Sn is a robust statistic: it has a high breakdown point (the proportion of arbitrarily large observations it can handle before giving an arbitrarily large result). The breakdown point of Sn is the maximum possible value, 1/2.

• Sn is a measure of dispersion, also called a measure of scale: if $S_n(X) = a$, then for all real constants $\alpha$ and $\beta$, we have $S_n(\alpha X + \beta) = |\alpha| a$.

• The first parameter can be a data set, a distribution (see Statistics[Distribution]), a random variable, or an algebraic expression involving random variables (see Statistics[RandomVariable]). For a data set A, RousseeuwCrouxSn computes Sn as defined above. For a distribution or random variable X, RousseeuwCrouxSn computes the asymptotic equivalent: the value that Sn converges to as the size of a sample drawn from X grows.
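The definition of Sn for a data set can be spelled out directly. The following Python sketch is an illustration only, not Maple's implementation: the helper names low_median, high_median, and sn are hypothetical, and the quadratic-time nested loop is chosen for clarity rather than speed. It also checks the scale property on a transformed copy of the data.

```python
def low_median(values):
    """floor((n + 1) / 2)-th order statistic (1-based) of the values."""
    ordered = sorted(values)
    return ordered[(len(ordered) + 1) // 2 - 1]


def high_median(values):
    """ceil((n + 1) / 2)-th order statistic (1-based) of the values."""
    ordered = sorted(values)
    return ordered[(len(ordered) + 2) // 2 - 1]


def sn(data):
    """Uncorrected Sn: low median over i of the high median over j of |A_i - A_j|."""
    inner = [high_median([abs(a - b) for b in data]) for a in data]
    return low_median(inner)


s = [1, 5, 2, 2, 7, 4, 1, 6]
print(sn(s))                         # 3
print(sn([-2 * x + 10 for x in s]))  # 6, that is |-2| * 3; the shift by 10 has no effect
```

The inner high median runs over all j, including j = i, as in the definition above. No consistency constant is applied, which agrees with the uncorrected value 3. shown for this sample in the Examples section.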

Computation

• By default, all computations involving random variables are performed symbolically (see option numeric below).

• All computations involving data are performed in floating-point; therefore, all data provided must have type/realcons and all returned solutions are floating-point, even if the problem is specified with exact values.

• For more information about computation in the Statistics package, see the Statistics[Computation] help page.

Data Set Options

• The ds_options argument can contain one or more of the options shown below. More information for some options is available in the Statistics[DescriptiveStatistics] help page.

• ignore=truefalse -- This option controls how missing data is handled by the RousseeuwCrouxSn command. Missing items are represented by undefined or Float(undefined). So, if ignore=false and A contains missing data, the RousseeuwCrouxSn command may return undefined. If ignore=true all missing items in A will be ignored. The default value is false.

• weights=Vector -- Data weights. The number of elements in the weights array must be equal to the number of elements in the original data sample. By default all elements in A are assigned weight 1.

• correction=samplesize or correction=none -- In [2], Rousseeuw and Croux define a correction factor $c_n$ for finite sample size as:

$$c_n = \begin{cases} 0.743 & n = 2 \\ 1.851 & n = 3 \\ 0.954 & n = 4 \\ 1.351 & n = 5 \\ 0.993 & n = 6 \\ 1.198 & n = 7 \\ 1.005 & n = 8 \\ 1.131 & n = 9 \\ \dfrac{n}{n - 0.9} & n > 9 \text{ and } n \text{ odd} \\ 1 & n > 9 \text{ and } n \text{ even} \end{cases}$$

  If the option correction = samplesize is given, then this correction factor is applied before the result is returned. The default is correction = none, that is, no correction factor is applied.
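The finite-sample correction factor is a small lookup plus two formula cases. The sketch below is a Python illustration under stated assumptions: the function name correction_factor is hypothetical, and raising an error for n < 2 is a choice made here because the table above does not cover that case. The factor is applied by multiplication, which is consistent with the corrected value shown in the Examples section.

```python
def correction_factor(n):
    """Finite-sample factor c_n, transcribed from the table above."""
    small = {2: 0.743, 3: 1.851, 4: 0.954, 5: 1.351,
             6: 0.993, 7: 1.198, 8: 1.005, 9: 1.131}
    if n in small:
        return small[n]
    if n > 9:
        # Odd sample sizes get n / (n - 0.9); even sample sizes need no correction.
        return n / (n - 0.9) if n % 2 == 1 else 1.0
    # Assumption of this sketch: the table gives no value for n < 2.
    raise ValueError("no correction factor is tabulated for n < 2")


# An uncorrected Sn of 3 for a sample of size 8 becomes approximately 3.015.
print(3 * correction_factor(8))
```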

Random Variable Options

  

The rv_options argument can contain one or more of the options shown below. More information for some options is available in the Statistics[RandomVariables] help page.

• numeric=truefalse -- By default, Sn is computed using exact arithmetic. To compute Sn numerically, specify the numeric or numeric = true option.

Examples

with(Statistics):

Compute Sn for a data sample.

s := Vector[row]([1, 5, 2, 2, 7, 4, 1, 6])

s := [1, 5, 2, 2, 7, 4, 1, 6]

(1)

RousseeuwCrouxSn(s)

3.

(2)

Employ Rousseeuw and Croux's finite sample size correction.

RousseeuwCrouxSn(s, 'correction = samplesize')

3.01500000000000

(3)

Let's replace three of the values with very large values.

t := copy(s):

t[1 .. 3] := 10^100:

t

[10^100, 10^100, 10^100, 2, 7, 4, 1, 6]

(4)

RousseeuwCrouxSn(t)

6.

(5)

The value of Sn stays bounded, because it has a high breakdown point.
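The effect of the breakdown point can be reproduced outside Maple. The Python sketch below is illustrative only: sn is a hypothetical helper that follows the definition in the Description section, and the sample standard deviation from Python's statistics module is used as a non-robust comparison.

```python
from statistics import stdev


def sn(data):
    """Uncorrected Sn: low median over i of the high median over j of |A_i - A_j|."""
    n = len(data)
    low, high = (n + 1) // 2, (n + 2) // 2  # 1-based ranks of the low and high median
    inner = [sorted(abs(a - b) for b in data)[high - 1] for a in data]
    return sorted(inner)[low - 1]


s = [1, 5, 2, 2, 7, 4, 1, 6]
t = [10**100] * 3 + s[3:]  # replace the first three values by 10^100

print(sn(s), sn(t))  # 3 6
print(stdev(s))      # a little above 2.3
print(stdev(t))      # on the order of 10^99
```

With three of the eight values contaminated, Sn only moves from 3 to 6, while the standard deviation is dominated entirely by the three large entries.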

Compute Sn for a normal distribution.

RousseeuwCrouxSn('Normal'(3, 5), 'numeric')

4.192525630

(6)

The symbolic result is a rather complicated expression. It evaluates to the same floating point number.

RousseeuwCrouxSn('Normal'(3, 5))

5*RootOf(erf((1/2)*sqrt(2)*_Z + RootOf(2*erf(_Z) - 1)) + erf((1/2)*sqrt(2)*_Z - RootOf(2*erf(_Z) - 1)) - 1)

(7)

evalf(%)

4.192525630

(8)
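The numeric value can be cross-checked independently of Maple. The Python sketch below assumes the usual characterization of the asymptotic Sn for a symmetric unimodal distribution: for a standard normal variable, it is the root s of Φ(q + s) - Φ(q - s) = 1/2, where Φ is the standard normal distribution function and q = Φ⁻¹(3/4); the result is then multiplied by the standard deviation. The function name asymptotic_sn_normal and the bisection bracket [0, 5] are choices made for this illustration.

```python
from statistics import NormalDist


def asymptotic_sn_normal(sigma, iterations=100):
    """Asymptotic Sn of Normal(mu, sigma); the location mu does not matter."""
    std = NormalDist()       # standard normal distribution
    q = std.inv_cdf(0.75)    # the median of |X| for standard normal X

    def f(s):
        # Increasing in s, negative at s = 0 and positive at s = 5.
        return std.cdf(q + s) - std.cdf(q - s) - 0.5

    lo, hi = 0.0, 5.0
    for _ in range(iterations):  # plain bisection
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return sigma * (lo + hi) / 2


print(asymptotic_sn_normal(5))  # approximately 4.1925
```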

Generate a random sample of size 1000000 from the same distribution and compute the sample's Sn.

A := Sample('Normal'(3, 5), 1000000):

RousseeuwCrouxSn(A)

4.19118343568100

(9)

Consider the following Matrix data set.

M := Matrix([[3, 1130, 114694], [4, 1527, 127368], [3, 907, 88464], [2, 878, 96484], [4, 995, 128007]])

M := [3, 1130, 114694]
     [4, 1527, 127368]
     [3,  907,  88464]
     [2,  878,  96484]
     [4,  995, 128007]

(10)

We compute Sn for each of the columns.

RousseeuwCrouxSn(M)

[1., 117., 13313.]

(11)
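The column-wise behavior for a Matrix data set can be mirrored with the same definition. In the Python sketch below, sn is again a hypothetical helper following the Description section, the matrix is stored as a list of rows, and exact integers are returned where Maple returns floating-point values.

```python
def sn(data):
    """Uncorrected Sn: low median over i of the high median over j of |A_i - A_j|."""
    n = len(data)
    low, high = (n + 1) // 2, (n + 2) // 2  # 1-based ranks of the low and high median
    inner = [sorted(abs(a - b) for b in data)[high - 1] for a in data]
    return sorted(inner)[low - 1]


rows = [
    [3, 1130, 114694],
    [4, 1527, 127368],
    [3, 907, 88464],
    [2, 878, 96484],
    [4, 995, 128007],
]

columns = list(zip(*rows))  # transpose: one tuple per column
print([sn(column) for column in columns])  # [1, 117, 13313]
```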

References

  

[1] Stuart, Alan, and Ord, Keith. Kendall's Advanced Theory of Statistics. 6th ed. London: Edward Arnold, 1998. Vol. 1: Distribution Theory.

  

[2] Rousseeuw, Peter J., and Croux, Christophe. Alternatives to the Median Absolute Deviation. Journal of the American Statistical Association 88(424), 1993, pp.1273-1283.

Compatibility

• The Statistics[RousseeuwCrouxSn] command was introduced in Maple 17.

• For more information on Maple 17 changes, see Updates in Maple 17.

See Also

Statistics

Statistics[Computation]

Statistics[DescriptiveStatistics]

Statistics[Distributions]

Statistics[Median]

Statistics[MedianDeviation]

Statistics[RandomVariables]