Statistics - Maple Programming Help

Statistics

 RousseeuwCrouxSn
 compute Rousseeuw and Croux' Sn

Calling Sequence

 RousseeuwCrouxSn(A, ds_options)
 RousseeuwCrouxSn(X, rv_options)

Parameters

 A - data set or Matrix data set
 X - algebraic; random variable or distribution
 ds_options - (optional) equation(s) of the form option=value where option is one of correction, ignore, or weights; specifies options for computing Rousseeuw and Croux' Sn statistic of a data set
 rv_options - (optional) equation of the form numeric=value; specifies options for computing Rousseeuw and Croux' Sn statistic of a random variable

Description

 • The RousseeuwCrouxSn function computes a robust measure of the dispersion of the specified data set or random variable, as introduced by Rousseeuw and Croux (1993); see References.
 • This statistic, referred to as ${S}_{n}$ in the remainder of this help page, is defined for a data set ${A}_{1},{A}_{2},\mathrm{...},{A}_{n}$ as:

${S}_{n}=\mathrm{LowMedian}\left(\mathrm{HighMedian}\left(\left|{A}_{i}-{A}_{j}\right|,i=1..n\right),j=1..n\right)$

 where the $\mathrm{LowMedian}$ of $n$ values is its $\mathrm{floor}\left(\frac{1}{2}n+\frac{1}{2}\right)$th OrderStatistic and the $\mathrm{HighMedian}$ is its $\mathrm{ceil}\left(\frac{1}{2}n+\frac{1}{2}\right)$th OrderStatistic. ($\mathrm{HighMedian}$ and $\mathrm{LowMedian}$ are not Maple functions - they are only used here to define ${S}_{n}$.)
 • ${S}_{n}$ is a robust statistic: it has a high breakdown point (the proportion of arbitrarily large observations it can handle before giving an arbitrarily large result). The breakdown point of ${S}_{n}$ is the maximum possible value, $\frac{1}{2}$.
 • ${S}_{n}$ is a measure of dispersion, also called a measure of scale: if ${S}_{n}\left(X\right)=a$, then for all real constants $\mathrm{\alpha }$ and $\mathrm{\beta }$, we have ${S}_{n}\left(\mathrm{\alpha }X+\mathrm{\beta }\right)=\left|\mathrm{\alpha }\right|a$.
 • The first parameter can be a data set, a distribution (see Statistics[Distribution]), a random variable, or an algebraic expression involving random variables (see Statistics[RandomVariable]). For a data set $A$, RousseeuwCrouxSn computes ${S}_{n}$ as defined above. For a distribution or random variable $X$, RousseeuwCrouxSn computes the asymptotic equivalent - the value that ${S}_{n}$ converges to for ever larger samples of $X$.
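The data-set definition above can be transcribed almost word for word. The following is a minimal plain-Python sketch, not Maple code and not a description of how Maple implements the command; the helper names `low_median`, `high_median`, and `sn_naive` are invented for this illustration. It sorts $n$ lists of $n$ differences, so it is only practical for small data sets.

```python
def low_median(values):
    """Return the floor((n + 1) / 2)-th order statistic (ranks are 1-based)."""
    ordered = sorted(values)
    return ordered[(len(ordered) + 1) // 2 - 1]


def high_median(values):
    """Return the ceil((n + 1) / 2)-th order statistic (ranks are 1-based)."""
    ordered = sorted(values)
    return ordered[(len(ordered) + 2) // 2 - 1]


def sn_naive(data):
    """S_n = LowMedian over j of HighMedian over i of |A_i - A_j|."""
    return low_median(
        [high_median([abs(a_i - a_j) for a_i in data]) for a_j in data]
    )


data = [1, 5, 2, 2, 7, 4, 1, 6]
print(sn_naive(data))                          # 3

# Scale property: S_n(alpha * X + beta) = |alpha| * S_n(X).
alpha, beta = -2, 10
print(sn_naive([alpha * x + beta for x in data]))  # 6
```

The first result agrees with the uncorrected value shown for the same data in the Examples section. The second illustrates the scale property: shifting by $\mathrm{\beta }$ has no effect and multiplying by $\mathrm{\alpha }$ multiplies the result by $\left|\mathrm{\alpha }\right|$.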

Computation

 • By default, all computations involving random variables are performed symbolically (see option numeric below).
 • All computations involving data are performed in floating-point; therefore, all data provided must have type/realcons and all returned solutions are floating-point, even if the problem is specified with exact values.
 • For more information about computation in the Statistics package, see the Statistics[Computation] help page.

Data Set Options

 • The ds_options argument can contain one or more of the options shown below. More information for some options is available in the Statistics[DescriptiveStatistics] help page.
 • ignore=truefalse -- This option controls how missing data is handled by the RousseeuwCrouxSn command. Missing items are represented by undefined or Float(undefined). So, if ignore=false and A contains missing data, the RousseeuwCrouxSn command may return undefined. If ignore=true all missing items in A will be ignored. The default value is false.
 • weights=Vector -- Data weights. The number of elements in the weights array must be equal to the number of elements in the original data sample. By default all elements in A are assigned weight $1$.
 • correction=samplesize or correction=none -- In their 1993 paper (see References), Rousseeuw and Croux define a correction factor ${c}_{n}$ for finite sample size as:

${c}_{n}=\left\{\begin{array}{cc}0.743& n=2\\ 1.851& n=3\\ 0.954& n=4\\ 1.351& n=5\\ 0.993& n=6\\ 1.198& n=7\\ 1.005& n=8\\ 1.131& n=9\\ \frac{n}{n-0.9}& n>9\text{ and }n\text{ odd}\\ 1& n>9\text{ and }n\text{ even}\end{array}\right.$

 If the option correction = samplesize is given, then this correction factor is applied before the result is returned. The default is correction = none, that is, no correction factor is applied.
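The piecewise definition of ${c}_{n}$ can be written as a small lookup. This is a hedged Python sketch for illustration only; the function name `correction_factor` is hypothetical, and raising an error for $n<2$ is a choice made here because the table above gives no value for that case.

```python
# Tabulated correction factors for small samples, copied from the table above.
_SMALL_SAMPLE_FACTORS = {
    2: 0.743, 3: 1.851, 4: 0.954, 5: 1.351,
    6: 0.993, 7: 1.198, 8: 1.005, 9: 1.131,
}


def correction_factor(n):
    """Return the finite-sample correction factor c_n for sample size n."""
    if n in _SMALL_SAMPLE_FACTORS:
        return _SMALL_SAMPLE_FACTORS[n]
    if n > 9:
        # Odd sizes get n / (n - 0.9); even sizes need no correction.
        return n / (n - 0.9) if n % 2 == 1 else 1.0
    raise ValueError("c_n is not tabulated for n < 2")


# An uncorrected S_n of 3.0 from a sample of size 8, with the correction applied.
print(3.0 * correction_factor(8))
```

For a sample of size 8 with an uncorrected ${S}_{n}$ of 3, the product is approximately 3.015, which is the corrected value shown in the Examples section.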

Random Variable Options

 The rv_options argument can contain one or more of the options shown below. More information for some options is available in the Statistics[RandomVariables] help page.
 • numeric=truefalse -- By default, ${S}_{n}$ is computed using exact arithmetic. To compute ${S}_{n}$ numerically, specify the numeric or numeric = true option.

Examples

 > $\mathrm{with}\left(\mathrm{Statistics}\right):$

Compute ${S}_{n}$ for a data sample.

 > $s≔⟨1,5,2,2,7,4,1,6⟩$
 ${s}{≔}\left[\begin{array}{r}{1}\\ {5}\\ {2}\\ {2}\\ {7}\\ {4}\\ {1}\\ {6}\end{array}\right]$ (1)
 > $\mathrm{RousseeuwCrouxSn}\left(s\right)$
 ${3.}$ (2)

Employ Rousseeuw and Croux's finite sample size correction.

 > $\mathrm{RousseeuwCrouxSn}\left(s,'\mathrm{correction}=\mathrm{samplesize}'\right)$
 ${3.01500000000000}$ (3)

Let's replace three of the values with very large values.

 > $t≔\mathrm{copy}\left(s\right):$
 > ${t}_{1..3}≔{10}^{100}:$
 > $t$
 $\left[\begin{array}{r}{10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000}\\ {10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000}\\ {10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000}\\ {2}\\ {7}\\ {4}\\ {1}\\ {6}\end{array}\right]$ (4)
 > $\mathrm{RousseeuwCrouxSn}\left(t\right)$
 ${6.}$ (5)

The value of ${S}_{n}$ stays bounded, because it has a high breakdown point.
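The same behavior can be reproduced outside Maple. The sketch below is plain Python with a hypothetical helper `sn_naive` that transcribes the definition from the Description section directly; it contrasts ${S}_{n}$ with the ordinary sample standard deviation from Python's `statistics` module, which is not robust.

```python
from statistics import stdev


def sn_naive(data):
    """S_n by direct transcription of the definition (ranks are 1-based)."""
    n = len(data)
    hi_rank = (n + 2) // 2  # ceil((n + 1) / 2)
    lo_rank = (n + 1) // 2  # floor((n + 1) / 2)
    inner = [sorted(abs(x - y) for x in data)[hi_rank - 1] for y in data]
    return sorted(inner)[lo_rank - 1]


s = [1, 5, 2, 2, 7, 4, 1, 6]
t = [1e100, 1e100, 1e100] + s[3:]  # first three entries replaced

print(sn_naive(s), sn_naive(t))  # 3 6
print(stdev(s), stdev(t))        # the second value is on the order of 1e99
```

Three contaminated values out of eight are below the breakdown point of $\frac{1}{2}$, so ${S}_{n}$ moves only from 3 to 6, while the standard deviation is dominated by the contaminated entries.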

Compute ${S}_{n}$ for a normal distribution.

 > $\mathrm{RousseeuwCrouxSn}\left('\mathrm{Normal}'\left(3,5\right),'\mathrm{numeric}'\right)$
 ${4.192525630}$ (6)

The symbolic result is a rather complicated expression. It evaluates to the same floating-point number.

 > $\mathrm{RousseeuwCrouxSn}\left('\mathrm{Normal}'\left(3,5\right)\right)$
 ${5}{}{\mathrm{RootOf}}{}\left({\mathrm{erf}}{}\left(\frac{{1}}{{2}}{}\sqrt{{2}}{}{\mathrm{_Z}}{+}{\mathrm{RootOf}}{}\left({2}{}{\mathrm{erf}}{}\left({\mathrm{_Z}}\right){-}{1}\right)\right){+}{\mathrm{erf}}{}\left(\frac{{1}}{{2}}{}\sqrt{{2}}{}{\mathrm{_Z}}{-}{\mathrm{RootOf}}{}\left({2}{}{\mathrm{erf}}{}\left({\mathrm{_Z}}\right){-}{1}\right)\right){-}{1}\right)$ (7)
 > $\mathrm{evalf}\left(\%\right)$
 ${4.192525630}$ (8)
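As a cross-check outside Maple, the nested RootOf expression in (7) can be solved numerically. The following Python sketch assumes that each RootOf denotes the unique real root of its equation; the `bisect` helper is written here for illustration and relies on both functions being increasing on the chosen intervals.

```python
from math import erf, sqrt


def bisect(f, lo, hi, tol=1e-13):
    """Find the root of an increasing function f on [lo, hi] by bisection."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Inner root: 2 * erf(c) - 1 = 0.
c = bisect(lambda z: 2 * erf(z) - 1, 0.0, 1.0)

# Outer root: erf(z / sqrt(2) + c) + erf(z / sqrt(2) - c) - 1 = 0.
z = bisect(lambda z: erf(z / sqrt(2) + c) + erf(z / sqrt(2) - c) - 1, 0.0, 10.0)

# The leading factor 5 in (7) multiplies the outer root.
sn_normal = 5 * z
print(sn_normal)
```

The printed value should agree with (6) and (8) to the digits shown there.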

Generate a random sample of size 1000000 from the same distribution and compute the sample's ${S}_{n}$.

 > $A≔\mathrm{Sample}\left('\mathrm{Normal}'\left(3,5\right),1000000\right):$
 > $\mathrm{RousseeuwCrouxSn}\left(A\right)$
 ${4.19118343568100}$ (9)

Consider the following Matrix data set.

 > $M≔\mathrm{Matrix}\left(\left[\left[3,1130,114694\right],\left[4,1527,127368\right],\left[3,907,88464\right],\left[2,878,96484\right],\left[4,995,128007\right]\right]\right)$
 ${M}{≔}\left[\begin{array}{rrr}{3}& {1130}& {114694}\\ {4}& {1527}& {127368}\\ {3}& {907}& {88464}\\ {2}& {878}& {96484}\\ {4}& {995}& {128007}\end{array}\right]$ (10)

We compute ${S}_{n}$ for each of the columns.

 > $\mathrm{RousseeuwCrouxSn}\left(M\right)$
 $\left[\begin{array}{ccc}{1.}& {117.}& {13313.}\end{array}\right]$ (11)
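The column-wise treatment of a Matrix data set can be imitated in plain Python by transposing the rows and applying a direct transcription of the definition to each column. This is an illustrative sketch with a hypothetical helper `sn_naive`, not Maple's implementation.

```python
def sn_naive(data):
    """S_n by direct transcription of the definition (ranks are 1-based)."""
    n = len(data)
    hi_rank = (n + 2) // 2  # ceil((n + 1) / 2)
    lo_rank = (n + 1) // 2  # floor((n + 1) / 2)
    inner = [sorted(abs(x - y) for x in data)[hi_rank - 1] for y in data]
    return sorted(inner)[lo_rank - 1]


rows = [
    [3, 1130, 114694],
    [4, 1527, 127368],
    [3, 907, 88464],
    [2, 878, 96484],
    [4, 995, 128007],
]

# zip(*rows) transposes the row list, yielding one tuple per column.
columns = list(zip(*rows))
print([sn_naive(col) for col in columns])  # [1, 117, 13313]
```

The values match (11), apart from Maple returning them as floating-point numbers.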

References

  Stuart, Alan, and Ord, Keith. Kendall's Advanced Theory of Statistics. 6th ed. London: Edward Arnold, 1998. Vol. 1: Distribution Theory.
  Rousseeuw, Peter J., and Croux, Christophe. Alternatives to the Median Absolute Deviation. Journal of the American Statistical Association 88(424), 1993, pp.1273-1283.

Compatibility

 • The Statistics[RousseeuwCrouxSn] command was introduced in Maple 17.
 • For more information on Maple 17 changes, see Updates in Maple 17.