 Statistics - Maple Programming Help

Home : Support : Online Help : Statistics and Data Analysis : Statistics Package : Simulation : Statistics/Sample

Statistics

 Sample
 generate random sample

 Calling Sequence Sample(X, n, opts) Sample(X, m, opts) Sample(X, rng, opts) Sample(X, out, opts) Sample(X, opts)

Parameters

 X - algebraic; random variable or distribution n - nonnegative integer; sample size m - list of two nonnegative integers; Matrix dimensions rng - integer range or list of integer ranges; Array dimensions out - float rtable; to be filled with data options - (optional) equations of the form option = value, where option is method, possibly indexed by a name; specify options for sample generation

Options

 • method = name or method = list -- This option can be used to select a method of generating the sample. There are four main choices: method = default, method = custom, method = discrete, and method = envelope. One can supply method-specific options by instead specifying a list, the first element of which is one of the names default, custom, discrete, and envelope, and the other elements are equations; for example, method = [envelope, updates=20, range=0..100]. These method-specific options will be explained below.
 – method = envelope uses an implementation of acceptance/rejection generation with an adaptive piecewise linear envelope, applicable to continuous distributions. This implementation will only work for distributions where on its support, the PDF is twice differentiable, has a continuous first derivative, and has only finitely many inflection points. There are three valid method-specific options: range, basepoints, and updates.
 range: The (finite) range over which the piecewise linear envelope is to be defined, and consequently where the samples are to be found. If range = deduce (the default), then Maple takes the range given by the $\mathrm{\epsilon }$ and $1-\mathrm{\epsilon }$ Quantiles of the distribution, for some small positive value of $\mathrm{\epsilon }$ that depends on the value of Digits. Otherwise, range should be a range of two real numbers, such as range = 0 .. 1.
 basepoints: The base points are the boundaries between the segments of the piecewise linear envelope, which should include all inflection points of the PDF of the distribution. If basepoints = deduce (the default), then Maple attempts to find all inflection points itself. Otherwise, basepoints should be a list of floating point real numbers which includes all inflection points.
 updates: The envelope is automatically refined as more numbers are generated; the maximal number of segments is given by this option, which should be a positive integer. The default value is $100$.
 – method = discrete uses an implementation of the alias method by Walker (see references below), applicable to discrete distributions. Because this method computes and stores the individual probabilities for all possible outcomes within the range (see below), it may be inefficient for distributions with very heavy tails. There is one method-specific option: range.
 range: The (finite) range of integers for which the probabilities are computed. If the distribution uses the DiscreteValueMap feature (this is the case if the distribution can attain non-integer values), then this describes the range of source values; the map is applied to these integers to obtain the resulting values.
 – method = custom uses a distribution-specific method. Almost all predefined distributions have a highly efficient custom implementation in external C code. Method-specific options are all ignored.
 – method = default (which is the default) selects one of the other three methods. For most built-in distributions, it selects method = custom. For other distributions, such as custom-defined ones, the system falls back to either using method = envelope (for continuous distributions) or using method = discrete (for discrete distributions). The method-specific options accepted are the same as for the applicable fallback method, and they are only used in case the system falls back to that generator.
 • If X is an algebraic expression involving multiple random variables, say ${R}_{1}$ and ${R}_{2}$, then one can specify different sample generation methods for ${R}_{1}$ and ${R}_{2}$ by using options ${\mathrm{method}}_{{R}_{1}}={\mathrm{generator}}_{1}$ and ${\mathrm{method}}_{{R}_{2}}={\mathrm{generator}}_{2}$, where ${\mathrm{generator}}_{1}$ and ${\mathrm{generator}}_{2}$ are sample generation methods that could be validly specified as $\mathrm{method}={\mathrm{generator}}_{i}$. If a random variable-specific sample generation method is given only for some of the random variables, the others will use the method given by the $\mathrm{method}=\mathrm{...}$ option, or default if no such option is present.

Efficiency

 • When implementing an algorithm that uses a large number of random samples, it can be worthwhile to think about efficiency of the random sample generation. In most cases, the best efficiency is achieved when all samples are generated at once in a preprocessing phase and stored in a Vector (using the first calling sequence, above), and the values are then used one by one in the algorithm.
 In some cases, however, this is not possible. For example, this might take too much memory (if a very large number of samples is needed), it might be difficult or impossible to predict the number of samples needed, or the parameters of the random variable might change during the algorithm. In the first two cases, the recommended strategy is to use the fourth calling sequence to create a procedure p, then use p to create a Vector that can hold a large number of samples (using, say, $v≔p\left({10}^{5}\right)$), using the elements of v one by one, and calling $p\left(v\right)$ to refill v when the samples run out. If the parameters of the random variable keep changing, then one can define the random variable with parameters that are unassigned initially, use the fourth calling sequence to create a procedure p, then assign values to the parameters afterwards. An example is provided below.
 • For some of the discrete distributions, the method selected by default is not method = custom but method = discrete. For these distributions, this method is faster when generating more than about 1000 random numbers. If you need to generate fewer random numbers, you can select method = custom by including that option explicitly.

Description

 • The Sample command generates a random sample drawn from the distribution given by X.
 • The first parameter, X, can be a distribution (see Statistics[Distribution]), a random variable, or an algebraic expression involving random variables (see Statistics[RandomVariable]).
 • In the first calling sequence, the second parameter, n, is the sample size. This calling sequence will return a newly created Vector of length n, filled with the sample values. This calling sequence, or one of the next two, is recommended for all cases where there are no great performance concerns.
 • In the second calling sequence, the second parameter, m, is a list of two nonnegative integers. This calling sequence will return a newly created Matrix with the specified dimensions, filled with the sample values.
 • In the third calling sequence, the second parameter, rng, is a range or a list of ranges determining the dimensions of an Array. This Array will be created, filled with the sample values, and returned.
 • In the fourth calling sequence, the second parameter, out, is an rtable (such as a Vector) that was created beforehand. Upon successful return of the Sample command, out will have been filled with the sample values.
 out needs to have rectangular storage and the float data type that is consistent with the current settings of Digits and UseHardwareFloats. That is, if either UseHardwareFloats = true, or UseHardwareFloats = deduced and Digits <= evalhf(Digits) (which is the default), then out needs to have datatype = float; in the other case, that is, if either UseHardwareFloats = false, or UseHardwareFloats = deduced and Digits > evalhf(Digits), then out needs to have datatype = sfloat. This can easily be achieved by supplying the option datatype = float to the rtable creation function; this will automatically select the correct data type for the current settings.
 • In the fourth calling sequence, Sample returns a procedure p, which can subsequently be called to generate samples of X repeatedly. The procedure p accepts a single argument, which can be n, m, rng, or out, and then behaves as if one of the first three calling sequences were called. p does not accept options; any options should be given in the call to Sample itself.

Examples

 > $\mathrm{with}\left(\mathrm{Statistics}\right):$

Straightforward sampling of a distribution.

 > $X≔\mathrm{RandomVariable}\left(\mathrm{Normal}\left(0,1\right)\right)$
 ${X}{≔}{\mathrm{_R}}$ (1)
 > $A≔\mathrm{Sample}\left(X,{10}^{6}\right):$
 > $P≔\mathrm{DensityPlot}\left(X,\mathrm{range}=-2..2,\mathrm{thickness}=3,\mathrm{color}=\mathrm{red}\right):$
 > $Q≔\mathrm{Histogram}\left(A,\mathrm{range}=-2..2\right):$
 > $\mathrm{plots}\left[\mathrm{display}\right]\left(P,Q\right)$ We can also sample an expression involving two random variables.

 > $Y≔\mathrm{RandomVariable}\left(\mathrm{Normal}\left(0,1\right)\right)$
 ${Y}{≔}{\mathrm{_R0}}$ (2)
 > $B≔\mathrm{Sample}\left(\frac{X}{Y},{10}^{6}\right):$
 > $P≔\mathrm{DensityPlot}\left(\mathrm{Cauchy}\left(0,1\right),\mathrm{range}=-2..2,\mathrm{thickness}=3,\mathrm{color}=\mathrm{red}\right):$
 > $Q≔\mathrm{Histogram}\left(B,\mathrm{range}=-2..2\right):$
 > $\mathrm{plots}\left[\mathrm{display}\right]\left(P,Q\right)$ Sampling of a custom-defined distribution.

 > $\mathrm{dist}≔\mathrm{Distribution}\left(\mathrm{=}\left(\mathrm{PDF},t↦\mathrm{piecewise}\left(t<0,0,t<1,\frac{36\cdot {t}^{2}\cdot \left(\frac{4}{3}-t\right)}{7},0\right)\right)\right)$
 ${\mathrm{dist}}{:=}{\mathbf{module}}\left({}\right)\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{\mathbf{option}}\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{\mathrm{Distribution}}{,}{\mathrm{Continuous}}{;}\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{\mathbf{export}}\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{\mathrm{Conditions}}{,}{\mathrm{PDF}}{;}\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{}\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{\mathbf{end module}}$ (3)
 > $B≔\mathrm{Sample}\left(\mathrm{RandomVariable}\left(\mathrm{dist}\right),{10}^{6}\right):$
 > $P≔\mathrm{DensityPlot}\left(\mathrm{dist},\mathrm{range}=0..1,\mathrm{thickness}=3,\mathrm{color}=\mathrm{red}\right):$
 > $Q≔\mathrm{Histogram}\left(B\right):$
 > $\mathrm{plots}\left[\mathrm{display}\right]\left(P,Q\right)$ If we supply a list of ranges instead of a number, we get an Array. With a list of two numbers, we get a Matrix.

 > $a≔\mathrm{Sample}\left(\frac{X}{Y},\left[1..1000,1..1000\right]\right)$
  (4)
 > $m≔\mathrm{Sample}\left(\frac{X}{Y},\left[3,3\right]\right)$
 ${m}{≔}\left[\begin{array}{ccc}{1.73381095915784}& {0.619445146766988}& {-4.38544454374716}\\ {-3.65149550630508}& {-0.940625211071576}& {1.15063333773690}\\ {-6.27347263502003}& {2.04859402579226}& {-0.234438105073932}\end{array}\right]$ (5)

We can use envelope rejection sampling to restrict $X$ and $Y$ to a certain range.

 > $\mathrm{s1}≔\mathrm{Sample}\left(\frac{X}{Y},{10}^{6},\mathrm{method}=\left[\mathrm{envelope},\mathrm{range}=0..1\right]\right)$
  (6)
 > $\mathrm{Histogram}\left(\mathrm{s1},\mathrm{range}=0..10\right)$ Or to restrict only $X$ to a certain range.

 > $\mathrm{s2}≔\mathrm{Sample}\left(\frac{X}{Y},{10}^{6},\mathrm{method}\left[X\right]=\left[\mathrm{envelope},\mathrm{range}=0..1\right]\right)$
  (7)
 > $\mathrm{Histogram}\left(\mathrm{s2},\mathrm{range}=-10..10\right)$ We can refill $\mathrm{s2}$ with different samples as follows.

 > $\mathrm{Sample}\left(\mathrm{Cauchy}\left(0,1\right),\mathrm{s2}\right)$
  (8)
 > $\mathrm{Histogram}\left(\mathrm{s2},\mathrm{range}=-10..10\right)$ Another option is to use a procedure.

 > $p≔\mathrm{Sample}\left(X\right)$
 ${p}{:=}{\mathbf{proc}}\left({n}{::}{\mathrm{Sample:-sizeType}}\right)\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{...}\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{\mathbf{end proc}}$ (9)
 > $p\left(\mathrm{s2}\right)$
  (10)
 > $\mathrm{Histogram}\left(\mathrm{s2}\right)$ Sampling of a custom-defined discrete distribution with non-integer values. This distribution attains the value ${\left(\frac{3}{2}\right)}^{n}$ with probability ${2}^{-n}$ for positive $n$.

 > $\mathrm{dist2}≔\mathrm{Distribution}\left(\mathrm{Type}=\mathrm{discrete},\mathrm{=}\left(\mathrm{ProbabilityFunction},t↦{2}^{-\mathrm{log}\left[\frac{3}{2}\right]\left(t\right)}\right),\mathrm{=}\left(\mathrm{DiscreteValueMap},n↦{\left(\frac{3}{2}\right)}^{n}\right),\mathrm{Support}=1..\mathrm{\infty }\right)$
 ${\mathrm{dist2}}{:=}{\mathbf{module}}\left({}\right)\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{\mathbf{option}}\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{\mathrm{Distribution}}{,}{\mathrm{Discrete}}{;}\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{\mathbf{export}}\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{\mathrm{Conditions}}{,}{\mathrm{DiscreteValueMap}}{,}{\mathrm{Support}}{,}{\mathrm{ProbabilityFunction}}{;}\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{}\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{\mathbf{end module}}$ (11)
 > $s≔\mathrm{Sample}\left(\mathrm{dist2},{10}^{5}\right)$
  (12)
 > $\mathrm{sort}\left(\mathrm{Tally}\left(s\right)\right)$
 $\left[{1.50000000000000}{=}{49902}{,}{2.25000000000000}{=}{25074}{,}{3.37500000000000}{=}{12481}{,}{5.06250000000000}{=}{6226}{,}{7.59375000000000}{=}{3129}{,}{11.3906250000000}{=}{1618}{,}{17.0859375000000}{=}{772}{,}{25.6289062500000}{=}{378}{,}{38.4433593750000}{=}{226}{,}{57.6650390625000}{=}{89}{,}{86.4975585937500}{=}{47}{,}{129.746337890625}{=}{26}{,}{194.619506835938}{=}{17}{,}{291.929260253906}{=}{8}{,}{437.893890380859}{=}{4}{,}{985.261253356934}{=}{1}{,}{7481.82764267921}{=}{2}\right]$ (13)

Finally, here is a somewhat longer example, where we want to generate exponentially distributed numbers; the rate parameter $\mathrm{\lambda }$ starts as being $1$, but for each subsequent value it is the square root of the previous sample value. In order to be able to use a procedure (important for efficiency), we need to make sure that $\mathrm{\lambda }$ is not defined when we create the procedure, otherwise it will only generate samples for the value that $\mathrm{\lambda }$ had at the time of definition. (If $\mathrm{\lambda }$ has a value, it can be undefined by executing lambda := 'lambda';, but since we have not used $\mathrm{\lambda }$ yet, that should not be necessary in this case.)

 > $X≔\mathrm{RandomVariable}\left(\mathrm{Exponential}\left(\mathrm{\lambda }\right)\right)$
 ${X}{≔}{\mathrm{_R6}}$ (14)
 > $p≔\mathrm{Sample}\left(X\right)$
 ${p}{:=}{\mathbf{proc}}\left({n}{::}{\mathrm{Sample:-sizeType}}\right)\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{...}\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{\mathbf{end proc}}$ (15)

If we now compute a sample of $X$, then Maple will complain, because $\mathrm{\lambda }$ is unassigned:

 > $p\left(1\right)$

Instead, we assign $1$ to $\mathrm{\lambda }$ and start an iteration.

 > $\mathrm{\lambda }≔1$
 ${\mathrm{\lambda }}{≔}{1}$ (16)
 > $N≔{10}^{3}$
 ${N}{≔}{1000}$ (17)
 > $v≔\mathrm{Vector}\left(N,'\mathrm{datatype}'='\mathrm{float}'\right)$
  (18)
 > $\mathbf{for}\phantom{\rule[-0.0ex]{0.3em}{0.0ex}}i\phantom{\rule[-0.0ex]{0.3em}{0.0ex}}\mathbf{to}\phantom{\rule[-0.0ex]{0.3em}{0.0ex}}N\phantom{\rule[-0.0ex]{0.3em}{0.0ex}}\mathbf{do}\phantom{\rule[-0.0ex]{0.0em}{0.0ex}}\phantom{\rule[-0.0ex]{2.0em}{0.0ex}}v\left[i\right]≔p\left(1\right)\left[1\right];\phantom{\rule[-0.0ex]{0.0em}{0.0ex}}\phantom{\rule[-0.0ex]{2.0em}{0.0ex}}\mathrm{\lambda }≔\mathrm{sqrt}\left(v\left[i\right]\right)\phantom{\rule[-0.0ex]{0.0em}{0.0ex}}\mathbf{end}\phantom{\rule[-0.0ex]{0.3em}{0.0ex}}\mathbf{do}:$

We now create a point plot where pairs of subsequent samples are the horizontal and vertical coordinates.

 > $\mathrm{plots}\left[\mathrm{pointplot}\right]\left(⟨v\left[..-2\right]|v\left[2..\right]⟩,\mathrm{axis}=\left[\mathrm{mode}=\mathrm{log}\right],\mathrm{symbolsize}=1,\mathrm{color}=\mathrm{red}\right)$ References

 Stuart, Alan, and Ord, Keith. Kendall's Advanced Theory of Statistics. 6th ed. London: Edward Arnold, 1998. Vol. 1: Distribution Theory.
 Walker, Alastair J. New Fast Method for Generating Discrete Random Numbers with Arbitrary Frequency Distributions, Electronic Letters, 10, 127-128.
 Walker, Alastair J. An Efficient Method for Generating Discrete Random Variables with General Distributions, ACM Trans. Math. Software, 3, 253-256.

Compatibility

 • The rng and out parameters were introduced in Maple 15.
 • The method option was introduced in Maple 15.