generate random sample - Maple Help

Home : Support : Online Help : Statistics : Statistics Package : Simulation : Statistics/Sample

Statistics[Sample] - generate random sample

 Calling Sequence Sample(X, n, opts) Sample(X, m, opts) Sample(X, rng, opts) Sample(X, out, opts) Sample(X, opts)

Parameters

 X - algebraic; random variable or distribution n - nonnegative integer; sample size m - list of two nonnegative integers; Matrix dimensions rng - integer range or list of integer ranges; Array dimensions out - float rtable; to be filled with data options - (optional) equations of the form option = value, where option is method, possibly indexed by a name; specify options for sample generation

Efficiency

 • When implementing an algorithm that uses a large number of random samples, it can be worthwhile to think about efficiency of the random sample generation. In most cases, the best efficiency is achieved when all samples are generated at once in a preprocessing phase and stored in a Vector (using the first calling sequence, above), and the values are then used one by one in the algorithm.
 In some cases, however, this is not possible. For example, this might take too much memory (if a very large number of samples is needed), it might be difficult or impossible to predict the number of samples needed, or the parameters of the random variable might change during the algorithm. In the first two cases, the recommended strategy is to use the fourth calling sequence to create a procedure p, then use p to create a Vector that can hold a large number of samples (using, say, $v:=p\left({10}^{5}\right)$), using the elements of v one by one, and calling $p\left(v\right)$ to refill v when the samples run out. If the parameters of the random variable keep changing, then one can define the random variable with parameters that are unassigned initially, use the fourth calling sequence to create a procedure p, then assign values to the parameters afterwards. An example is provided below.
 • For some of the discrete distributions, the method selected by default is not method = custom but method = discrete. For these distributions, this method is faster when generating more than about 1000 random numbers. If you need to generate fewer random numbers, you can select method = custom by including that option explicitly.

Description

 • The Sample command generates a random sample drawn from the distribution given by X.
 • The first parameter, X, can be a distribution (see Statistics[Distribution]), a random variable, or an algebraic expression involving random variables (see Statistics[RandomVariable]).
 • In the first calling sequence, the second parameter, n, is the sample size. This calling sequence will return a newly created Vector of length n, filled with the sample values. This calling sequence, or one of the next two, is recommended for all cases where there are no great performance concerns.
 • In the second calling sequence, the second parameter, m, is a list of two nonnegative integers. This calling sequence will return a newly created Matrix with the specified dimensions, filled with the sample values.
 • In the third calling sequence, the second parameter, rng, is a range or a list of ranges determining the dimensions of an Array. This Array will be created, filled with the sample values, and returned.
 • In the fourth calling sequence, the second parameter, out, is an rtable (such as a Vector) that was created beforehand. Upon successful return of the Sample command, out will have been filled with the sample values.
 out needs to have rectangular storage and the float data type that is consistent with the current settings of Digits and UseHardwareFloats. That is, if either UseHardwareFloats = true, or UseHardwareFloats = deduced and Digits <= evalhf(Digits) (which is the default), then out needs to have datatype = float[8]; in the other case, that is, if either UseHardwareFloats = false, or UseHardwareFloats = deduced and Digits > evalhf(Digits), then out needs to have datatype = sfloat. This can easily be achieved by supplying the option datatype = float to the rtable creation function; this will automatically select the correct data type for the current settings.
 • In the fourth calling sequence, Sample returns a procedure p, which can subsequently be called to generate samples of X repeatedly. The procedure p accepts a single argument, which can be n, m, rng, or out, and then behaves as if one of the first three calling sequences were called. p does not accept options; any options should be given in the call to Sample itself.

Examples

 > $\mathrm{with}\left(\mathrm{Statistics}\right):$

Straightforward sampling of a distribution.

 > $X:=\mathrm{RandomVariable}\left(\mathrm{Normal}\left(0,1\right)\right)$
 ${X}{:=}{\mathrm{_R}}$ (1)
 > $A:=\mathrm{Sample}\left(X,{10}^{6}\right):$
 > $P:=\mathrm{DensityPlot}\left(X,\mathrm{range}=-2..2,\mathrm{thickness}=3,\mathrm{color}=\mathrm{red}\right):$
 > $Q:=\mathrm{Histogram}\left(A,\mathrm{range}=-2..2\right):$
 > $\mathrm{plots}[\mathrm{display}]\left(P,Q\right)$

We can also sample an expression involving two random variables.

 > $Y:=\mathrm{RandomVariable}\left(\mathrm{Normal}\left(0,1\right)\right)$
 ${Y}{:=}{\mathrm{_R0}}$ (2)
 > $B:=\mathrm{Sample}\left(\frac{X}{Y},{10}^{6}\right):$
 > $P:=\mathrm{DensityPlot}\left(\mathrm{Cauchy}\left(0,1\right),\mathrm{range}=-2..2,\mathrm{thickness}=3,\mathrm{color}=\mathrm{red}\right):$
 > $Q:=\mathrm{Histogram}\left(B,\mathrm{range}=-2..2\right):$
 > $\mathrm{plots}[\mathrm{display}]\left(P,Q\right)$

Sampling of a custom-defined distribution.

 > $\mathrm{dist}:=\mathrm{Distribution}\left(\mathrm{PDF}=\left(t→\mathrm{piecewise}\left(t<0,0,t<1,\frac{36{t}^{2}\left(\frac{4}{3}-t\right)}{7},0\right)\right)\right)$
 ${\mathrm{dist}}{:=}{\mathbf{module}}\left({}\right)\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{\mathbf{option}}\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{\mathrm{Distribution}}{,}{\mathrm{Continuous}}{;}\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{\mathbf{export}}\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{\mathrm{Conditions}}{,}{\mathrm{PDF}}{;}\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{}\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{\mathbf{end module}}$ (3)
 > $B:=\mathrm{Sample}\left(\mathrm{RandomVariable}\left(\mathrm{dist}\right),{10}^{6}\right):$
 > $P:=\mathrm{DensityPlot}\left(\mathrm{dist},\mathrm{range}=0..1,\mathrm{thickness}=3,\mathrm{color}=\mathrm{red}\right):$
 > $Q:=\mathrm{Histogram}\left(B\right):$
 > $\mathrm{plots}[\mathrm{display}]\left(P,Q\right)$

If we supply a list of ranges instead of a number, we get an Array. With a list of two numbers, we get a Matrix.

 > $a:=\mathrm{Sample}\left(\frac{X}{Y},\left[1..1000,1..1000\right]\right)$
 ${a}{:=}\left[\begin{array}{c}{\mathrm{1..1000 x 1..1000}}{\mathrm{Array}}\\ {\mathrm{Data Type:}}{{\mathrm{float}}}_{{8}}\\ {\mathrm{Storage:}}{\mathrm{rectangular}}\\ {\mathrm{Order:}}{\mathrm{Fortran_order}}\end{array}\right]$ (4)
 > $m:=\mathrm{Sample}\left(\frac{X}{Y},\left[3,3\right]\right)$
 ${m}{:=}\left[\begin{array}{ccc}{-}{1.24368664809680}& {-}{2.42811511057060}& {0.657320353969448}\\ {-}{48.5719748690776}& {1.79925030210058}& {-}{0.653960998618751}\\ {18.4816750590553}& {-}{0.545042157182995}& {0.549903521004792}\end{array}\right]$ (5)

We can use envelope rejection sampling to restrict $X$ and $Y$ to a certain range.

 > $\mathrm{s1}:=\mathrm{Sample}\left(\frac{X}{Y},{10}^{6},\mathrm{method}=\left[\mathrm{envelope},\mathrm{range}=0..1\right]\right)$
 ${\mathrm{s1}}{:=}\left[\begin{array}{c}{\mathrm{1 .. 1000000}}{{\mathrm{Vector}}}_{{\mathrm{row}}}\\ {\mathrm{Data Type:}}{{\mathrm{float}}}_{{8}}\\ {\mathrm{Storage:}}{\mathrm{rectangular}}\\ {\mathrm{Order:}}{\mathrm{Fortran_order}}\end{array}\right]$ (6)
 > $\mathrm{Histogram}\left(\mathrm{s1},\mathrm{range}=0..10\right)$

Or to restrict only $X$ to a certain range.

 > $\mathrm{s2}:=\mathrm{Sample}\left(\frac{X}{Y},{10}^{6},{\mathrm{method}}_{X}=\left[\mathrm{envelope},\mathrm{range}=0..1\right]\right)$
 ${\mathrm{s2}}{:=}\left[\begin{array}{c}{\mathrm{1 .. 1000000}}{{\mathrm{Vector}}}_{{\mathrm{row}}}\\ {\mathrm{Data Type:}}{{\mathrm{float}}}_{{8}}\\ {\mathrm{Storage:}}{\mathrm{rectangular}}\\ {\mathrm{Order:}}{\mathrm{Fortran_order}}\end{array}\right]$ (7)
 > $\mathrm{Histogram}\left(\mathrm{s2},\mathrm{range}=-10..10\right)$

We can refill $\mathrm{s2}$ with different samples as follows.

 > $\mathrm{Sample}\left(\mathrm{Cauchy}\left(0,1\right),\mathrm{s2}\right)$
 $\left[\begin{array}{c}{\mathrm{1 .. 1000000}}{{\mathrm{Vector}}}_{{\mathrm{row}}}\\ {\mathrm{Data Type:}}{{\mathrm{float}}}_{{8}}\\ {\mathrm{Storage:}}{\mathrm{rectangular}}\\ {\mathrm{Order:}}{\mathrm{Fortran_order}}\end{array}\right]$ (8)
 > $\mathrm{Histogram}\left(\mathrm{s2},\mathrm{range}=-10..10\right)$

Another option is to use a procedure.

 > $p:=\mathrm{Sample}\left(X\right)$
 ${p}{:=}{\mathbf{proc}}\left({n}{::}{\mathrm{Sample:-sizeType}}\right)\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{...}\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{\mathbf{end proc}}$ (9)
 > $p\left(\mathrm{s2}\right)$
 $\left[\begin{array}{c}{\mathrm{1 .. 1000000}}{{\mathrm{Vector}}}_{{\mathrm{row}}}\\ {\mathrm{Data Type:}}{{\mathrm{float}}}_{{8}}\\ {\mathrm{Storage:}}{\mathrm{rectangular}}\\ {\mathrm{Order:}}{\mathrm{Fortran_order}}\end{array}\right]$ (10)
 > $\mathrm{Histogram}\left(\mathrm{s2}\right)$

Sampling of a custom-defined discrete distribution with non-integer values. This distribution attains the value ${\left(\frac{3}{2}\right)}^{n}$ with probability ${2}^{-n}$ for positive $n$.

 > $\mathrm{dist2}:=\mathrm{Distribution}\left(\mathrm{Type}=\mathrm{discrete},\mathrm{ProbabilityFunction}=\left(t→{2}^{-{\mathrm{log}}_{\frac{3}{2}}\left(t\right)}\right),\mathrm{DiscreteValueMap}=\left(n→{\left(\frac{3}{2}\right)}^{n}\right),\mathrm{Support}=1..\mathrm{∞}\right)$
 ${\mathrm{dist2}}{:=}{\mathbf{module}}\left({}\right)\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{\mathbf{option}}\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{\mathrm{Distribution}}{,}{\mathrm{Discrete}}{;}\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{\mathbf{export}}\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{\mathrm{ProbabilityFunction}}{,}{\mathrm{Conditions}}{,}{\mathrm{DiscreteValueMap}}{,}{\mathrm{Support}}{;}\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{}\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{\mathbf{end module}}$ (11)
 > $s:=\mathrm{Sample}\left(\mathrm{dist2},{10}^{5}\right)$
 ${s}{:=}\left[\begin{array}{c}{\mathrm{1 .. 100000}}{{\mathrm{Vector}}}_{{\mathrm{row}}}\\ {\mathrm{Data Type:}}{{\mathrm{float}}}_{{8}}\\ {\mathrm{Storage:}}{\mathrm{rectangular}}\\ {\mathrm{Order:}}{\mathrm{Fortran_order}}\end{array}\right]$ (12)
 > $\mathrm{sort}\left(\mathrm{Tally}\left(s\right)\right)$
 $\left[{1.50000000000000}{=}{49843}{,}{2.25000000000000}{=}{24962}{,}{3.37500000000000}{=}{12661}{,}{5.06250000000000}{=}{6234}{,}{7.59375000000000}{=}{3129}{,}{11.3906250000000}{=}{1592}{,}{17.0859375000000}{=}{766}{,}{25.6289062500000}{=}{395}{,}{38.4433593750000}{=}{226}{,}{57.6650390625000}{=}{94}{,}{86.4975585937500}{=}{43}{,}{129.746337890625}{=}{30}{,}{194.619506835938}{=}{13}{,}{291.929260253906}{=}{7}{,}{437.893890380859}{=}{4}{,}{7481.82764267921}{=}{1}\right]$ (13)

Finally, here is a somewhat longer example, where we want to generate exponentially distributed numbers; the rate parameter $\mathrm{\lambda }$ starts as being $1$, but for each subsequent value it is the square root of the previous sample value. In order to be able to use a procedure (important for efficiency), we need to make sure that $\mathrm{\lambda }$ is not defined when we create the procedure, otherwise it will only generate samples for the value that $\mathrm{\lambda }$ had at the time of definition. (If $\mathrm{\lambda }$ has a value, it can be undefined by executing lambda := 'lambda';, but since we have not used $\mathrm{\lambda }$ yet, that should not be necessary in this case.)

 > $X:=\mathrm{RandomVariable}\left(\mathrm{Exponential}\left(\mathrm{λ}\right)\right)$
 ${X}{:=}{\mathrm{_R6}}$ (14)
 > $p:=\mathrm{Sample}\left(X\right)$
 ${p}{:=}{\mathbf{proc}}\left({n}{::}{\mathrm{Sample:-sizeType}}\right)\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{...}\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{\mathbf{end proc}}$ (15)

If we now compute a sample of $X$, then Maple will complain, because $\mathrm{\lambda }$ is unassigned:

 > $p\left(1\right)$

Instead, we assign $1$ to $\mathrm{\lambda }$ and start an iteration.

 > $\mathrm{λ}:=1$
 ${\mathrm{λ}}{:=}{1}$ (16)
 > $N:={10}^{3}$
 ${N}{:=}{1000}$ (17)
 > $v:=\mathrm{Vector}\left(N,'\mathrm{datatype}'='\mathrm{float}'\right)$
 ${v}{:=}\left[\begin{array}{c}{\mathrm{1 .. 1000}}{{\mathrm{Vector}}}_{{\mathrm{column}}}\\ {\mathrm{Data Type:}}{{\mathrm{float}}}_{{8}}\\ {\mathrm{Storage:}}{\mathrm{rectangular}}\\ {\mathrm{Order:}}{\mathrm{Fortran_order}}\end{array}\right]$ (18)
 > $\mathbf{for}\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}i\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}\mathbf{to}\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}N\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}\mathbf{do}\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{v}_{i}:={p\left(1\right)}_{1};\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}\mathrm{λ}:=\sqrt{{v}_{i}}\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}\mathbf{end do}:$

We now create a point plot where pairs of subsequent samples are the horizontal and vertical coordinates.

 > $\mathrm{plots}[\mathrm{pointplot}]\left(⟨{v}_{\left(\right)..-2}|{v}_{2..\left(\right)}⟩,\mathrm{axis}=\left[\mathrm{mode}=\mathrm{log}\right],\mathrm{symbolsize}=1,\mathrm{color}=\mathrm{red}\right)$