Statistics[Sample] - generate random sample

Calling Sequence


Sample(X, n, opts)
Sample(X, m, opts)
Sample(X, rng, opts)
Sample(X, out, opts)
Sample(X, opts)


Parameters


X



algebraic; random variable or distribution

n



nonnegative integer; sample size

m



list of two nonnegative integers; Matrix dimensions

rng



integer range or list of integer ranges; Array dimensions

out



float rtable; to be filled with data

opts



(optional) equations of the form option = value, where option is method, possibly indexed by a name; specify options for sample generation





Options


•

method = name or method = list  This option selects the method used to generate the sample. There are four main choices: method = default, method = custom, method = discrete, and method = envelope. To supply method-specific options, specify a list instead: its first element is one of the names default, custom, discrete, or envelope, and the remaining elements are equations; for example, method = [envelope, updates=20, range=0..100]. The method-specific options are explained below.

–

method = envelope uses an implementation of acceptance/rejection generation with an adaptive piecewise linear envelope, applicable to continuous distributions. This implementation works only for distributions whose PDF, on its support, is twice differentiable, has a continuous first derivative, and has only finitely many inflection points. There are three valid method-specific options: range, basepoints, and updates.


basepoints: The base points are the boundaries between the segments of the piecewise linear envelope, which should include all inflection points of the PDF of the distribution. If basepoints = deduce (the default), then Maple attempts to find all inflection points itself. Otherwise, basepoints should be a list of floating-point real numbers that includes all inflection points.


updates: The envelope is automatically refined as more numbers are generated; the maximal number of segments is given by this option, which should be a positive integer. The default value is .

–

method = discrete uses an implementation of the alias method by Walker (see references below), applicable to discrete distributions. Because this method computes and stores the individual probabilities for all possible outcomes within the range (see below), it may be inefficient for distributions with very heavy tails. There is one method-specific option: range.


range: The (finite) range of integers for which the probabilities are computed. If the distribution uses the DiscreteValueMap feature (this is the case if the distribution can attain non-integer values), then this describes the range of source values; the map is applied to these integers to obtain the resulting values.

–

method = custom uses a distribution-specific method. Almost all predefined distributions have a highly efficient custom implementation in external C code. Method-specific options are all ignored.

–

method = default (which is the default) selects one of the other three methods. For most built-in distributions, it selects method = custom. For other distributions, such as custom-defined ones, the system falls back to either method = envelope (for continuous distributions) or method = discrete (for discrete distributions). The method-specific options accepted are the same as for the applicable fallback method, and they are used only if the system falls back to that generator.
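
As a rough illustration of the method option, the following sketch draws from a discrete distribution first with the distribution-specific generator and then with the alias method over an explicit range. The choice of Poisson(3), the sample size, and the range 0..40 are assumptions made for this example only.

```maple
with(Statistics):

# Assumed example distribution: Poisson with mean 3.
K := RandomVariable(Poisson(3)):

# Distribution-specific generator; method-specific options would be ignored.
A := Sample(K, 10, method = custom);

# Alias method; probabilities are tabulated for the integers 0..40 only.
B := Sample(K, 10, method = [discrete, range = 0 .. 40]);
```

The list form is what carries the method-specific option; with the plain name (method = discrete), the range would be left for Maple to choose.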



Efficiency


•

When implementing an algorithm that uses a large number of random samples, it can be worthwhile to think about efficiency of the random sample generation. In most cases, the best efficiency is achieved when all samples are generated at once in a preprocessing phase and stored in a Vector (using the first calling sequence, above), and the values are then used one by one in the algorithm.


In some cases, however, this is not possible. For example, this might take too much memory (if a very large number of samples is needed), it might be difficult or impossible to predict the number of samples needed, or the parameters of the random variable might change during the algorithm. In the first two cases, the recommended strategy is to use the fifth calling sequence to create a procedure p, then use p to create a Vector v that can hold a large number of samples (using, say, v := p(n) for a suitably large n), use the elements of v one by one, and call p(v) to refill v when the samples run out. If the parameters of the random variable keep changing, then one can define the random variable with parameters that are unassigned initially, use the fifth calling sequence to create a procedure p, then assign values to the parameters afterwards. An example is provided below.

•

For some of the discrete distributions, the method selected by default is not method = custom but method = discrete. For these distributions, this method is faster when generating more than about 1000 random numbers. If you need to generate fewer random numbers, you can select method = custom by including that option explicitly.
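
The buffer-and-refill strategy recommended above for a very large or unpredictable number of samples could look roughly like this. The buffer size N, the counter pos, and the helper nextSample are hypothetical names introduced for illustration; only Sample and the procedure it returns come from the text.

```maple
with(Statistics):
X := RandomVariable(Normal(0, 1)):   # assumed example distribution
p := Sample(X):                      # procedure that generates samples of X

N := 1000:                           # hypothetical buffer size
v := p(N):                           # Vector holding N samples
pos := 0:                            # number of samples already consumed

# Hypothetical helper: return the next unused sample, refilling v when empty.
nextSample := proc()
    global pos, v, p, N;
    if pos = N then
        p(v);                        # overwrite v in place with fresh samples
        pos := 0;
    end if;
    pos := pos + 1;
    return v[pos];
end proc:

# Consume 2500 samples, which forces two refills of the buffer.
total := 0.0:
for i to 2500 do
    total := total + nextSample();
end do:
total / 2500;
```

Refilling the existing Vector avoids allocating a new one each time, which is the point of passing v back to p.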



Description


•

The Sample command generates a random sample drawn from the distribution given by X.

•

In the first calling sequence, the second parameter, n, is the sample size. This calling sequence will return a newly created Vector of length n, filled with the sample values. This calling sequence, or one of the next two, is recommended for all cases where there are no great performance concerns.

•

In the second calling sequence, the second parameter, m, is a list of two nonnegative integers. This calling sequence will return a newly created Matrix with the specified dimensions, filled with the sample values.

•

In the third calling sequence, the second parameter, rng, is a range or a list of ranges determining the dimensions of an Array. This Array will be created, filled with the sample values, and returned.

•

In the fourth calling sequence, the second parameter, out, is an rtable (such as a Vector) that was created beforehand. Upon successful return of the Sample command, out will have been filled with the sample values.


out needs to have rectangular storage and the float data type that is consistent with the current settings of Digits and UseHardwareFloats. That is, if either UseHardwareFloats = true, or UseHardwareFloats = deduced and Digits <= evalhf(Digits) (which is the default), then out needs to have datatype = float[8]; in the other case, that is, if either UseHardwareFloats = false, or UseHardwareFloats = deduced and Digits > evalhf(Digits), then out needs to have datatype = sfloat. This can easily be achieved by supplying the option datatype = float to the rtable creation function; this will automatically select the correct data type for the current settings.

•

In the fifth calling sequence, Sample returns a procedure p, which can subsequently be called to generate samples of X repeatedly. The procedure p accepts a single argument, which can be n, m, rng, or out, and then behaves as if the corresponding one of the first four calling sequences were called. p does not accept options; any options should be given in the call to Sample itself.
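
For orientation, here is a small sketch that exercises the calling sequences in the order they are listed under Calling Sequence. The distribution and the sizes are assumptions chosen for illustration.

```maple
with(Statistics):
X := RandomVariable(Normal(0, 1)):   # assumed example distribution

A := Sample(X, 5);                   # Sample(X, n): Vector of length 5
M := Sample(X, [2, 3]);              # Sample(X, m): 2 x 3 Matrix
R := Sample(X, [1 .. 2, 0 .. 3]);    # Sample(X, rng): Array indexed 1..2 by 0..3

V := Vector(5, datatype = float):    # float rtable matching the current settings
Sample(X, V):                        # Sample(X, out): fills V in place
V;

p := Sample(X):                      # Sample(X): returns a procedure
p(3);                                # behaves like Sample(X, 3)
p(V):                                # behaves like Sample(X, V)
```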



Compatibility


•

The rng and out parameters were introduced in Maple 15.

•

The method option was introduced in Maple 15.

•

The m parameter was introduced in Maple 16.

•

The method option was updated in Maple 16.



Examples


Straightforward sampling of a distribution.

We can also sample an expression involving two random variables.
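
A minimal sketch of sampling an expression, assuming two independent standard normal variables and the sum X + Y (both are choices made for this illustration):

```maple
with(Statistics):
X := RandomVariable(Normal(0, 1)):
Y := RandomVariable(Normal(0, 1)):

# Each sample value is one realization of the expression X + Y.
B := Sample(X + Y, 1000):

# For independent standard normals, Var(X + Y) = 2, so this should be near 2.
Variance(B);
```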

Sampling of a custom-defined distribution.
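
The following is a sketch only, assuming a density of 6*t*(1 - t) on the interval 0..1. The density, the names T, Z, and C, and the explicit envelope range are all assumptions for illustration.

```maple
with(Statistics):

# Assumed density: 6*t*(1 - t) on 0..1, zero elsewhere.
T := Distribution(PDF = (t -> piecewise(t < 0, 0, t < 1, 6*t*(1 - t), 0))):
Z := RandomVariable(T):

# State the support explicitly rather than relying on Maple to deduce it.
C := Sample(Z, 1000, method = [envelope, range = 0 .. 1]):

# The assumed density is symmetric about 1/2, so the mean should be near 0.5.
Mean(C);
```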

If we supply a list of ranges instead of a number, we get an Array. With a list of two numbers, we get a Matrix.


We can use envelope rejection sampling to restrict both random variables to a certain range.
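
As a hedged sketch of range restriction with the envelope method, the example below uses a single standard normal variable and the range -2..2; both are assumptions chosen for illustration.

```maple
with(Statistics):
X := RandomVariable(Normal(0, 1)):

# The envelope is built only over -2..2, so values outside it are not generated.
S := Sample(X, 1000, method = [envelope, range = -2 .. 2]):

# Both extremes should lie inside the requested range.
min(S), max(S);
```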

Or to restrict only one of the random variables to a certain range.

We can refill an existing rtable with different samples as follows.

Another option is to use a procedure.

Sampling of a custom-defined discrete distribution with non-integer values. This distribution attains the value with probability for positive .


Finally, here is a somewhat longer example, where we want to generate exponentially distributed numbers; the rate parameter lambda starts at a given initial value, but for each subsequent sample it is the square root of the previous sample value. In order to be able to use a procedure (important for efficiency), we need to make sure that lambda is not defined when we create the procedure; otherwise it will only generate samples for the value that lambda had at the time of definition. (If lambda has a value, it can be undefined by executing lambda := 'lambda';, but since we have not used lambda yet, that should not be necessary in this case.)


If we now compute a sample of the random variable, then Maple will complain, because lambda is unassigned:

Instead, we assign an initial value to lambda and start an iteration.
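
The iteration described above might be sketched as follows. The initial value 1.0, the number of samples n, the Vector s, and passing lambda directly as the parameter of Exponential are assumptions made for this illustration.

```maple
with(Statistics):
lambda := 'lambda':                         # make sure lambda is unassigned
X := RandomVariable(Exponential(lambda)):
p := Sample(X):                             # created while lambda has no value

lambda := 1.0:                              # assumed initial value
n := 20:                                    # assumed number of samples
s := Vector(n, datatype = float):
for i to n do
    s[i] := p(1)[1];                        # one sample for the current lambda
    lambda := sqrt(s[i]);                   # parameter for the next sample
end do:
s;
```

Because p was created while lambda was unassigned, each call to p uses whatever value lambda holds at that moment.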

We now create a point plot where pairs of subsequent samples are the horizontal and vertical coordinates.



References



Stuart, Alan, and Ord, Keith. Kendall's Advanced Theory of Statistics. 6th ed. London: Edward Arnold, 1998. Vol. 1: Distribution Theory.


Walker, Alastair J. New Fast Method for Generating Discrete Random Numbers with Arbitrary Frequency Distributions, Electronics Letters, 10 (1974): 127-128.


Walker, Alastair J. An Efficient Method for Generating Discrete Random Variables with General Distributions, ACM Trans. Math. Software, 3 (1977): 253-256.


