Statistics - Maple Programming Help

Statistics[Sample] - generate random sample

 


Calling Sequence

Sample(X, n, opts)

Sample(X, m, opts)

Sample(X, rng, opts)

Sample(X, out, opts)

Sample(X, opts)

Parameters

X - algebraic; random variable or distribution

n - nonnegative integer; sample size

m - list of two nonnegative integers; Matrix dimensions

rng - integer range or list of integer ranges; Array dimensions

out - float rtable; to be filled with data

opts - (optional) equations of the form option = value, where option is method, possibly indexed by a name; specify options for sample generation

Options

• method = name or method = list: This option selects the method used to generate the sample. There are four main choices: method = default, method = custom, method = discrete, and method = envelope. To supply method-specific options, specify a list instead: its first element is one of the names default, custom, discrete, and envelope, and the remaining elements are equations; for example, method = [envelope, updates = 20, range = 0 .. 100]. The method-specific options are explained below.

– method = envelope uses an implementation of acceptance/rejection generation with an adaptive piecewise linear envelope, applicable to continuous distributions. This implementation only works for distributions whose PDF, on its support, is twice differentiable, has a continuous first derivative, and has only finitely many inflection points. There are three valid method-specific options: range, basepoints, and updates.

  range: The (finite) range over which the piecewise linear envelope is defined, and consequently where the samples are found. If range = deduce (the default), then Maple takes the range given by the ϵ and 1 − ϵ quantiles of the distribution, for some small positive value of ϵ that depends on the value of Digits. Otherwise, range should be a range of two real numbers, such as range = 0 .. 1.

  basepoints: The base points are the boundaries between the segments of the piecewise linear envelope, which should include all inflection points of the PDF of the distribution. If basepoints = deduce (the default), then Maple attempts to find all inflection points itself. Otherwise, basepoints should be a list of floating-point real numbers that includes all inflection points.

  updates: The envelope is automatically refined as more numbers are generated; this option gives the maximal number of segments and should be a positive integer. The default value is 100.

– method = discrete uses an implementation of the alias method by Walker (see references below), applicable to discrete distributions. Because this method computes and stores the individual probabilities for all possible outcomes within the range (see below), it may be inefficient for distributions with very heavy tails. There is one method-specific option: range.

  range: The (finite) range of integers for which the probabilities are computed. If the distribution uses the DiscreteValueMap feature (this is the case if the distribution can attain non-integer values), then this describes the range of source values; the map is applied to these integers to obtain the resulting values.

– method = custom uses a distribution-specific method. Almost all predefined distributions have a highly efficient custom implementation in external C code. Method-specific options are all ignored.

– method = default (which is the default) selects one of the other three methods. For most built-in distributions, it selects method = custom. For other distributions, such as custom-defined ones, the system falls back to either method = envelope (for continuous distributions) or method = discrete (for discrete distributions). The method-specific options accepted are the same as for the applicable fallback method, and they are only used if the system falls back to that generator.

• If X is an algebraic expression involving multiple random variables, say R1 and R2, then one can specify different sample generation methods for R1 and R2 by using the options method[R1] = generator1 and method[R2] = generator2, where each generator is a sample generation method that could validly be specified as method = generator. If a variable-specific method is given for only some of the random variables, the others use the method given by the method = ... option, or default if no such option is present.
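The following is a minimal sketch of how the method option and its indexed form might be combined. The distribution dist, its PDF 6*t*(1 - t), and the names T, U, V, s, and r are illustrative assumptions, not part of the original page; only option names documented above are used.

```maple
with(Statistics):

# Hypothetical custom distribution on [0, 1] with PDF 6*t*(1 - t);
# this PDF has no inflection points on its support.
dist := Distribution(PDF = (t -> piecewise(t < 0, 0, t < 1, 6*t*(1 - t), 0))):
T := RandomVariable(dist):

# Envelope method with an explicit range and at most 50 envelope segments.
s := Sample(T, 1000, method = [envelope, range = 0 .. 1, updates = 50]):

# Per-variable methods: U uses the envelope generator restricted to 0 .. 1,
# while V uses the method given by method = ... (here: default).
U := RandomVariable(Normal(0, 1)):
V := RandomVariable(Normal(0, 1)):
r := Sample(U + V, 1000, method[U] = [envelope, range = 0 .. 1], method = default):
```

Because V falls back to method = default, it should use the custom generator for the normal distribution, so only U is restricted to the given range.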

Efficiency

• When implementing an algorithm that uses a large number of random samples, it can be worthwhile to think about the efficiency of the random sample generation. In most cases, the best efficiency is achieved when all samples are generated at once in a preprocessing phase and stored in a Vector (using the first calling sequence, above), and the values are then used one by one in the algorithm.

  In some cases, however, this is not possible. For example, this might take too much memory (if a very large number of samples is needed), it might be difficult or impossible to predict the number of samples needed, or the parameters of the random variable might change during the algorithm. In the first two cases, the recommended strategy is to use the fifth calling sequence, Sample(X, opts), to create a procedure p, then use p to create a Vector that can hold a large number of samples (using, say, v := p(10^5)), use the elements of v one by one, and call p(v) to refill v when the samples run out. If the parameters of the random variable keep changing, then one can define the random variable with parameters that are unassigned initially, use the fifth calling sequence to create a procedure p, then assign values to the parameters afterwards. An example is provided below.

• For some of the discrete distributions, the method selected by default is not method = custom but method = discrete. For these distributions, this method is faster when generating more than about 1000 random numbers. If you need to generate fewer random numbers, you can select method = custom by including that option explicitly.
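The buffer-and-refill strategy described above could be sketched as follows. The helper NextSample, the counter idx, and the buffer size bufsize are hypothetical names introduced only for this illustration; the sketch assumes a standard normal random variable.

```maple
with(Statistics):

X := RandomVariable(Normal(0, 1)):
p := Sample(X):           # Sample(X, opts) form: returns a generator procedure
bufsize := 10^5:
v := p(bufsize):          # buffer holding bufsize samples
idx := 0:                 # index of the last sample handed out

# Hypothetical helper: returns the next unused sample and refills the
# buffer in place once all of its entries have been consumed.
NextSample := proc()
    global idx, v;
    idx := idx + 1;
    if idx > bufsize then
        p(v);             # overwrite v with fresh samples
        idx := 1;
    end if;
    return v[idx];
end proc:

# Usage: consume more samples than the buffer holds.
total := add(NextSample(), k = 1 .. 250000):
```

Only one Vector of fixed size is ever allocated, so memory use stays bounded regardless of how many samples the algorithm ends up needing.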

Description

• The Sample command generates a random sample drawn from the distribution given by X.

• The first parameter, X, can be a distribution (see Statistics[Distribution]), a random variable, or an algebraic expression involving random variables (see Statistics[RandomVariable]).

• In the first calling sequence, the second parameter, n, is the sample size. This calling sequence will return a newly created Vector of length n, filled with the sample values. This calling sequence, or one of the next two, is recommended for all cases where there are no great performance concerns.

• In the second calling sequence, the second parameter, m, is a list of two nonnegative integers. This calling sequence will return a newly created Matrix with the specified dimensions, filled with the sample values.

• In the third calling sequence, the second parameter, rng, is a range or a list of ranges determining the dimensions of an Array. This Array will be created, filled with the sample values, and returned.

• In the fourth calling sequence, the second parameter, out, is an rtable (such as a Vector) that was created beforehand. Upon successful return of the Sample command, out will have been filled with the sample values.

  

out needs to have rectangular storage and the float data type that is consistent with the current settings of Digits and UseHardwareFloats. That is, if either UseHardwareFloats = true, or UseHardwareFloats = deduced and Digits <= evalhf(Digits) (which is the default), then out needs to have datatype = float[8]; in the other case, that is, if either UseHardwareFloats = false, or UseHardwareFloats = deduced and Digits > evalhf(Digits), then out needs to have datatype = sfloat. This can easily be achieved by supplying the option datatype = float to the rtable creation function; this will automatically select the correct data type for the current settings.

• In the fifth calling sequence, Sample(X, opts), Sample returns a procedure p, which can subsequently be called to generate samples of X repeatedly. The procedure p accepts a single argument, which can be n, m, rng, or out, and then behaves as if one of the first four calling sequences were called. p does not accept options; any options should be given in the call to Sample itself.
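As a brief illustration of the out and procedure forms described above, the sketch below pre-allocates a Vector with datatype = float and fills it in place. The names out, w, and W and the sizes used are arbitrary choices for this example.

```maple
with(Statistics):

X := RandomVariable(Normal(0, 1)):

# datatype = float resolves to float[8] or sfloat, depending on the
# current settings of Digits and UseHardwareFloats.
out := Vector(1000, datatype = float):
Sample(X, out):           # fills out in place

p := Sample(X):           # generator procedure for X
w := p(10):               # new Vector of 10 samples
W := p([2, 3]):           # new 2 x 3 Matrix of samples
p(out):                   # refills the existing Vector
```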

Examples

> with(Statistics):

Straightforward sampling of a distribution.

> X := RandomVariable(Normal(0, 1))

X := _R

(1)

> A := Sample(X, 10^6):

> P := DensityPlot(X, range = -2 .. 2, thickness = 3, color = red):

> Q := Histogram(A, range = -2 .. 2):

> plots[display](P, Q)

We can also sample an expression involving two random variables.

> Y := RandomVariable(Normal(0, 1))

Y := _R0

(2)

> B := Sample(X/Y, 10^6):

> P := DensityPlot(Cauchy(0, 1), range = -2 .. 2, thickness = 3, color = red):

> Q := Histogram(B, range = -2 .. 2):

> plots[display](P, Q)

Sampling of a custom-defined distribution.

> dist := Distribution(PDF = (t -> piecewise(t < 0, 0, t < 1, 36t243t7, 0)))

dist := module() option Distribution, Continuous; export PDF, Conditions; end module

(3)

> B := Sample(RandomVariable(dist), 10^6):

> P := DensityPlot(dist, range = 0 .. 1, thickness = 3, color = red):

> Q := Histogram(B):

> plots[display](P, Q)

If we supply a list of ranges instead of a number, we get an Array. With a list of two numbers, we get a Matrix.

> a := Sample(X/Y, [1 .. 1000, 1 .. 1000])

a := [1..1000 x 1..1000 Array, Data Type: float[8], Storage: rectangular, Order: Fortran_order]

(4)

> m := Sample(X/Y, [3, 3])

m := [ 1.24368664809680    2.42811511057060    0.657320353969448 ]
     [ 48.5719748690776    1.79925030210058    0.653960998618751 ]
     [ 18.4816750590553    0.545042157182995   0.549903521004792 ]

(5)

We can use envelope rejection sampling to restrict X and Y to a certain range.

> s1 := Sample(X/Y, 10^6, method = [envelope, range = 0 .. 1])

s1 := [1 .. 1000000 Vector[row], Data Type: float[8], Storage: rectangular, Order: Fortran_order]

(6)

> Histogram(s1, range = 0 .. 10)

Or to restrict only X to a certain range.

> s2 := Sample(X/Y, 10^6, method[X] = [envelope, range = 0 .. 1])

s2 := [1 .. 1000000 Vector[row], Data Type: float[8], Storage: rectangular, Order: Fortran_order]

(7)

> Histogram(s2, range = -10 .. 10)

We can refill s2 with different samples as follows.

> Sample(Cauchy(0, 1), s2)

[1 .. 1000000 Vector[row], Data Type: float[8], Storage: rectangular, Order: Fortran_order]

(8)

> Histogram(s2, range = -10 .. 10)

Another option is to use a procedure.

> p := Sample(X)

p := proc(n::`Sample:-sizeType`) ... end proc

(9)

> p(s2)

[1 .. 1000000 Vector[row], Data Type: float[8], Storage: rectangular, Order: Fortran_order]

(10)

> Histogram(s2)

Sampling of a custom-defined discrete distribution with non-integer values. This distribution attains the value (3/2)^n with probability 2^(-n) for positive n.

> dist2 := Distribution(Type = discrete, ProbabilityFunction = (t -> 2^(-log[3/2](t))), DiscreteValueMap = (n -> (3/2)^n), Support = 1 .. infinity)

dist2 := module() option Distribution, Discrete; export Conditions, ProbabilityFunction, Support, DiscreteValueMap; end module

(11)

> s := Sample(dist2, 10^5)

s := [1 .. 100000 Vector[row], Data Type: float[8], Storage: rectangular, Order: Fortran_order]

(12)

> sort(Tally(s))

[1.50000000000000 = 49843, 2.25000000000000 = 24962, 3.37500000000000 = 12661, 5.06250000000000 = 6234, 7.59375000000000 = 3129, 11.3906250000000 = 1592, 17.0859375000000 = 766, 25.6289062500000 = 395, 38.4433593750000 = 226, 57.6650390625000 = 94, 86.4975585937500 = 43, 129.746337890625 = 30, 194.619506835938 = 13, 291.929260253906 = 7, 437.893890380859 = 4, 7481.82764267921 = 1]

(13)

Finally, here is a somewhat longer example, where we want to generate exponentially distributed numbers; the rate parameter λ starts at 1, but for each subsequent value it is the square root of the previous sample value. To be able to use a procedure (important for efficiency), we need to make sure that λ is unassigned when we create the procedure; otherwise, the procedure will only generate samples for the value that λ had at the time of definition. (If λ has a value, it can be unassigned by executing lambda := 'lambda';, but since we have not used λ yet, that should not be necessary in this case.)

> X := RandomVariable(Exponential(lambda))

X := _R6

(14)

> p := Sample(X)

p := proc(n::`Sample:-sizeType`) ... end proc

(15)

If we now compute a sample of X, then Maple will complain, because λ is unassigned:

> p(1)

Error, (in p) unable to evaluate lambda to floating-point

Instead, we assign 1 to λ and start an iteration.

> lambda := 1

λ := 1

(16)

> N := 10^3

N := 1000

(17)

> v := Vector(N, 'datatype' = 'float')

v := [1 .. 1000 Vector[column], Data Type: float[8], Storage: rectangular, Order: Fortran_order]

(18)

> for i to N do v[i] := p(1)[1]; lambda := sqrt(v[i]) end do:

We now create a point plot where pairs of subsequent samples are the horizontal and vertical coordinates.

> plots[pointplot](<v[.. -2] | v[2 ..]>, axis = [mode = log], symbolsize = 1, color = red)

References

  

Stuart, Alan, and Ord, Keith. Kendall's Advanced Theory of Statistics. 6th ed. London: Edward Arnold, 1998. Vol. 1: Distribution Theory.

  

Walker, Alastair J. New Fast Method for Generating Discrete Random Numbers with Arbitrary Frequency Distributions, Electronics Letters, 10 (1974), 127-128.

  

Walker, Alastair J. An Efficient Method for Generating Discrete Random Variables with General Distributions, ACM Trans. Math. Software, 3 (1977), 253-256.

Compatibility

• The rng and out parameters were introduced in Maple 15.

• The method option was introduced in Maple 15.

• For more information on Maple 15 changes, see Updates in Maple 15.

• The m parameter was introduced in Maple 16.

• The method option was updated in Maple 16.

• For more information on Maple 16 changes, see Updates in Maple 16.

See Also

Statistics

Statistics[Computation]

Statistics[Distributions]

Statistics[RandomVariables]

 

