remove data items based on density - Maple Help

Statistics[Excise] - remove data items based on density

Calling Sequence

 Excise(p, X, Y, Z, options)

Parameters

 p - fraction of data points to be removed from the sample
 X - first data sample
 Y - (optional) second data sample
 Z - (optional) third data sample
 options - (optional) equation(s) of the form option=value where option is one of to_plot or return_same; specify options for the Excise function

Description

 • The Excise command calculates how densely clustered each data point is with respect to every other data point. A number of points, determined by the magnitude of p, is removed from the data, and the remaining points are returned as an expression sequence of one-dimensional Arrays. Excise returns the same number of data samples as are passed into the function.
 • If p is a positive number between zero and one, and n is the number of points passed to Excise, then p*n of the least densely clustered points are excised, and (1-p)*n of the most densely clustered points will be returned.
 • If p is a negative number between zero and negative one, and n is the number of points passed to Excise, then (-p)*n of the most densely clustered points are excised, and (1+p)*n of the least densely clustered points will be returned.
 • The parameters X, Y, and Z are the data samples to be excised. Each can be given as a Vector, Matrix, Array, or list, and they do not all have to be of the same type. They also do not need to be one-dimensional, but they are treated as though they are. The first data sample, X, is required; the second and third data samples, Y and Z, are optional. Note that all data samples must have the same number of elements.
 • This function is part of the Statistics package, so it can be used in the short form Excise(..) only after executing the command with(Statistics).  However, it can always be accessed through the long form of the command by using Statistics[Excise](..).
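
As a worked illustration of the two rules for p above, consider the sample sizes used in the examples on this page. The symbols $n_{\text{removed}}$ and $n_{\text{kept}}$ are introduced here only for illustration. Both cases use values for which p*n is a whole number, since the description does not state how a fractional product is handled.

$$
\begin{aligned}
n = 8,\ p = 0.5: \quad & n_{\text{removed}} = p\,n = 4 \ \text{(least dense)}, & n_{\text{kept}} &= (1-p)\,n = 4 \\
n = 8,\ p = -0.5: \quad & n_{\text{removed}} = (-p)\,n = 4 \ \text{(most dense)}, & n_{\text{kept}} &= (1+p)\,n = 4 \\
n = 12,\ p = \tfrac{2}{3}: \quad & n_{\text{removed}} = p\,n = 8 \ \text{(least dense)}, & n_{\text{kept}} &= (1-p)\,n = 4
\end{aligned}
$$

These counts agree with the outputs in the Examples section, each of which contains four values.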

Examples

 > $\mathrm{with}\left(\mathrm{Statistics}\right):$

A simple 1D case. Excise removes the sparsest half of the data and returns the densest half as a 1D Array. In this case, that is the center four points.

 > $\mathrm{data1}:=\mathrm{Array}\left(\left[\left[1,2\right],\left[3,4\right],\left[5,6\right],\left[7,8\right]\right]\right):$
 > $\mathrm{ret1}:=\mathrm{Excise}\left(0.5,\mathrm{data1}\right)$
 ${\mathrm{ret1}}{:=}\left[\begin{array}{cccc}{4.}& {5.}& {3.}& {6.}\end{array}\right]$ (1)
 > $\mathrm{type}\left(\mathrm{ret1},\mathrm{Array}\right)$
 ${\mathrm{true}}$ (2)

If a negative fraction is used as the first argument, then the returned data will be the sparsest points, in this case the outer four points.

 > $\mathrm{Excise}\left(-0.5,\mathrm{data1}\right)$
 $\left[\begin{array}{cccc}{8.}& {1.}& {7.}& {2.}\end{array}\right]$ (3)

If the return_same option is used, then Excise returns the remaining data in the same type as the input.

 > $\mathrm{data2}:=\left[2,4,6,8,10,12,14,16,18,20,22,24\right]$
 ${\mathrm{data2}}{:=}\left[{2}{,}{4}{,}{6}{,}{8}{,}{10}{,}{12}{,}{14}{,}{16}{,}{18}{,}{20}{,}{22}{,}{24}\right]$ (4)
 > $\mathrm{ret2}:=\mathrm{Excise}\left(\frac{2}{3},\mathrm{data2},\mathrm{return\_same}\right)$
 ${\mathrm{ret2}}{:=}\left[{14.}{,}{12.}{,}{10.}{,}{16.}\right]$ (5)
 > $\mathrm{type}\left(\mathrm{ret2},\mathrm{list}\right)$
 ${\mathrm{true}}$ (6)
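
The examples above each pass a single sample. As a minimal sketch of the multi-sample form given in the calling sequence, the following passes two samples of different types and counts the points that come back. The names X, Y, rx, and ry are illustrative and do not appear elsewhere on this page, and the output is omitted.

```maple
with(Statistics):

# Two samples of different types with the same number of elements (8 each).
X := Vector([1, 2, 3, 4, 5, 6, 7, 8]):
Y := [2, 4, 6, 8, 10, 12, 14, 16]:

# Excise returns one result per sample passed in, as an expression sequence,
# so the two results can be assigned in a single statement.
rx, ry := Excise(0.25, X, Y):

# Count the elements remaining in each returned sample.
numelems(rx), numelems(ry);
```

By the rule for positive p, each result should hold (1-0.25)*8 = 6 values, and because return_same is not given, both should be one-dimensional Arrays.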

Excise can be used to trim points from data and then pass the remaining points to a plotting function. If the to_plot option is used, the range of the original data is preserved so that the trimmed plot can be compared with a plot of the original data. This is accomplished by returning an expression of the form view = [range(s)], which the plotting function uses as an option.

 > $\mathrm{with}\left(\mathrm{Statistics}\right):$
 > $A:=\mathrm{Sample}\left(\mathrm{RandomVariable}\left(\mathrm{Normal}\left(0,1\right)\right),500\right):$
 > $B:=\mathrm{Sample}\left(\mathrm{RandomVariable}\left(\mathrm{Normal}\left(0,1\right)\right),500\right):$

Plot original data

 > $\mathrm{ScatterPlot}\left(A,B\right)$

Plot of the densest half of the data

 > $\mathrm{ScatterPlot}\left(\mathrm{Excise}\left(0.5,A,B,\mathrm{to\_plot}\right)\right)$