
Calling Sequence


LeastTrimmedSquares(X, Y, v)
LeastTrimmedSquares(XY, v)
LeastTrimmedSquares(X, Y, v, opts)
LeastTrimmedSquares(XY, v, opts)


Parameters


X



values of independent variable(s)

Y



values of dependent variable

XY



values of independent and dependent variables

v



algebraic expression in which to express the result, or list or Vector of values of the independent variable(s) at which to evaluate the computed estimator

opts



(optional) one or more equations of the form include = h, intercept = tf, or subsets = s





Description


•

The LeastTrimmedSquares function computes a robust linear estimator from a collection of input data. It is a quantity first described by Rousseeuw [1], and is computed using the FASTLTS algorithm by Rousseeuw and Van Driessen [2]. The article [1] that introduced least trimmed squares regression also introduced least median of squares regression, but least trimmed squares regression was later considered the better choice by many, including the original author  see e.g. [2].

•

Least trimmed squares regression yields a robust estimator. This means that it will continue to perform well if some points are replaced by outliers. Leastsquares regression, the type of regression most commonly used and implemented by LinearFit and NonlinearFit, is very susceptible to outliers.

•

Conceptually, one performs least trimmed squares regression by selecting an arbitrary subset of h of the input data points, where h is an input parameter (which can be set using the include option). One then performs regular leastsquares regression on these h data points, and records the sum of squared residuals. This sum of squared residuals is now minimized over all subsets of h data points.


The FASTLTS algorithm implemented in this command is an approximation algorithm: it will in all reasonable cases return the optimal regression model, but this is not guaranteed. An exact algorithm that does guarantee the optimal result can be selected with the subsets = all option, but this algorithm will take a number of steps that grows very quickly with the size of the problem.

•

Least trimmed squares regression with parameter h on n input points has a breakdown point of h / n. This means that when fewer than n  h of the data points in a sample are changed, the result of least trimmed squares regression can only change a limited amount. See the discussion of the include option below for limits on the values h can assume.

•

In the first and third calling sequences, the parameter X is a Matrix containing the values of the independent variables; each row corresponds to one data point and each column corresponds to one independent variable. Alternatively, you can specify X as a list of lists. If there is only a single independent variable, you can also specify X as a column Vector or a list.


In those same calling sequences, Y is a Vector containing the values of the dependent variable, in the same order as the data points occur in X. Alternatively, you can specify Y as a list, or a Matrix with a single column.

•

In the second and fourth calling sequences, XY is a Matrix containing both the dependent and independent data, with the dependent data being in the rightmost column.

•

The returned value is an expression representing the estimator evaluated at the value v. By supplying a variable name here, you obtain the general expression for the estimator in terms of variables v[1], v[2], etc., corresponding to the independent variables. In particular, if there is a single independent variable, the result will be in terms of v[1]. If instead you supply a list (or Vector) of values, corresponding to the values of the independent variables, you obtain the value of the estimator at those values.



Options



This option determines how many data points are included (and consequently, how many are ignored). The value h should either be an integer greater than 1, in which case it is directly used as the number of data points to include, or a number that is less than or equal to one, in which case it is multiplied by n (the number of data points) and rounded to obtain the number of data points to include.


In all cases, the resulting number of data points to include (which will be denoted by h in the rest of this discussion) needs to be at least equal to (n + p + 1)/2, where n is the number of data points and p is the number of independent variables, including (if present) the intercept. The default value for h is this minimum number (rounded up to an integer).

If you know that the number of outliers in your data is less than some number m, and n  m > (n + p + 1)/2, then you can supply the option include = n  m to get better statistical efficiency.

This option determines if a column for the intercept (a constant term in the resulting regression model) should be added to X. The default value is true; you can supply the option intercept = false to turn this off. The intercept is internally handled by adding a column filled with the value 1 to the right of X.


Using this option is necessary if there is already an independent variable assuming only one constant value, or if the allone vector is otherwise already in the column space of X: the matrix used internally needs to have full column rank, so it will not work if the allone vector is joined to a Matrix that already contains that vector in its column space.


This option can be used by experts to control the details of how the approximation process used for the FASTLTS algorithm works. The discussion of this option assumes knowledge of the Rousseeuw and Van Driessen paper [2] that describes the algorithm.


If p = 1 (after including the intercept), this command uses a fast exact algorithm and ignores the subsets option.


By submitting the subsets = all option, the user can select a variant that tries all psubsets of the data points. It runs two Csteps on all regression lines generated this way, selects the 10 best results, and runs Csteps on these results until convergence. This should in all cases return the optimal result, but the number of regression lines to process grows very quickly as a function of n and p. This algorithm is selected by default if binomial(n, p) < 1500.


Otherwise, the algorithm runs in a number of rounds in which candidate regression lines are generated and subjected to Csteps; the details of these depend by default on n but can be finetuned by submitting an option of the form subsets = s, where s is a list of lists. Each sublist corresponds to one round of the algorithm, and contains a number of options specified as equations, listed below.

–

subsets = c and size = s


These options specify that the data points should be subdivided into c disjoint subsets of size s each, all treated independently. Any points not included are ignored for the time being. The other options then apply to each subset; for example, the generate option specifies how many candidates are generated for each subset. The default values are c = 1 and s equal to the total number of data points. These options can only be specified for the first round.


This specifies how to obtain the candidates at the start of the round:

•

If g is a positive integer, Maple will randomly create that many candidates. This can only be specified for the first round.

•

If g is the value merge, then the data points selected in all subsets in previous rounds are merged, and the candidates found are merged as well.

•

If g is the value all, then all original data points are used, including those potentially ignored due to subsets and size options in the first round. The candidates found are merged as well.


This specifies the number of Csteps done on each candidate in this round. The default is 2.


This specifies that the top k candidates go on to the next round. This is the only required option: there is no default, because this option needs to be included for every round (if the subsets option is used at all).



Notes


•

The underlying computation is done in floatingpoint; therefore, all data points must have type realcons and all returned solutions are floatingpoint, even if the problem is specified with exact values. For more information about numeric computation in the Statistics package, see the Statistics/Computation help page.



Examples


In this example, we have 1000 data points. There is a single independent variable, $x$, with values uniformly distributed between 0 and 10. The dependent variable is a linear function of the independent variable plus some additive noise, $y=5x+10+\mathrm{noise}$, where the noise is from a probability distribution known to have severe outliers  the Cauchy distribution, with location parameter 0 and scale parameter 5.
>

x := Sample(Uniform(0, 10), 1000);

>

noise := Sample(Cauchy(0, 1), 1000);

>

y := (5*x + noise) +~ 10;

Here we see all data points:
>

plots[setoptions](size = [0.7, "golden"]):

>

pp := PointPlot(y, xcoords = x): pp;

Linear least squares regression will be severely affected by the outliers.
>

ls_regression_result := Fit(a * X + b, x, y, X);

${\mathrm{ls\_regression\_result}}{\u2254}{3.44970682383807}{}{X}{+}{10.6816568681413}$
 (4) 
>

ls_deviation_from_model := (coeff(ls_regression_result, X, 1)  5)^2 + (coeff(ls_regression_result, X, 0)  10)^2;

${\mathrm{ls\_deviation\_from\_model}}{\u2254}{2.86806501793850}$
 (5) 
Least trimmed squares regression gets much closer to the true line without noise.
>

lts_regression_result := LeastTrimmedSquares(x, y, [X]);

${\mathrm{lts\_regression\_result}}{\u2254}{5.03537530551575}{}{X}{+}{9.82475419561272}$
 (6) 
>

lts_deviation_from_model := (coeff(lts_regression_result, X, 1)  5)^2 + (coeff(lts_regression_result, X, 0)  10)^2;

${\mathrm{lts\_deviation\_from\_model}}{\u2254}{0.0319625041956780}$
 (7) 
The result is even better if we include 900 out of the 1000 points, instead of the default of a little over 500.
>

lts_900_regression_result := LeastTrimmedSquares(x, y, [X], include=900);

${\mathrm{lts\_900\_regression\_result}}{\u2254}{5.00862730339998}{}{X}{+}{10.0156318668695}$
 (8) 
>

lts_900_deviation_from_model := (coeff(lts_900_regression_result, X, 1)  5)^2 + (coeff(lts_900_regression_result, X, 0)  10)^2;

${\mathrm{lts\_900\_deviation\_from\_model}}{\u2254}{0.000318785625780919}$
 (9) 
The other robust regression method, implemented in the RepeatedMedianEstimator procedure, also gets a good result.
>

rme_regression_result := RepeatedMedianEstimator(x, y, X);

${\mathrm{rme\_regression\_result}}{\u2254}{10.0306661686300}{+}{5.00564125476873}{}{X}$
 (10) 
>

rme_deviation_from_model := (coeff(rme_regression_result, X, 1)  5)^2 + (coeff(rme_regression_result, X, 0)  10)^2;

${\mathrm{rme\_deviation\_from\_model}}{\u2254}{0.000972237653807886}$
 (11) 
In order to visualize these results, we show the same point plot as before, including the four regression lines. The three regression lines from robust methods cannot be distinguished, but the least squares method is clearly off. We zoom in on the vertical range that includes most points.
>

plots[display](pp, plot([ls_regression_result, lts_regression_result, lts_900_regression_result, rme_regression_result], X=0..10, legend=["Least squares", "Least trimmed squares", "Least trimmed squares (900 points)", "Repeated median estimator"]), view=[0..10, 10..110]);

In the next example, we have 1000 input points with two independent variables, most of which follow the model $z=5x3y+2+\mathrm{noise}$, where the noise is relatively small. However, 10 of the points are far away in both the dependent and independent directions (and they have much greater noise). Such cases are particularly hard for nonrobust methods to deal with.
We first see what this point cloud looks like, and then view the resulting fit.
>

xy := Sample(Normal(0, 1), [1000, 2]);

>

noise := Sample(Normal(0, 1), 1000);

>

xy[1..10] := Sample(Normal(0, 10), [10, 2]);

${{\mathrm{xy}}}_{{1}{..}{10}}{\u2254}\left[\begin{array}{cc}{\mathrm{7.25835457416460}}& {4.46546251109489}\\ {\mathrm{12.8116871270602}}& {6.95701976222452}\\ {5.99013830391918}& {\mathrm{0.0711220669337460}}\\ {5.75154152981187}& {\mathrm{24.7628908901018}}\\ {9.09548316783929}& {3.96336774375604}\\ {8.73959176796014}& {10.8028281369976}\\ {10.7436650987406}& {\mathrm{17.8756884460622}}\\ {4.35976236509024}& {\mathrm{11.7928910523180}}\\ {\mathrm{5.77171156203078}}& {4.33504635234648}\\ {2.59252474066801}& {\mathrm{10.1924397735107}}\end{array}\right]$
 (14) 
>

noise[1..10] := Sample(Normal(0, 100), 10);

${{\mathrm{noise}}}_{{1}{..}{10}}{\u2254}\left[\begin{array}{cccccccccc}{100.800809666339}& {8.96726548728334}& {\mathrm{46.8119150928935}}& {\mathrm{259.969027745269}}& {19.0856519868389}& {\mathrm{153.183461105396}}& {\mathrm{25.7014908594142}}& {94.1751893325151}& {75.9056085428043}& {\mathrm{120.552449126529}}\end{array}\right]$
 (15) 
>

z := (xy . <5, 3> +~ 2) + noise^%T;

>

pp := plots[pointplot3d](<xy  z>, color=black): pp;

>

ls_regression_result := Fit(a*X + b*Y + c, xy, z, [X, Y]);

${\mathrm{ls\_regression\_result}}{\u2254}{2.79396059575048}{}{X}{}{0.846671122546459}{}{Y}{+}{1.81658520523718}$
 (17) 
>

lts_regression_result := LeastTrimmedSquares(xy, z, [X, Y]);

${\mathrm{lts\_regression\_result}}{\u2254}{4.89975550872497}{}{X}{}{3.01061932803024}{}{Y}{+}{2.16309086233403}$
 (18) 
>

plots[display](pp, plot3d([ls_regression_result, lts_regression_result], X=2..2, Y=2..2, color=[red,blue]), view=[2..2, 2..2, 5..20], orientation=[25, 80, 60]);

Data sets up to significant size can be used in reasonable time. In the third example, we have 100000 points with 10 independent variables.
>

x := Sample(Uniform(0, 1), [100000, 10]);

>

y := x . <1, 2, 3, 4, 5, 6, 7, 8, 9, 10>;

>

y[1..1000] := y[1..1000] +~ 10;

>

CodeTools[Usage](LeastTrimmedSquares(x, y, X));

memory used=297.78MiB, alloc change=29.30MiB, cpu time=6.24s, real time=2.57s, gc time=3.04s
 
${1.00000000000000}{}{{X}}_{{1}}{+}{2.00000000000001}{}{{X}}_{{2}}{+}{3.00000000000002}{}{{X}}_{{3}}{+}{4.00000000000000}{}{{X}}_{{4}}{+}{4.99999999999998}{}{{X}}_{{5}}{+}{5.99999999999996}{}{{X}}_{{6}}{+}{7.00000000000002}{}{{X}}_{{7}}{+}{8.00000000000002}{}{{X}}_{{8}}{+}{8.99999999999999}{}{{X}}_{{9}}{+}{9.99999999999998}{}{{X}}_{{10}}{}{8.93746573290161}{}{{10}}^{{\mathrm{17}}}$
 (22) 


References



[1] Rousseeuw, Peter J. Least Median of Squares Regression. Journal of the American Statistical Association 79, 1984, pp.871880.


[2] Rousseeuw, Peter J., and Van Driessen, Katrien. Computing LTS Regression for Large Data Sets. Data Mining and Knowledge Discovery 12, 2006, pp.2945.



Compatibility


•

The Statistics[LeastTrimmedSquares] command was introduced in Maple 2019.



