Predictive Least Squares

All Products Maple MapleSim

Home : Support : Online Help : Statistics and Data Analysis : Statistics Package : Regression : Predictive Least Squares

Statistics

PredictiveLeastSquares

fit a linear model to data

	Calling Sequence
	PredictiveLeastSquares(A, B, v, options)

Parameters

A	-	Matrix; values of independent variables
B	-	Vector; values of dependent variable
v	-	name; (optional) independent variable name
options	-	(optional) equation(s) of the form option=value where option is one of samplesize, tolerance or numtrials

Description

•	The PredictiveLeastSquares command returns a list, P, and a vector, V that best satisfies the equation A[..,P] . x is approximately equal to B according to random trials using a subset of the data for fitting and the remaining data to test the goodness of the fit.

•

This command works best in situations where the problem is underspecified. That is, the number of variables, or columns in A is of the same order of magnitude or less than the number of observations or rows of A and B. The returned list, P contains the column index for the variables that have been tested to be most relevant, thus minimizing the effect of outliers and overfitting when using the model to predict new results.

•	A and B must contain numeric entries. A is a m x n Matrix, and B is a m x 1 vector.

Options

The options argument can contain one or more of the options shown below.

•

numtrials= integer -- Specify how many random subsamples should be used to determine which variables to drop at each phase of regression. After each sweep that causes one or more variables/columns to be removed, another phase consisting of the specified number of trials is performed. The default is numtrials=15.

•	tolerance = realcons(nonnegative) -- Set the tolerance that determines whether a fit coefficient can be considered insignificant, and therefore should be removed. This is a relative tolerance, compared to the largest coefficient. The default is 1e-10.

•

samplesize= realcons(nonnegative) -- Provide the fraction of data that will be used for building the model. A setting of samplesize=.7, will cause snapshots using 70% of the data to be used for fitting, and the remaining 30% to be used for testing. This must be a number between 0 and 1. The default is .55.

Notes

•

The underlying computation is done in floating-point; therefore, all data points must have type realcons and all returned solutions are floating-point, even if the problem is specified with exact values. For more information about numeric computation in the Statistics package, see the Statistics/Computation help page.

Examples

>	$with (Statistics) &colon;$

In this first example, we have a matrix, A, with 100 columns of data, but the data in B only really depends on the first 4 of those columns.

>	$A ≔ LinearAlgebra :- RandomMatrix (100, 100, datatype = float [8]) &colon;$

>	$B ≔ Vector (100, i \mapsto A [i, 1] + 0.1 \cdot A [i, 2] + 0.5 \cdot A [i, 3] - 0.3 \cdot A [i, 4]) &colon;$

The permutation vector computed shows the first 4 entries are relevant, and the coefficient vector, LSP, exactly matches the terms used to build B. All other columns not referenced by p can be discarded.

>	$p, LSP ≔ PredictiveLeastSquares (A, B)$

$p, LSP ≔ [1, 2, 3, 4], [\begin{array}{c} 1.00000000000000 \\ 0.0999999999999999 \\ 0.500000000000000 \\ −0.300000000000000 \end{array}]$

(1)

In this second example, we will create a result vector that depends on 10 variables, of which only 5 of them are measured in the matrix, A (along with 95 other measurements of irrelevant/random properties).

>	$numsamples ≔ 50 &colon;$

>	$numvariables ≔ 100 &colon;$

>	$Z ≔ LinearAlgebra :- RandomMatrix (numsamples, 10, datatype = float [8]) &colon;$

>	$A ≔ LinearAlgebra :- RandomMatrix (numsamples, numvariables, datatype = float [8]) &colon;$

>	$A [.., 1 .. 5] ≔ Z [.., 1 .. 5] &colon;$

>	$B ≔ Vector (numsamples, i \mapsto add (\frac{Z [i, j]}{j}, j = 1 .. 10)) &colon;$

Warning, (in anonymous procedure created in mpldoc/process_example) `j` is implicitly declared local

>	$p, LSP ≔ PredictiveLeastSquares (A, B) &colon;$

The notation A[..,p] will select all the rows of A and only the column indices found in the list p. This is the reduced matrix. Note the correlation of B and (A[..,p].LSP)

>	$Correlation (B, A [.., p] \cdot LSP)$

$0.992800745975950$

(2)

Compare this with the standard least squares fit.

>	$LS ≔ LinearAlgebra :- LeastSquares (A, B) &colon;$

>	$Correlation (B, A \cdot LS)$

$1.$

(3)

The correlation with the training data is a closer match using standard least squares, but let's see what happens when we use these models to predict results using new data.

>	$Z2 ≔ LinearAlgebra :- RandomMatrix (numsamples, 10, datatype = float [8]) &colon;$

>	$A2 ≔ LinearAlgebra :- RandomMatrix (numsamples, numvariables, datatype = float [8]) &colon;$

>	$A2 [.., 1 .. 5] ≔ Z2 [.., 1 .. 5] &colon;$

>	$GuessLS ≔ A2 \cdot LS &colon;$

>	$GuessLSP ≔ A2 [.., p] \cdot LSP &colon;$

>	$Actual ≔ Vector (numsamples, i \mapsto add (\frac{Z2 [i, j]}{j}, j = 1 .. 10)) &colon;$

Warning, (in anonymous procedure created in mpldoc/process_example) `j` is implicitly declared local

Note how the correlation of the new data is much better using the predictive model. The standard model suffers from overfitting.

>	$Correlation (GuessLS, Actual)$

$0.727378412013161$

(4)

>	$Correlation (GuessLSP, Actual)$

$0.959754787645568$

(5)

Compatibility

•	The Statistics[PredictiveLeastSquares] command was introduced in Maple 17.

•	For more information on Maple 17 changes, see Updates in Maple 17.

Maple

Maple Add-Ons

Math Success Platform

Improving Retention Rates

Maple Flow

MapleSim

Consulting Services

Maple T.A. and Möbius

Education

Industries

Automotive and Aerospace

Robotics

Machine Design & Industrial Automation

Other

Application Areas

Product Pricing

Purchasing

Institutional Student Licensing

Maplesoft Elite Maintenance (EMP)

Support

Product Training

Online Product Help

Webinars & Events

Publications

Content Hubs

Examples & Applications

Community

About Maplesoft

Media Center

User Community

Contact

Online Help

All Products Maple MapleSim

Maple

Powerful math software that is easy to use

Maple Add-Ons

Math Success Platform

Improving Retention Rates

Maple Flow

Engineering calculations & documentation

MapleSim

Advanced System Level Modeling

Consulting Services

Maple T.A. and Möbius

Education

Industries

Automotive and Aerospace

Robotics

Machine Design & Industrial Automation

Other

Application Areas

Product Pricing

Purchasing

Institutional Student Licensing

Maplesoft Elite Maintenance (EMP)

Support

Product Training

Online Product Help

Webinars & Events

Publications

Content Hubs

Examples & Applications

Community

About Maplesoft

Media Center

User Community

Contact

Online Help

All Products Maple MapleSim