Fit - Maple Help

Statistics

 Fit
 fit a model function to data

Calling Sequence

 Fit(f, X, Y, v, options)

 Fit(f, XY, v, options)

Parameters

 f - algebraic; model function

 X - Vector or Matrix; values of independent variable(s)

 Y - Vector; values of dependent variable

 XY - Matrix; values of independent and dependent variables

 v - name or list(names); name(s) of the independent variable(s) used in the model function

 options - (optional) equation(s) of the form option=value, where option is one of output, weights, summarize, svdtolerance, or initialvalues; specify options for the Fit command

Description

 • The Fit command fits a model function to data by minimizing the least-squares error. Consider the model $y=f\left({x}_{1},{x}_{2},\mathrm{...},{x}_{n};{a}_{1},{a}_{2},\mathrm{...},{a}_{m}\right)$, where y is the dependent variable and f is the model function of n independent variables ${x}_{1},{x}_{2},\mathrm{...},{x}_{n}$ and m model parameters ${a}_{1},{a}_{2},\mathrm{...},{a}_{m}$. Given k data points, where each data point is an (n+1)-tuple of numerical values for $\left({x}_{1},{x}_{2},\mathrm{...},{x}_{n},y\right)$, the Fit command finds values of the model parameters such that the sum of the squares of the k residuals is minimized.  The ith residual is the value of $y-f\left({x}_{1},{x}_{2},\mathrm{...},{x}_{n};{a}_{1},{a}_{2},\mathrm{...},{a}_{m}\right)$ evaluated at the ith data point.
 • The first parameter f is an algebraic expression in the independent variables ${x}_{1},{x}_{2},\mathrm{...},{x}_{n}$ and the model parameters ${a}_{1},{a}_{2},\mathrm{...},{a}_{m}$.
 • In the first calling sequence, the second parameter X is a Matrix containing the values of the independent variables.  Row i in the Matrix contains the n values for the ith data point while column j contains all values of the single variable ${x}_{j}$. If there is only one independent variable, X can be either a Vector or a k-by-1 Matrix.  The third parameter Y is a Vector containing the k values of the dependent variable y.
 • In the second calling sequence, the second parameter XY is a Matrix containing the values of both the independent and the dependent variables, consisting of k rows and n + 1 columns. The first n columns correspond to X, the final column to Y. That is, the ith row of XY contains first the n values for the ith data point of variables ${x}_{1}$ through ${x}_{n}$, then the value of the dependent variable y.
 • The parameters X, Y, and XY can also be specified as lists or Arrays; for details, see the Input Forms help page.
 • The parameter v is a list of the independent variable names used in f.  If there is only one independent variable, then v can be a single name.  The order of the names in the list must match exactly the order in which the independent variable values are placed in the columns of X.
 • The Fit command returns the model function, with the final parameter values, in terms of the independent variables. Additional results or a solution module that allows you to query for various settings and results can be obtained with the output option.  For more information, see the Statistics/Regression/Solution help page.
 • The Fit command determines if the model function is linear or nonlinear in the model parameters.  (Note that a model function can be nonlinear in the independent variables but linear in the model parameters.)  It then calls either Statistics[LinearFit] or Statistics[NonlinearFit]. The most commonly used options are described below.  Additional options accepted by LinearFit or NonlinearFit are passed directly to those commands. Note in particular the (recommended) use of the initialvalues option.
 • The Fit command accepts the model function only as an algebraic expression.  Different input forms, allowing for greater flexibility and efficiency, are offered by the LinearFit and NonlinearFit commands.  For more information, see the Input Forms help page.
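As an illustration of the quantity being minimized, the following sketch (in Python, outside Maple; the data and model are hypothetical) computes the residual sum of squares for a one-variable model $y=a+bx$ at two candidate parameter choices.

```python
import numpy as np

# Hypothetical data points (x_i, y_i), k = 4
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])

def rss(a, b):
    """Residual sum of squares for the model y = a + b*x."""
    residuals = y - (a + b * x)      # i-th residual: y_i - f(x_i; a, b)
    return float(np.sum(residuals ** 2))

# Least-squares fitting chooses the parameters (a, b) minimizing rss(a, b).
print(rss(1.0, 2.0))   # parameters close to the data: small RSS
print(rss(0.0, 0.0))   # parameters ignoring the data: large RSS
```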
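The distinction between linearity in the variables and linearity in the parameters can also be illustrated outside Maple: a model such as $a+b\,\mathrm{sin}\left(t\right)+c{t}^{2}$ is nonlinear in $t$ but linear in $a$, $b$, and $c$, so it can be fitted by one direct linear least-squares solve, with no initial values needed. A minimal sketch with hypothetical, noise-free data:

```python
import numpy as np

# Hypothetical noise-free data generated from known parameters
# a = 1, b = 2, c = 0.5, so the solve should recover them exactly.
t = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
y = 1.0 + 2.0 * np.sin(t) + 0.5 * t ** 2

# One design-matrix column per parameter: the model a + b*sin(t) + c*t^2
# is nonlinear in t but linear in (a, b, c).
A = np.column_stack([np.ones_like(t), np.sin(t), t ** 2])
params, *_ = np.linalg.lstsq(A, y, rcond=None)
print(params)   # ≈ [1.0, 2.0, 0.5]
```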

Options

 The options argument can contain one or more of the options shown below.  These options are described in more detail on the Statistics/Regression/Options help page.
 • output = name or string -- Specify the form of the solution.  The output option can take as a value the name solutionmodule, or one of the following names (or a list of these names): AtkinsonTstatistic, confidenceintervals, CookDstatistic, degreesoffreedom, externallystandardizedresiduals, internallystandardizedresiduals, leastsquaresfunction, leverages, parametervalues, parametervector, residuals, residualmeansquare, residualstandarddeviation, residualsumofsquares, rsquared, rsquaredadjusted, standarderrors, tprobability, tvalue, variancecovariancematrix. For more information, see the Statistics/Regression/Solution help page.
 • summarize = true, false, or embed -- Display a summary of the regression model. This option is only available when the model expression is linear in the parameters.
 • svdtolerance = realcons(nonnegative) -- Set the tolerance that determines whether a singular-value decomposition is performed.
 • weights = Vector -- Provide weights for the data points.
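For weighted fitting, the objective becomes the weighted sum $\sum_{i} w_i r_i^2$. A standard way to realize this with an ordinary least-squares solver, sketched below with hypothetical data (this mirrors the effect of the weights option, not Maple's internal implementation), is to scale each equation by $\sqrt{w_i}$.

```python
import numpy as np

# Hypothetical data for the straight-line model y = a + b*x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.2, 1.9, 3.2, 3.9])
w = np.array([1.0, 1.0, 4.0, 4.0])   # trust the last two points more

A = np.column_stack([np.ones_like(x), x])
sw = np.sqrt(w)
# Scaling row i of the system by sqrt(w_i) turns ordinary least squares
# into minimization of sum_i w_i * r_i^2.
params, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
print(params)
```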

Notes

 • The Fit command uses various methods implemented in a built-in library provided by the Numerical Algorithms Group (NAG). The underlying computation is done in floating-point; therefore, all data points must have type realcons and all returned solutions are floating-point, even if the problem is specified with exact values.  For more information about numeric computation in the Statistics package, see the Statistics/Computation help page.
 • Set infolevel[Statistics] to 2 or higher to see messages about the progress of the solvers.  In particular, these userinfo messages indicate whether the LinearFit command or the NonlinearFit command is being used.
 • For fitting a data sample to a distribution, see MaximumLikelihoodEstimate.

Examples

 > $\mathrm{with}\left(\mathrm{Statistics}\right):$
 > $X≔\mathrm{Vector}\left(\left[1,2,3,4,5,6\right],\mathrm{datatype}=\mathrm{float}\right):$
 > $Y≔\mathrm{Vector}\left(\left[2,3,4.8,10.2,15.6,30.9\right],\mathrm{datatype}=\mathrm{float}\right):$

Fit a model that is linear in the parameters.

 > $\mathrm{Fit}\left(a+bt+c{t}^{2},X,Y,t\right)$
 ${6.62999999999999}{-}{5.37464285714286}{}{t}{+}{1.53392857142857}{}{{t}}^{{2}}$ (1)
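Because this model is linear in $a$, $b$, and $c$, the coefficients can be cross-checked outside Maple with any linear least-squares tool; for example, NumPy's polyfit applied to the same data reproduces them:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([2.0, 3.0, 4.8, 10.2, 15.6, 30.9])

# np.polyfit returns the highest-degree coefficient first: [c, b, a]
c, b, a = np.polyfit(X, Y, 2)
print(a, b, c)   # ≈ 6.63, -5.3746429, 1.5339286
```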

It is also possible to generate a summary for models that are linear in the parameters. Note that the summary is not a part of the output that is assigned to ls:

 > $\mathrm{ls}≔\mathrm{Fit}\left(a+bt+c{t}^{2},X,Y,t,\mathrm{summarize}=\mathrm{embed}\right):$

Model:

${6.6300000}{-}{5.3746429}{}{t}{+}{1.5339286}{}{{t}}^{{2}}$

 Coefficients:

           Estimate    Standard Error   t-value     P(>|t|)
     a      6.63000     3.27597          2.02383     0.136152
     b     -5.37464     2.14323         -2.50773     0.0871112
     c      1.53393     0.299720         5.11787     0.0144383

R-squared: ${0.983265}$

Adjusted R-squared: ${0.972108}$

Residuals:

     Residual Sum of Squares:     10.0612
     Residual Mean Square:         3.35374
     Residual Standard Error:      1.83132
     Degrees of Freedom:           3

Five Point Summary:

     Minimum:           -2.50500
     First Quartile:    -0.932262
     Median:             0.507143
     Third Quartile:     1.00964
     Maximum:            1.29643

 > $\mathrm{ls}$
 ${6.62999999999999}{-}{5.37464285714286}{}{t}{+}{1.53392857142857}{}{{t}}^{{2}}$ (2)

Fit a model that is nonlinear in the parameters.

 > $\mathrm{Fit}\left(a+b\mathrm{exp}\left(ct\right),X,Y,t\right)$
 ${0.887576142919275}{+}{0.606352318207693}{}{{ⅇ}}^{{0.649251558313311}{}{t}}$ (3)
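This nonlinear fit can likewise be cross-checked outside Maple, for example with SciPy's curve_fit. Note that nonlinear fitting is iterative, so the starting values below are an assumption and a different choice could lead to a different local minimum:

```python
import numpy as np
from scipy.optimize import curve_fit

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([2.0, 3.0, 4.8, 10.2, 15.6, 30.9])

def model(t, a, b, c):
    return a + b * np.exp(c * t)

# The starting values below are an assumption (curve_fit defaults to all
# ones); the result of an iterative fit can depend on this choice.
popt, _ = curve_fit(model, X, Y, p0=(1.0, 1.0, 0.5))
print(popt)   # ≈ [0.8876, 0.6064, 0.6493]
```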

Consider now an experiment where quantities $x$, $y$, and $z$ influence a quantity $w$ according to the approximate relationship

$w={x}^{a}+\frac{b{x}^{2}}{y}+cyz$

with unknown parameters $a$, $b$, and $c$. Six data points are given by the following matrix, with respective columns for $x$, $y$, $z$, and $w$.

 > $\mathrm{ExperimentalData}≔⟨⟨1,1,1,2,2,2⟩|⟨1,2,3,1,2,3⟩|⟨1,2,3,4,5,6⟩|⟨0.531,0.341,0.163,0.641,0.713,-0.040⟩⟩$
 $\left[\begin{array}{cccc}1& 1& 1& 0.531\\ 1& 2& 2& 0.341\\ 1& 3& 3& 0.163\\ 2& 1& 4& 0.641\\ 2& 2& 5& 0.713\\ 2& 3& 6& -0.040\end{array}\right]$ (4)

We take an initial guess that the first term will be approximately quadratic in $x$, that $b$ will be approximately $1$, and for $c$ we do not even know whether it will be positive or negative, so we guess $c=0$. We compute both the model function and the residuals. Also, we select more verbose operation by setting $\mathrm{infolevel}$.

 > $\mathrm{infolevel}\left[\mathrm{Statistics}\right]≔2:$
 > $\mathrm{NonlinearFit}\left({x}^{a}+\frac{b{x}^{2}}{y}+cyz,\mathrm{ExperimentalData},\left[x,y,z\right],\mathrm{initialvalues}=\left[a=2,b=1,c=0\right],\mathrm{output}=\left[\mathrm{leastsquaresfunction},\mathrm{residuals}\right]\right)$
 In NonlinearFit (algebraic form)
 $\left[{{x}}^{{1.1470197399696782}}{-}\frac{{0.29804186488939366}{}{{x}}^{{2}}}{{y}}{-}{0.09825118934297625}{}{y}{}{z}{,}\left[\begin{array}{cccccc}0.07270694576763004& 0.11697431018339816& -0.1466079923832507& -0.011612747005768642& -0.07703615328483882& 0.08864890856428051\end{array}\right]\right]$ (5)

We note from the userinfo output that the nonlinear fitting method was used. Furthermore, the exponent on $x$ is only about $1.14$, and the other initial guesses were not very accurate either. However, the problem is well enough conditioned that Maple finds a good fit anyway.

Now suppose that the relationship that is used to model the data is altered as follows:

$w=ax+\frac{b{x}^{2}}{y}+cyz$

We adapt the calling sequence very slightly so that the model expression is linear in the parameters. This also makes it possible to return a summary for the regression, and more details on the residuals, with the summarize option:

 > $\mathrm{Fit}\left(ax+\frac{b{x}^{2}}{y}+cyz,\mathrm{ExperimentalData},\left[x,y,z\right],\mathrm{initialvalues}=\left[a=2,b=1,c=0\right],\mathrm{output}=\left[\mathrm{leastsquaresfunction},\mathrm{residuals}\right],\mathrm{summarize}=\mathrm{embed}\right)$
 In Fit
 In LinearFit (container form)
 final value of residual sum of squares: .0537598869493245
 $\left[{0.8230729183858783}{}{x}{-}\frac{{0.16791011421160582}{}{{x}}^{{2}}}{{y}}{-}{0.07580226783864379}{}{y}{}{z}{,}\left[\begin{array}{cccccc}-0.04836053633562854& -0.09490878992549993& 0.07811753022685414& -0.03029630857075828& 0.16069707003789296& -0.09782486344999755\end{array}\right]\right]$ (6)

Model:

${0.82307292}{}{x}{-}\frac{{0.16791011}{}{{x}}^{{2}}}{{y}}{-}{0.075802268}{}{y}{}{z}$

 Coefficients:

           Estimate     Standard Error   t-value     P(>|t|)
     a      0.823073     0.189761         4.33742     0.0226122
     b     -0.167910     0.0940047       -1.78619     0.172045
     c     -0.0758023    0.0182477       -4.15408     0.0253587

R-squared: ${0.960049}$

Adjusted R-squared: ${0.920099}$

Residuals:

     Residual Sum of Squares:     0.0537599
     Residual Mean Square:        0.0179200
     Residual Standard Error:     0.133865
     Degrees of Freedom:          3

Five Point Summary:

     Minimum:           -0.0978249
     First Quartile:    -0.0951518
     Median:            -0.0393284
     Third Quartile:     0.0849992
     Maximum:            0.160697

This time, Maple could select the linear fitting method, because the expression is linear in the parameters. The initial values for the parameters are not used.
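Since the altered model is linear in $a$, $b$, and $c$, its parameter estimates can also be cross-checked outside Maple by a direct linear least-squares solve over the design-matrix columns $x$, ${x}^{2}/y$, and $yz$:

```python
import numpy as np

# The six data points from ExperimentalData: columns x, y, z, w
data = np.array([
    [1.0, 1.0, 1.0, 0.531],
    [1.0, 2.0, 2.0, 0.341],
    [1.0, 3.0, 3.0, 0.163],
    [2.0, 1.0, 4.0, 0.641],
    [2.0, 2.0, 5.0, 0.713],
    [2.0, 3.0, 6.0, -0.040],
])
x, y, z, w = data.T

# One design-matrix column per parameter of a*x + b*x^2/y + c*y*z
A = np.column_stack([x, x ** 2 / y, y * z])
(a, b, c), *_ = np.linalg.lstsq(A, w, rcond=None)
print(a, b, c)   # ≈ 0.823073, -0.167910, -0.0758023
```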

Compatibility

 • The XY parameter was introduced in Maple 15.