 Regression Commands - Maple Programming Help

Home : Support : Online Help : Statistics and Data Analysis : Statistics Package : Regression : Statistics/Regression

Regression Commands

 The Statistics package provides various commands for fitting linear and nonlinear models to data points and performing regression analysis.   The fitting algorithms are based on least-squares methods, which minimize the sum of the residuals squared.

Available Commands

 fit an exponential function to data fit a model function to data robust linear regression fit a linear model function to data fit a logarithmic function to data produce lowess smoothed functions fit a nonlinear model function to data generate a one-way ANOVA table fit a polynomial to data fit a power function to data fit a predictive linear model function to data robust linear regression

Linear Fitting

 • A number of commands are available for fitting a model function that is linear in the model parameters to given data.  For example, the model function $b{t}^{2}+at$ is linear in the parameters a and b, though it is nonlinear in the independent variable t.
 • The LinearFit command is available for multiple general linear regression.  For certain classes of model functions involving only one independent variable, the PolynomialFit, LogarithmicFit, PowerFit, and ExponentialFit commands are available. The PowerFit and ExponentialFit commands use a transformed model function that is linear in the parameters.

Nonlinear Fitting

 • The NonlinearFit command is available for nonlinear fitting.  An example model function is $ax+{ⅇ}^{by}$ where a and b are the parameters, and x and y are the independent variables.
 • This command relies on local nonlinear optimization solvers available in the Optimization package.  The LSSolve and NLPSolve commands in that package can also be used directly for least-squares and general nonlinear minimization.

Other Commands

 • The general Fit command allows you to provide either a linear or nonlinear model function.  It then determines the appropriate regression solver to use.
 • The OneWayANOVA command generates the standard ANOVA table for one-way classification, given two or more groups of observations.

Using the Regression Commands

 • Various options can be provided to the regression commands. For example, the weights option allows you to specify weights for the data points and the output option allows you to control the format of the results.  The options available for each command are described briefly in the command's help page and in greater detail in the Statistics/Regression/Options help page.
 • The format of the solutions returned by the regression commands is described in the Statistics/Regression/Solution help page.
 • Most of the regression commands use methods implemented in a built-in library provided by the Numerical Algorithms Group (NAG).  The underlying computation is done in floating-point.  Either hardware or software (arbitrary precision) floating-point computation can be specified.
 • The model function and data sets may be provided in different ways.  Full details are available in the Statistics/Regression/InputForms help page. The regression routines work primarily with Vectors and Matrices.  In most cases, lists (both flat and nested) and Arrays are also accepted and automatically converted to Vectors or Matrices.  Consequently, all output, including error messages, uses these data types.

Examples

 > $\mathrm{with}\left(\mathrm{Statistics}\right):$

Define Vectors X and Y, containing values of an independent variable x and a dependent variable y.

 > $X≔\mathrm{Vector}\left(\left[1.2,2.1,3.1,4.0,5.7,6.6,7.2,7.9,9.1,10.3\right]\right):$
 > $Y≔\mathrm{Vector}\left(\left[4.6,7.7,11.5,15.4,22.2,33.1,48.1,70.6,109.0,168.4\right]\right):$

Find the values of a and b that minimize the least-squares error when the model function $at+b{ⅇ}^{x}$ is used.

 > $\mathrm{Fit}\left(ax+b\mathrm{exp}\left(x\right),X,Y,x\right)$
 ${6.02861839709607}{}{x}{+}{0.00380375570567371}{}{{ⅇ}}^{{x}}$ (1)

It is also possible to return a summary of the regression model using the summarize option:

 > $\mathrm{Fit}\left(ax+b\mathrm{exp}\left(x\right),X,Y,x,\mathrm{summarize}=\mathrm{embed}\right)$
 ${6.02861839709607}{}{x}{+}{0.00380375570567371}{}{{ⅇ}}^{{x}}$ (2)

Model:

${6.0286184}{}{x}{+}{0.0038037557}{}{{ⅇ}}^{{x}}$

 Coefficients Estimate Standard Error t-value P(>|t|) a ${6.02862}$ ${0.761415}$ ${7.91765}$ ${\mathbf{0.0000470413}}$ b ${0.00380376}$ ${0.000494423}$ ${7.69332}$ ${\mathbf{0.0000577943}}$

R-squared:

${0.978977}$

${0.973721}$

Residuals

 Residual Sum of Squares Residual Mean Square Residual Standard Error Degrees of Freedom ${1042.23}$ ${130.279}$ ${11.4140}$ ${8}$

Five Point Summary

 Minimum First Quartile Median Third Quartile Maximum ${-13.2999}$ ${-8.96906}$ ${-5.89077}$ ${0.691999}$ ${20.0758}$

Fit a polynomial of degree 3 through this data.

 > $\mathrm{PolynomialFit}\left(3,X,Y,x\right)$
 ${-}{3.37372868459017}{+}{9.90059487215674}{}{x}{-}{2.79612412098216}{}{{x}}^{{2}}{+}{0.336249676048196}{}{{x}}^{{3}}$ (3)

Use the output option to see the residual sum of squares and the standard errors.

 > $\mathrm{PolynomialFit}\left(3,X,Y,x,\mathrm{output}=\left[\mathrm{residualsumofsquares},\mathrm{standarderrors}\right]\right)$
 $\left[{47.847131867356545}{,}\left[\begin{array}{cccc}6.325965104747092& 4.473060232720555& 0.8617838332836651& 0.04873550154386413\end{array}\right]\right]$ (4)

Fit the model function $ax+{ⅇ}^{bx}$, which is nonlinear in the parameters.

 > $\mathrm{NonlinearFit}\left(ax+\mathrm{exp}\left(bx\right),X,Y,x\right)$
 ${2.12883148575966}{}{x}{+}{{ⅇ}}^{{0.486510105685615}{}{x}}$ (5)

Consider now an experiment where quantities $x$, $y$, and $z$ are quantities influencing a quantity $w$ according to an approximate relationship

$w={x}^{a}+\frac{b{x}^{2}}{y}+cyz$

with unknown parameters $a$, $b$, and $c$. Six data points are given by the following matrix, with respective columns for $x$, $y$, $z$, and $w$.

 > $\mathrm{ExperimentalData}≔⟨⟨1,1,1,2,2,2⟩|⟨1,2,3,1,2,3⟩|⟨1,2,3,4,5,6⟩|⟨0.531,0.341,0.163,0.641,0.713,-0.040⟩⟩$
 $\left[\begin{array}{cccc}1& 1& 1& 0.531\\ 1& 2& 2& 0.341\\ 1& 3& 3& 0.163\\ 2& 1& 4& 0.641\\ 2& 2& 5& 0.713\\ 2& 3& 6& -0.040\end{array}\right]$ (6)

We take an initial guess that the first term will be approximately quadratic in $x$, that $b$ will be approximately $1$, and for $c$ we don't even know whether it's going to be positive or negative, so we guess $c=0$. We compute both the model function and the residuals. Also, we select more verbose operation by setting $\mathrm{infolevel}$.

 > $\mathrm{infolevel}\left[\mathrm{Statistics}\right]≔2:$
 > $\mathrm{NonlinearFit}\left({x}^{a}+\frac{b{x}^{2}}{y}+cyz,\mathrm{ExperimentalData},\left[x,y,z\right],\mathrm{initialvalues}=\left[a=2,b=1,c=0\right],\mathrm{output}=\left[\mathrm{leastsquaresfunction},\mathrm{residuals}\right]\right)$
 In NonlinearFit (algebraic form)
 $\left[{{x}}^{{1.1470197399696782}}{-}\frac{{0.29804186488939366}{}{{x}}^{{2}}}{{y}}{-}{0.09825118934297625}{}{y}{}{z}{,}\left[\begin{array}{cccccc}0.07270694576763004& 0.11697431018339816& -0.1466079923832507& -0.011612747005768642& -0.07703615328483882& 0.08864890856428051\end{array}\right]\right]$ (7)

We note that Maple selected the nonlinear fitting method. Furthermore, the exponent on $x$ is only about $1.14$, and the other guesses were not very good either. However, this problem is conditioned well enough that Maple finds a good fit anyway.

Now suppose that the relationship that is used to model the data is altered as follows:

$w=ax+\frac{b{x}^{2}}{y}+cyz$

We adapt the calling sequence very slightly:

 > $\mathrm{Fit}\left(ax+\frac{b{x}^{2}}{y}+cyz,\mathrm{ExperimentalData},\left[x,y,z\right],\mathrm{initialvalues}=\left[a=2,b=1,c=0\right],\mathrm{output}=\left[\mathrm{leastsquaresfunction},\mathrm{residuals}\right]\right)$
 In Fit
 In LinearFit (container form)
 final value of residual sum of squares: .0537598869493245
 Summary: ---------------- Model: .82307292*x-.16791011*x^2/y-.75802268e-1*y*z ---------------- Coefficients:     Estimate  Std. Error  t-value  P(>|t|) a    0.8231    0.1898      4.3374   0.0226 b   -0.1679    0.0940     -1.7862   0.1720 c   -0.0758    0.0182     -4.1541   0.0254 ---------------- R-squared: 0.9600, Adjusted R-squared: 0.9201
 $\left[{0.8230729183858783}{}{x}{-}\frac{{0.16791011421160582}{}{{x}}^{{2}}}{{y}}{-}{0.07580226783864379}{}{y}{}{z}{,}\left[\begin{array}{cccccc}-0.04836053633562854& -0.09490878992549993& 0.07811753022685414& -0.03029630857075828& 0.16069707003789296& -0.09782486344999755\end{array}\right]\right]$ (8)
 > $\mathrm{infolevel}\left[\mathrm{Statistics}\right]≔0:$

This time, Maple could select the linear fitting method, because the expression is linear in the parameters. In addition, as the infolevel is greater than 0 and the expression is linear in the parameters, a summary for the regression is displayed. The initial values for the parameters are not used.

Finally, consider a situation where an ordinary differential equation leads to results that need to be fitted. The system is given by

$\left[x\left(0\right)=-a,\frac{ⅆ}{ⅆt}\phantom{\rule[-0.0ex]{0.4em}{0.0ex}}x\left(t\right)=z{x\left(t\right)}^{-b}+1\right]$

where $a$ and $b$ are parameters that we want to find, $z$ is a variable that we can vary between experiments, and $x\left(t\right)$ is a quantity that we can measure at $t=1$. We perform 10 experiments at $z=0.1,0.2,\mathrm{...},1.0$, and the results are as follows.

 > $\mathrm{Input}≔\left[\mathrm{seq}\left(0.1..1,0.1\right)\right]$
 ${\mathrm{Input}}{≔}\left[{0.1}{,}{0.2}{,}{0.3}{,}{0.4}{,}{0.5}{,}{0.6}{,}{0.7}{,}{0.8}{,}{0.9}{,}{1.0}\right]$ (9)
 > $\mathrm{Output}≔\left[1.932,2.092,2.090,2.416,2.544,2.638,2.894,3.188,3.533,3.822\right]$
 ${\mathrm{Output}}{≔}\left[{1.932}{,}{2.092}{,}{2.090}{,}{2.416}{,}{2.544}{,}{2.638}{,}{2.894}{,}{3.188}{,}{3.533}{,}{3.822}\right]$ (10)

We now need to set up a procedure that NonlinearFit can call to obtain the value for a given input value $z$ and a given pair of parameters $a$ and $b$. We do this using dsolve/numeric.

 > $\mathrm{ODE}≔\left[x\left(0\right)=-a,\mathrm{diff}\left(x\left(t\right),t\right)=z{x\left(t\right)}^{-b}+1\right]$
 ${\mathrm{ODE}}{≔}\left[{x}{}\left({0}\right){=}{-}{a}{,}\frac{{ⅆ}}{{ⅆ}{t}}\phantom{\rule[-0.0ex]{0.4em}{0.0ex}}{x}{}\left({t}\right){=}{z}{}{{x}{}\left({t}\right)}^{{-}{b}}{+}{1}\right]$ (11)
 > $\mathrm{ODE_Solution}≔\mathrm{dsolve}\left(\mathrm{ODE},\mathrm{numeric},\mathrm{parameters}=\left[a,b,z\right]\right)$
 ${\mathrm{ODE_Solution}}{≔}{\mathbf{proc}}\left({\mathrm{x_rkf45}}\right)\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{...}\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{\mathbf{end proc}}$ (12)

We now have a procedure ODE_Solution that can compute the correct value, but we need to write a wrapper that has the form that NonlinearFit expects. We first need to call ODE_Solution once to set the parameters, then another time to obtain the value of $x\left(t\right)$ at $t=1$, and then return this value (for more information about how this works, see dsolve/numeric). By hand, we can do this as follows:

 > $\mathrm{ODE_Solution}\left(\mathrm{parameters}=\left[a=-1,b=-0.5,z=1\right]\right)$
 $\left[{a}{=}{-1.}{,}{b}{=}{-0.5}{,}{z}{=}{1.}\right]$ (13)
 > $\mathrm{ODE_Solution}\left(1\right)$
 $\left[{t}{=}{1.}{,}{x}{}\left({t}\right){=}{3.44630585135012}\right]$ (14)
 > $\mathrm{ODE_Solution}\left(\mathrm{parameters}=\left[a=1,b=1,z=1\right]\right)$
 $\left[{a}{=}{1.}{,}{b}{=}{1.}{,}{z}{=}{1.}\right]$ (15)
 > $\mathrm{ODE_Solution}\left(1\right)$

Note that for some settings of the parameters, we cannot obtain a solution. We need to take care of this in the procedure we create (which we call f), by returning a value that is very far from all output points, leading to a very bad fit for these erroneous parameter values.

 > f := proc(zValue, aValue, bValue) global ODE_Solution, a, b, z, x, t; ODE_Solution('parameters' = [a = aValue, b = bValue, z = zValue]); try return eval(x(t), ODE_Solution(1)); catch: return 100; end try; end proc;
 ${f}{≔}{\mathbf{proc}}\left({\mathrm{zValue}}{,}{\mathrm{aValue}}{,}{\mathrm{bValue}}\right)\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{\mathbf{global}}\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{\mathrm{ODE_Solution}}{,}{a}{,}{b}{,}{z}{,}{x}{,}{t}{;}\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{\mathrm{ODE_Solution}}{}\left({'}{\mathrm{parameters}}{'}{=}\left[{a}{=}{\mathrm{aValue}}{,}{b}{=}{\mathrm{bValue}}{,}{z}{=}{\mathrm{zValue}}\right]\right){;}\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{\mathbf{try}}\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{\mathbf{return}}\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{\mathrm{eval}}{}\left({x}{}\left({t}\right){,}{\mathrm{ODE_Solution}}{}\left({1}\right)\right)\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{\mathbf{catch}}{:}\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{\mathbf{return}}\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{100}\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{\mathbf{end try}}\phantom{\rule[-0.0ex]{0.5em}{0.0ex}}{\mathbf{end proc}}$ (16)
 > $f\left(1,-1,-0.5\right)$
 ${3.44630585135012}$ (17)

We need to provide an initial estimate for the parameter values, because the fitting procedure is only performed in a local sense. We go with the values that provided a solution above: $a=-1,b=-0.5$.

 > $\mathrm{NonlinearFit}\left(f,\mathrm{Input},\mathrm{Output},\mathrm{output}=\mathrm{parametervector},\mathrm{initialvalues}=\left[-1,-0.5\right]\right)$
 $\left[\begin{array}{c}-0.7394069215465607\\ -1.0308199274976924\end{array}\right]$ (18)
 > 

References

 Draper, Norman R., and Smith, Harry. Applied Regression Analysis. 3rd ed. New York: Wiley, 1998.