This help page gives a brief overview of the ways in which problems can be specified for the regression commands in the Statistics package.

Simple Data Sets

Regression commands in the Statistics package that accept a one-dimensional set of data samples use a Vector internally to hold this data. Vectors specified as input are used as is. A one-dimensional Array or a flat list is accepted and automatically converted to a Vector. Similarly, a Matrix having just one row or column is converted to a Vector, as is a higher-dimensional Array in which all but one of the dimensions have a range of a single element. Consequently, all output, including error messages, refers to Vectors.
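As a minimal sketch of the conversions described above, the following calls pass the same samples in different containers. The data values are illustrative only, and ExponentialFit is chosen simply because it takes one-dimensional data for both variables.

```maple
with(Statistics):

# The same six samples held in three accepted containers (illustrative values).
xs := [1, 2, 3, 4, 5, 6]:                                          # flat list
ya := Array([2.2, 3, 4.8, 10.2, 24.5, 75.0]):                      # one-dimensional Array
yv := Vector([2.2, 3, 4.8, 10.2, 24.5, 75.0], datatype = float):   # Vector, used as is

# Both calls fit the same data; the list and the Array are converted to Vectors internally.
ExponentialFit(xs, ya, x);
ExponentialFit(xs, yv, x);
```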

Two-Dimensional Data Sets

A two-dimensional data set is most efficiently specified as a Matrix. Two-dimensional Arrays and nested lists are automatically converted to Matrices. One-dimensional Arrays, flat lists, and Vectors are converted to column Matrices. Higher-dimensional Arrays are treated in the same way as for simple data sets.

When the data set contains values associated with a set of variables $v_1, v_2, \ldots, v_n$, the ith column of the Matrix is understood to hold the values for variable $v_i$. Thus, the jth row represents the jth sample for $v_1, v_2, \ldots, v_n$.

Data Sets for Independent and Dependent Variables

Data sets for independent and dependent variables can be specified separately or together. If data are specified separately, then one can use a Matrix to specify the values of the independent variables and a Vector for the values of the dependent variable; if a command allows for only one independent variable, such as ExponentialFit, then internally a Vector is used instead of the Matrix. As a convenience for users of the CurveFitting package, one can also specify the independent and dependent data in one Matrix. The number of columns of this Matrix is equal to the number n of independent variables plus one; the first n columns correspond to the n independent variables and the last column corresponds to the dependent variable. If there is a single independent variable (so the Matrix has two columns) then the same Matrix can be used for CurveFitting and Statistics[Regression] routines.
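The separate and combined layouts can be sketched as follows, using LinearFit as an example command. The data values and the names X, Y, XY, and XYZ are illustrative assumptions, not part of the package.

```maple
with(Statistics):

# Separate specification: one Vector of independent values, one Vector of dependent values.
X := Vector([1, 2, 3, 4, 5, 6], datatype = float):
Y := Vector([2, 3, 4, 3.5, 5.8, 7], datatype = float):
LinearFit([1, t, t^2], X, Y, t);

# Combined specification: the first column is the independent variable,
# the last column is the dependent variable.
XY := Matrix([[1, 2], [2, 3], [3, 4], [4, 3.5], [5, 5.8], [6, 7]], datatype = float):
LinearFit([1, t, t^2], XY, t);

# Two independent variables (n = 2), so the Matrix has n + 1 = 3 columns.
XYZ := Matrix([[1, 1, 3.1], [1, 2, 4.9], [2, 1, 4.2],
               [2, 2, 6.0], [3, 1, 4.8], [3, 3, 9.1]], datatype = float):
LinearFit([1, u, v], XYZ, [u, v]);
```

The first two calls describe the same problem, so they are expected to produce the same fitted model.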

An exception applies for the advanced Matrix form of input for NonlinearFit; the same format is used, where both independent and dependent values must be placed together in a common Matrix, but the objective is somewhat different: this usage is recommended for advanced users primarily interested in highly efficient computation on large data sets. Accordingly, Arrays or other data formats are not accepted for this calling sequence.

Another exception is the OneWayANOVA command, where the input does not follow the model of dependent and independent data. This command accepts either a list of simple data sets or one two-dimensional data set.
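A small sketch of the list-of-data-sets input for OneWayANOVA is shown here. The three groups A, B, and C and their values are made up for illustration.

```maple
with(Statistics):

# Three groups of observations, each given as a simple data set (illustrative values).
A := [4.2, 5.1, 4.8, 5.5]:
B := [6.0, 6.4, 5.9, 6.8]:
C := [5.0, 4.7, 5.3, 5.2]:

# The input is a list of simple data sets rather than independent/dependent data.
OneWayANOVA([A, B, C]);
```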

Model Functions

Model functions applied to data sets can be specified in three ways: algebraic form, operator form and Matrix form.

A model function in algebraic form is given as an algebraic expression in the model parameters and the independent variables. When this form is used, the list of independent variables must be provided separately to the command.
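For instance, a model in algebraic form might be passed to NonlinearFit as in the sketch below. The data values are illustrative, and the parameter names a, b, and c are arbitrary choices.

```maple
with(Statistics):

X := Vector([1, 2, 3, 4, 5, 6], datatype = float):
Y := Vector([2.2, 3, 4.8, 10.2, 24.5, 75.0], datatype = float):

# Algebraic form: a, b, and c are model parameters; x is the independent variable.
# The final argument names the independent variable, which lets the command
# distinguish it from the parameters appearing in the expression.
NonlinearFit(a + b*exp(c*x), X, Y, x);
```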

A model function in operator form is given as a procedure with input parameters representing the independent variables and model parameters. Operator form is often used for functions that are difficult to express as a simple algebraic expression. This form is available for the LinearFit and NonlinearFit commands.

Matrix form also requires the model function to be specified as a procedure. However, the procedure works entirely with Vectors and Matrices and the interface is more complex. Matrix form results in the most efficient computation, because the data is provided in the form required by the internal solvers, thus eliminating the overhead of extra storage and copying. This form is available only for the NonlinearFit command.
