|
NAG[g02dac] NAG[nag_regsn_mult_linear] - Fits a general (multiple) linear regression model
|
|
Calling Sequence
g02dac(mean, x, sx, y, rss, df, b, se, cov, res, h, q, svd, rank, p, com_ar, 'n'=n, 'tdx'=tdx, 'm'=m, 'ip'=ip, 'wt'=wt, 'tdq'=tdq, 'tol'=tol, 'fail'=fail)
nag_regsn_mult_linear(. . .)
Parameters
|
mean - String;
|
|
|
On entry: indicates if a mean term is to be included.
|
|
mean = "Nag_MeanInclude": a mean term (intercept) will be included in the model.
|
|
mean = "Nag_MeanZero": the model will pass through the origin (no mean term).
|
|
Constraint: mean = "Nag_MeanInclude" or "Nag_MeanZero".
|
|
|
x - Matrix(1..n, 1..tdx, datatype=float[8], order=C_order);
|
|
|
|
sx - Vector(1..m, datatype=integer[kernelopts('wordsize')/8]);
|
|
|
On entry: indicates which of the potential independent variables are to be included in the model. If sx[j] > 0, then the variable contained in the corresponding column of x is included in the regression model.
|
|
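Constraints: sx[j] ≥ 0, for every element j of sx;
if mean = "Nag_MeanInclude", then exactly ip − 1 values of sx must be > 0;
if mean = "Nag_MeanZero", then exactly ip values of sx must be > 0.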
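The selection rule for sx can be sketched in Python. This is an illustration only and not the NAG interface: the helper name `select_columns` is hypothetical, and it assumes that a column is used exactly when its sx entry is positive and that a mean term contributes one extra leading column of ones.

```python
def select_columns(x_rows, sx, mean_include):
    """Return (design_rows, ip) for the columns of x flagged by sx.

    Hypothetical helper for illustration; it is not part of the NAG toolbox.
    """
    if any(s < 0 for s in sx):
        raise ValueError("every element of sx must be >= 0")
    # Indices of the columns whose sx entry is positive.
    chosen = [j for j, s in enumerate(sx) if s > 0]
    design = []
    for row in x_rows:
        selected = [row[j] for j in chosen]
        # A mean term adds a leading column of ones to the model.
        design.append([1.0] + selected if mean_include else selected)
    # ip counts the selected variables plus the mean term, if present.
    ip = len(chosen) + (1 if mean_include else 0)
    return design, ip


rows = [[1.0, 5.0, 9.0], [2.0, 6.0, 10.0]]
design, ip = select_columns(rows, sx=[1, 0, 1], mean_include=True)
```

Here two of the three columns are selected, so with a mean term the sketch yields ip = 3, which matches the consistency rule between ip, mean and sx.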
|
|
|
y - Vector(1..n, datatype=float[8]);
|
|
|
On entry: the observations on the dependent variable, y.
|
|
|
rss - assignable;
|
|
|
Note: On exit the variable rss will have a value of type float.
|
|
On exit: the residual sum of squares for the regression.
|
|
|
df - assignable;
|
|
|
Note: On exit the variable df will have a value of type float.
|
|
On exit: the degrees of freedom associated with the residual sum of squares.
|
|
|
b - Vector(1..ip, datatype=float[8]);
|
|
|
|
se - Vector(1..ip, datatype=float[8]);
|
|
|
On exit: the elements of se contain the standard errors of the ip parameter estimates given in b.
|
|
|
cov - Vector(1.., datatype=float[8]);
|
|
|
Note: the dimension, dim, of the array cov must be at least ip × (ip + 1)/2.
|
|
|
res - Vector(1..n, datatype=float[8]);
|
|
|
On exit: the (weighted) residuals, $r_i$.
|
|
|
h - Vector(1..n, datatype=float[8]);
|
|
|
On exit: the diagonal elements of $H$, $h_i$, the leverages.
|
|
|
q - Matrix(1..n, 1..tdq, datatype=float[8], order=C_order);
|
|
|
|
svd - assignable;
|
|
|
Note: On exit the variable svd will have a value of type boolean.
|
|
On exit: if a singular value decomposition has been performed then svd will be true, otherwise svd will be false.
|
|
|
rank - assignable;
|
|
|
Note: On exit the variable rank will have a value of type integer.
|
|
On exit: the rank of the independent variables.
|
|
If svd is false, rank = ip.
|
|
If svd is true, rank is an estimate of the rank of the independent variables. rank is calculated as the number of singular values greater than tol × (largest singular value). It is possible for the SVD to be carried out but rank to be returned as ip.
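The counting rule for rank can be sketched in Python. The function name `estimate_rank` is hypothetical, and the singular values are supplied directly rather than computed, so this shows only the thresholding step.

```python
def estimate_rank(singular_values, tol):
    """Count singular values exceeding tol times the largest one."""
    if tol < 0.0:
        raise ValueError("tol must be >= 0.0")
    if not singular_values:
        return 0
    threshold = tol * max(singular_values)
    return sum(1 for s in singular_values if s > threshold)


# One singular value is negligible relative to the largest: rank drops to 2.
rank_deficient = estimate_rank([3.2, 1.1, 4.0e-9], tol=1e-6)

# All singular values pass the test: rank equals the number of values,
# which mirrors the case where an SVD is performed but rank is still ip.
rank_full = estimate_rank([2.0, 1.0], tol=1e-6)
```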
|
|
|
p - Vector(1.., datatype=float[8]);
|
|
|
Note: the dimension, dim, of the array p must be at least 2 × ip + ip × ip.
|
|
On exit: details of the QR decomposition and SVD if used.
|
|
If svd is false, only the first ip elements of p are used; these will contain the zeta values for the QR decomposition (see f01qcc (nag_real_qr) for details).
|
|
|
com_ar - Vector(1.., datatype=float[8]);
|
|
|
Note: the dimension, dim, of the array com_ar must be at least 5 × (ip − 1) + ip × ip.
|
|
|
'n'=n - integer; (optional)
|
|
|
Default value: the first dimension of the arrays x, y, wt, res, h, q.
|
|
On entry: the number of observations, $n$.
|
|
Constraint: n ≥ 2.
|
|
|
'tdx'=tdx - integer; (optional)
|
|
|
On entry: the second dimension of the array x as declared in the function from which nag_regsn_mult_linear (g02dac) is called.
|
|
Constraint: tdx ≥ m.
|
|
|
'm'=m - integer; (optional)
|
|
|
Default value: the first dimension of the array sx and the second dimension of the array x.
|
|
On entry: the total number of independent variables in the data set, $m$.
|
|
Constraint: m ≥ 1.
|
|
|
'ip'=ip - integer; (optional)
|
|
|
Default value: the first dimension of the arrays b, se and the second dimension of the array q.
|
|
On entry: the number of independent variables in the model, including the mean or intercept if present.
|
|
if , ;
|
|
if , .
|
|
|
'wt'=wt - Vector(1..n, datatype=float[8]); (optional)
|
|
|
On entry: if weighted estimates are required then wt must contain the weights to be used in the weighted regression. Otherwise wt need not be defined and may be set to the null pointer NULL, i.e., (double *)0.
|
|
If wt[i] = 0.0, then the ith observation is not included in the model, in which case the effective number of observations is the number of observations with positive weights. The values of res and h will be set to zero for observations with zero weights.
|
|
If wt is NULL, then the effective number of observations is n.
|
|
Constraint: wt[i] ≥ 0.0, for every element i of wt.
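The effective number of observations can be sketched in Python. The function name `effective_observations` is hypothetical, and `None` stands in for the case where no weights are supplied.

```python
def effective_observations(n, wt=None):
    """Number of observations that actually enter the fit."""
    if wt is None:
        # No weights supplied: every observation is used.
        return n
    if len(wt) != n:
        raise ValueError("wt must have one weight per observation")
    if any(w < 0.0 for w in wt):
        raise ValueError("weights must be >= 0.0")
    # Observations with zero weight are dropped from the model.
    return sum(1 for w in wt if w > 0.0)


unweighted = effective_observations(4)
weighted = effective_observations(4, [1.0, 0.0, 2.5, 0.0])
```

With the weights shown, only two of the four observations have positive weight, so the effective number of observations is 2.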
|
|
|
'tdq'=tdq - integer; (optional)
|
|
|
On entry: the second dimension of the array q as declared in the function from which nag_regsn_mult_linear (g02dac) is called.
|
|
Constraint: tdq ≥ ip + 1.
|
|
|
'tol'=tol - float; (optional)
|
|
|
On entry: the value of tol is used to decide the rank of the independent variables. The smaller the value of tol, the stricter the criterion for selecting the singular value decomposition. If tol = 0.0, then the singular value decomposition will never be used; this may cause run-time errors or inaccurate results if the independent variables are not of full rank.
|
|
Suggested value: (default: ) .
|
|
Constraint: tol ≥ 0.0.
|
|
|
'fail'=fail - table; (optional)
|
|
|
The NAG error argument, see the documentation for NagError.
|
|
|
|
Description
|
|
|
Purpose
|
|
nag_regsn_mult_linear (g02dac) performs a general multiple linear regression when the independent variables may be linearly dependent. Parameter estimates, standard errors, residuals and influence statistics are computed. nag_regsn_mult_linear (g02dac) may be used to perform a weighted regression.
|
|
Description
|
|
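The general linear regression model is defined by
$$y = X\beta + \varepsilon$$
where
$y$ is a vector of $n$ observations on the dependent variable,
$X$ is an $n$ by $p$ matrix of the independent variables of column rank $k$,
$\beta$ is a vector of length $p$ of unknown parameters, and
$\varepsilon$ is a vector of length $n$ of unknown random errors such that $\operatorname{var}\varepsilon = V\sigma^2$, where $V$ is a known diagonal matrix.
Note: the $p$ independent variables may be selected by the user from a set of $m$ potential independent variables.
If $V = I$, the identity matrix, then least-squares estimation is used.
If $V \ne I$, then for a given weight matrix $W \propto V^{-1}$, weighted least-squares estimation is used.
The least-squares estimates $\hat\beta$ of the parameters $\beta$ minimize $(y - X\beta)^T (y - X\beta)$ while the weighted least-squares estimates minimize $(y - X\beta)^T W (y - X\beta)$.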
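nag_regsn_mult_linear (g02dac) finds a $QR$ decomposition of $X$ (or $W^{1/2}X$ in the weighted case), i.e.,
$$X = QR^* \quad (\text{or } W^{1/2}X = QR^*)$$
where $R^* = \begin{pmatrix} R \\ 0 \end{pmatrix}$ and $R$ is a $p$ by $p$ upper triangular matrix and $Q$ is an $n$ by $n$ orthogonal matrix.
If $R$ is of full rank, then $\hat\beta$ is the solution to
$$R\hat\beta = c_1$$
where $c = Q^T y$ (or $Q^T W^{1/2} y$) and $c_1$ is the first $p$ elements of $c$.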
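The full-rank case can be sketched in pure Python. This is not the routine's implementation: the page refers to f01qcc (nag_real_qr) for the actual decomposition, whereas the sketch below swaps in modified Gram-Schmidt, which produces only the first p columns of the orthogonal factor. The names `thin_qr` and `solve_least_squares` are hypothetical.

```python
import math


def thin_qr(x):
    """Modified Gram-Schmidt: X = Q1 R, with R upper triangular (p by p)."""
    n, p = len(x), len(x[0])
    cols = [[x[i][j] for i in range(n)] for j in range(p)]  # column-wise copy
    q_cols = []
    r = [[0.0] * p for _ in range(p)]
    for j in range(p):
        v = cols[j][:]
        for k in range(j):
            # Remove the component of v along the k-th orthonormal column.
            r[k][j] = sum(q_cols[k][i] * v[i] for i in range(n))
            v = [v[i] - r[k][j] * q_cols[k][i] for i in range(n)]
        r[j][j] = math.sqrt(sum(vi * vi for vi in v))
        if r[j][j] == 0.0:
            raise ValueError("X is not of full column rank")
        q_cols.append([vi / r[j][j] for vi in v])
    return q_cols, r


def solve_least_squares(x, y):
    """Solve R beta = c1, where c1 holds the first p elements of Q^T y."""
    q_cols, r = thin_qr(x)
    p = len(r)
    c1 = [sum(q[i] * y[i] for i in range(len(y))) for q in q_cols]
    beta = [0.0] * p
    for j in reversed(range(p)):  # back substitution
        s = c1[j] - sum(r[j][k] * beta[k] for k in range(j + 1, p))
        beta[j] = s / r[j][j]
    return beta


# Data generated exactly from y = 1 + 2 x, with a leading column of ones.
x = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
beta = solve_least_squares(x, y)
```

Because the data lie exactly on a line, the recovered estimates are the intercept 1 and slope 2 up to rounding error. The sketch raises an error for a rank-deficient matrix, which is the situation the routine handles through the SVD instead.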
If $R$ is not of full rank a solution is obtained by means of a singular value decomposition (SVD) of $R$,
$$R = Q_* \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix} P^T$$
where $D$ is a $k$ by $k$ diagonal matrix with non-zero diagonal elements, $k$ being the rank of $R$, and $Q_*$ and $P$ are $p$ by $p$ orthogonal matrices. This gives the solution
$$\hat\beta = P_1 D^{-1} Q_{*1}^T c_1$$
$P_1$ being the first $k$ columns of $P$, i.e., $P = (P_1 \; P_0)$, and $Q_{*1}$ being the first $k$ columns of $Q_*$.
Details of the SVD are made available, in the form of the matrix $P^*$:
$$P^* = \begin{pmatrix} D^{-1} P_1^T \\ P_0^T \end{pmatrix}$$
This will be only one of the possible solutions. Other estimates may be obtained by applying constraints to the parameters. These solutions can be obtained by using g02dkc (nag_regsn_mult_linear_tran_model) after using nag_regsn_mult_linear (g02dac). Only certain linear combinations of the parameters will have unique estimates; these are known as estimable functions.
The fit of the model can be examined by considering the residuals, $r = y - \hat y$, where $\hat y = X\hat\beta$ are the fitted values. The fitted values can be written as $Hy$ for an $n$ by $n$ matrix $H$. The $i$th diagonal element of $H$, $h_i$, gives a measure of the influence of the $i$th value of the independent variables on the fitted regression model. The values $h_i$ are sometimes known as leverages. Both $r$ and $h_i$ are provided by nag_regsn_mult_linear (g02dac).
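Residuals and leverages can be illustrated for the simplest case, an unweighted straight-line fit with a mean term. The sketch relies on the standard closed form $h_i = 1/n + (x_i - \bar x)^2 / S_{xx}$ for that special case, which is not taken from this page, and the function name `simple_regression_diagnostics` is hypothetical.

```python
def simple_regression_diagnostics(x, y):
    """Residuals and leverages for the fit y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    intercept = y_bar - slope * x_bar
    # Residuals: observed values minus fitted values.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Leverages: diagonal of the hat matrix for this two-parameter model.
    leverages = [1.0 / n + (xi - x_bar) ** 2 / s_xx for xi in x]
    return residuals, leverages


x = [0.0, 1.0, 2.0, 10.0]
y = [1.0, 2.9, 5.2, 20.8]
residuals, leverages = simple_regression_diagnostics(x, y)
```

The leverages sum to the number of fitted parameters (2 here), and the isolated point at x = 10 has by far the largest leverage, which is the kind of influence the $h_i$ values are meant to flag.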
The output of nag_regsn_mult_linear (g02dac) also includes $\hat\beta$, the residual sum of squares and associated degrees of freedom, $(n - k)$, the standard errors of the parameter estimates and the variance-covariance matrix of the parameter estimates.
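How these quantities relate can be shown for the smallest possible model, a mean term only (one parameter, rank 1). The sketch assumes the usual unweighted estimate of the error variance, rss/df, and the function name `mean_only_fit` is hypothetical.

```python
import math


def mean_only_fit(y):
    """Fit y = b + error and return (b, rss, df, se)."""
    n = len(y)
    b = sum(y) / n                      # least-squares estimate of the mean
    rss = sum((yi - b) ** 2 for yi in y)
    df = n - 1                          # n - k with k = 1
    if df == 0:
        # Mirrors the zero-degrees-of-freedom case: no standard error exists.
        raise ValueError("zero residual degrees of freedom")
    sigma2 = rss / df                   # estimated error variance
    se = math.sqrt(sigma2 / n)          # (X^T X)^{-1} = 1/n for a column of ones
    return b, rss, df, se


b, rss, df, se = mean_only_fit([2.0, 4.0, 6.0, 8.0])
```

For these four observations the estimate is 5, the residual sum of squares is 20 on 3 degrees of freedom, and the standard error is the square root of 5/3.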
In many linear regression models the first term is taken as a mean term or an intercept, i.e., $X_{i,1} = 1$, for $i = 1, 2, \ldots, n$. This is provided as an option. Also note that not all the potential independent variables need to be included in a model; a facility to select variables to be included in the model is provided.
Details of the $QR$ decomposition and, if used, the SVD, are made available. These allow the regression to be updated by adding or deleting an observation using g02dcc (nag_regsn_mult_linear_addrem_obs), adding or deleting a variable using g02dec (nag_regsn_mult_linear_add_var) and g02dfc (nag_regsn_mult_linear_delete_var) or estimating and testing an estimable function using g02dnc (nag_regsn_mult_linear_est_func).
|
|
Error Indicators and Warnings
|
|
"NE_ALLOC_FAIL"
Dynamic memory allocation failed.
"NE_BAD_PARAM"
On entry, argument mean had an illegal value.
"NE_BAD_SX_OR_IP"
Either a value of sx is < 0, or ip is incompatible with mean and sx, or ip is greater than the effective number of observations.
"NE_INT_ARG_LT"
On entry, n must not be less than 2: n = <value>.
"NE_REAL_ARG_LT"
On entry, tol must not be less than 0.0: tol = <value>.
"NE_SVD_NOT_CONV"
The singular value decomposition has failed to converge.
"NE_ZERO_DOF_RESID"
The degrees of freedom for the residuals are zero, i.e., the designated number of parameters = the effective number of observations. In this case the parameter estimates will be returned along with the diagonal elements of $H$, but neither standard errors nor the variance-covariance matrix will be calculated.
|
|
Accuracy
|
|
The accuracy of this function is closely related to the accuracy of f01qcc (nag_real_qr). That function document should be consulted.
|
|
Further Comments
|
|
Function g02fac (nag_regsn_std_resid_influence) can be used to compute standardized residuals and further measures of influence. g02fac requires, in particular, the results stored in res and h by nag_regsn_mult_linear (g02dac).
|
|
|
Examples
|
|
>
|
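# One-way layout: four indicator columns plus a mean term, so ip = 4 + 1.
# Every row of x holds exactly one 1, so the indicators sum to the mean
# column and the design is not of full column rank.
mean := "Nag_MeanInclude":
n := 12:      # number of observations
tdx := 4:     # second dimension of x
m := 4:       # potential independent variables
ip := 5:      # selected variables plus the mean term
tdq := 6:     # second dimension of q (ip + 1)
tol := 1e-05: # tolerance used in the rank decision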
x := Matrix([[1, 0, 0, 0], [0, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 0, 1, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0]], datatype=float[8], order='C_order'):
sx := Vector([1, 1, 1, 1], datatype=integer[kernelopts('wordsize')/8]):
y := Vector([33.63, 39.62, 38.18, 41.46, 38.02, 35.83, 35.99, 36.58, 42.92, 37.8, 40.43, 37.89], datatype=float[8]):
wt := Vector([], datatype=float[8]):
b := Vector(5, datatype=float[8]):
se := Vector(5, datatype=float[8]):
cov := Vector(15, datatype=float[8]):
res := Vector(12, datatype=float[8]):
h := Vector(12, datatype=float[8]):
q := Matrix(12, 6, datatype=float[8], order='C_order'):
p := Vector(35, datatype=float[8]):
com_ar := Vector(100, datatype=float[8]):
NAG:-g02dac(mean, x, sx, y, rss, df, b, se, cov, res, h, q, svd, rank, p, com_ar, 'n' = n, 'tdx' = tdx, 'm' = m, 'ip' = ip, 'wt' = wt, 'tdq' = tdq, 'tol' = tol):
|
|
|
See Also
|
|
Cook R D and Weisberg S (1982) Residuals and Influence in Regression Chapman and Hall
Draper N R and Smith H (1985) Applied Regression Analysis (2nd Edition) Wiley
Golub G H and Van Loan C F (1996) Matrix Computations (3rd Edition) Johns Hopkins University Press, Baltimore
Hammarling S (1985) The singular value decomposition in multivariate statistics SIGNUM Newsl. 20 (3) 2–25
McCullagh P and Nelder J A (1983) Generalized Linear Models Chapman and Hall
Searle S R (1971) Linear Models Wiley
g02 Chapter Introduction.
NAG Toolbox Overview.
NAG Web Site.
|
|