|
NAG[g03acc] NAG[nag_mv_canon_var] - Canonical variate analysis
|
|
Calling Sequence
g03acc(weight, x, isx, ing, nig, cvm, e, ncv, cvx, tol, irankx, 'n'=n, 'm'=m, 'tdx'=tdx, 'nx'=nx, 'ng'=ng, 'wt'=wt, 'tdcvm'=tdcvm, 'tde'=tde, 'tdcvx'=tdcvx, 'fail'=fail)
nag_mv_canon_var(. . .)
Parameters
|
weight - String;
|
|
|
On entry: indicates the type of weights to be used in the analysis.
|
|
weight = "Nag_NoWeights": no weights are used.
|
|
weight = "Nag_Weightsfreq": the weights are treated as frequencies and the effective number of observations is the sum of the weights.
|
|
weight = "Nag_Weightsvar": the weights are treated as being inversely proportional to the variance of the observations and the effective number of observations is the number of observations with non-zero weights.
|
|
Constraint: weight = "Nag_NoWeights", "Nag_Weightsfreq" or "Nag_Weightsvar".
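As a rough illustration of how the weight setting changes the effective number of observations, the following Python sketch applies the rules stated above. The function name `effective_observations` is hypothetical and is not part of the NAG interface; it only mirrors the description.

```python
def effective_observations(weight, n, wt=None):
    """Effective number of observations for a given weight setting.

    Illustrative helper only; not a NAG routine.
    """
    if weight == "Nag_NoWeights":
        return n  # wt is not referenced in this case
    if wt is None or len(wt) != n:
        raise ValueError("wt must supply one weight per observation")
    if any(w < 0 for w in wt):
        raise ValueError("all weights must be non-negative")
    if weight == "Nag_Weightsfreq":
        # Frequencies: the effective count is the sum of the weights.
        return sum(wt)
    if weight == "Nag_Weightsvar":
        # Variance weights: count the observations with non-zero weight.
        return sum(1 for w in wt if w != 0)
    raise ValueError("weight must be one of the three documented settings")
```

For example, with weights 2, 0 and 3 the frequency setting gives an effective count of 5, while the variance setting gives 2.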
|
|
|
x - Matrix(1..n, 1..tdx, datatype=float[8], order=C_order);
|
|
|
|
isx - Vector(1..m, datatype=integer[kernelopts('wordsize')/8]);
|
|
|
On entry: isx[j] indicates whether or not the j-th variable is to be included in the analysis.
|
|
Constraint: isx[j] > 0 for nx values of j.
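The consistency between isx and nx can be checked before calling the routine. The sketch below is a minimal illustration that assumes a positive entry in isx marks an included variable; the helper name `included_columns` is hypothetical.

```python
def included_columns(isx, nx):
    """Return the 1-based indices of the variables flagged for inclusion.

    Assumes isx[j] > 0 means "include variable j" (illustrative only).
    """
    cols = [j for j, flag in enumerate(isx, start=1) if flag > 0]
    if len(cols) != nx:
        raise ValueError(
            "isx flags %d variables but nx = %d" % (len(cols), nx)
        )
    return cols
```

A mismatch here corresponds to the situation reported by the "NE_VAR_INCL_INDICATED" error.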
|
|
|
ing - Vector(1..n, datatype=integer[kernelopts('wordsize')/8]);
|
|
|
Constraint: 1 ≤ ing[i] ≤ ng, for i = 1, 2, …, n.
|
|
|
nig - Vector(1..ng, datatype=integer[kernelopts('wordsize')/8]);
|
|
|
|
cvm - Matrix(1..ng, 1..tdcvm, datatype=float[8], order=C_order);
|
|
|
|
e - Matrix(1..min(nx,ng-1), 1..tde, datatype=float[8], order=C_order);
|
|
|
Note: the dimension, dim, of the array e must be at least min(nx, ng − 1) × tde.
|
|
|
ncv - assignable;
|
|
|
Note: On exit the variable ncv will have a value of type integer.
|
|
On exit: the number of canonical variates, $l$. This will be the minimum of $n_g - 1$ and the rank of x.
|
|
|
cvx - Matrix(1..nx, 1..tdcvx, datatype=float[8], order=C_order);
|
|
|
|
tol - float;
|
|
|
On entry: the value of tol is used to decide if the variables are of full rank and, if not, what is the rank of the variables. The smaller the value of tol the stricter the criterion for selecting the singular value decomposition. If a non-negative value of tol less than machine precision is entered, then the square root of machine precision is used instead.
|
|
Constraint: tol ≥ 0.0.
|
|
|
irankx - assignable;
|
|
|
Note: On exit the variable irankx will have a value of type integer.
|
|
On exit: the rank of the dependent variables.
|
|
If the variables are of full rank then irankx = nx.
|
|
If the variables are not of full rank then irankx is an estimate of the rank of the dependent variables. irankx is calculated as the number of singular values greater than tol × (largest singular value).
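The rank rule described for tol and irankx can be sketched as follows. This is only an illustration of the counting rule applied to a list of singular values that has already been computed; `estimate_rank` is a hypothetical name, and Python's `sys.float_info.epsilon` is used as a stand-in for the library's machine precision, which may differ.

```python
import math
import sys


def estimate_rank(singular_values, tol):
    """Count singular values greater than tol * (largest singular value)."""
    if tol < 0.0:
        raise ValueError("tol must not be negative")
    eps = sys.float_info.epsilon  # assumed stand-in for machine precision
    if tol < eps:
        # A tol below machine precision is replaced by its square root.
        tol = math.sqrt(eps)
    largest = max(singular_values, default=0.0)
    return sum(1 for s in singular_values if s > tol * largest)
```

With singular values 3, 1 and 1e-12 and tol = 1e-6, the third value falls below the threshold and the estimated rank is 2.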
|
|
|
'n'=n - integer; (optional)
|
|
|
Default value: the first dimension of the arrays x, ing, wt.
|
|
On entry: the number of observations, $n$.
|
|
Constraint: n ≥ nx + ng.
|
|
|
'm'=m - integer; (optional)
|
|
|
Default value: the first dimension of the array isx and the second dimension of the array x.
|
|
On entry: the total number of variables, $m$.
|
|
Constraint: m ≥ nx.
|
|
|
'tdx'=tdx - integer; (optional)
|
|
|
On entry: the second dimension of the array x as declared in the function from which nag_mv_canon_var (g03acc) is called.
|
|
Constraint: tdx ≥ m.
|
|
|
'nx'=nx - integer; (optional)
|
|
|
Default value: the first dimension of the array cvx and the second dimension of the array cvm.
|
|
On entry: the number of variables in the analysis, $n_x$.
|
|
Constraint: nx ≥ 1.
|
|
|
'ng'=ng - integer; (optional)
|
|
|
Default value: the first dimension of the arrays nig, cvm and the second dimension of the array cvx.
|
|
On entry: the number of groups, $n_g$.
|
|
Constraint: ng ≥ 2.
|
|
|
'wt'=wt - Vector(1..n, datatype=float[8]); (optional)
|
|
|
On entry: if weight = "Nag_Weightsfreq" or "Nag_Weightsvar" then the elements of wt must contain the weights to be used in the analysis.
|
|
If wt[i] = 0.0 then the i-th observation is not included in the analysis.
|
|
Constraints: wt[i] ≥ 0.0, for i = 1, 2, …, n;
|
|
the sum of the elements of wt ≥ nx + effective number of groups.
|
|
Note: if weight = "Nag_NoWeights" then wt is not referenced and may be set to the null pointer NULL, i.e., (double *)0.
|
|
|
'tdcvm'=tdcvm - integer; (optional)
|
|
|
On entry: the second dimension of the array cvm as declared in the function from which nag_mv_canon_var (g03acc) is called.
|
|
Constraint: tdcvm ≥ nx.
|
|
|
'tde'=tde - integer; (optional)
|
|
|
On entry: the second dimension of the array e as declared in the function from which nag_mv_canon_var (g03acc) is called.
|
|
Constraint: tde ≥ 6.
|
|
|
'tdcvx'=tdcvx - integer; (optional)
|
|
|
On entry: the second dimension of the array cvx as declared in the function from which nag_mv_canon_var (g03acc) is called.
|
|
Constraint: tdcvx ≥ ng − 1.
|
|
|
'fail'=fail - table; (optional)
|
|
|
The NAG error argument, see the documentation for NagError.
|
|
|
|
Description
|
|
|
Purpose
|
|
nag_mv_canon_var (g03acc) performs a canonical variate (canonical discrimination) analysis.
|
|
Description
|
|
Let a sample of $n$ observations on $n_x$ variables in a data matrix come from $n_g$ groups with $n_1, n_2, \ldots, n_{n_g}$ observations in each group, $\sum n_i = n$. Canonical variate analysis finds the linear combination of the $n_x$ variables that maximizes the ratio of between-group to within-group variation. The variables formed, the canonical variates, can then be used to discriminate between groups.
The canonical variates can be calculated from the eigenvectors of the within-group sums of squares and cross-products matrix. However, nag_mv_canon_var (g03acc) calculates the canonical variates by means of a singular value decomposition (SVD) of a matrix $V$. Let the data matrix with variable (column) means subtracted be $X$, and let its rank be $k$; then the $k$ by $(n_g - 1)$ matrix $V$ is given by:
$$V = Q_X^T Q_g$$
where $Q_g$ is an $n$ by $(n_g - 1)$ orthogonal matrix that defines the groups and $Q_X$ is the first $k$ rows of the orthogonal matrix $Q$ either from the $QR$ decomposition of $X$:
$$X = QR$$
if $X$ is of full column rank, i.e., $k = n_x$, else from the SVD of $X$:
$$X = Q D P^T$$
Let the SVD of $V$ be:
$$V = U_x \Delta U_g^T$$
then the non-zero elements of the diagonal matrix $\Delta$, $\delta_i$, for $i = 1, 2, \ldots, l$, are the $l$ canonical correlations associated with the $l$ canonical variates, where $l = \min(k, n_g - 1)$.
The eigenvalues, $\lambda_i^2$, of the within-group sums of squares matrix are given by:
$$\lambda_i^2 = \frac{\delta_i^2}{1 - \delta_i^2}$$
and the value of $\pi_i = \lambda_i^2 / \sum \lambda_i^2$ gives the proportion of variation explained by the $i$th canonical variate. The values of the $\pi_i$'s give an indication as to how many canonical variates are needed to adequately describe the data, i.e., the dimensionality of the problem.
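The step from canonical correlations to eigenvalues and proportions of variation can be sketched in a few lines of Python. The sketch assumes the standard relation that each eigenvalue equals the squared canonical correlation divided by one minus the squared canonical correlation; `eigen_summary` is a hypothetical helper, not a NAG function.

```python
def eigen_summary(canon_corr):
    """Eigenvalues and proportions of variation from canonical correlations.

    Each correlation d must satisfy 0 <= d < 1; d == 1 is the degenerate
    case in which the variables identify the groups exactly.
    """
    if any(not (0.0 <= d < 1.0) for d in canon_corr):
        raise ValueError("canonical correlations must lie in [0, 1)")
    eigenvalues = [d * d / (1.0 - d * d) for d in canon_corr]
    total = sum(eigenvalues)
    if total == 0.0:
        raise ValueError("all canonical correlations are zero")
    proportions = [v / total for v in eigenvalues]
    return eigenvalues, proportions
```

For canonical correlations 0.8 and 0.6 the eigenvalues are 16/9 and 0.5625, so the first canonical variate accounts for roughly three quarters of the explained variation.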
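To test for a significant dimensionality greater than $i$ the $\chi^2$ statistic:
$$\left(n - 1 - n_g - \tfrac{1}{2}(k - n_g)\right) \sum_{j=i+1}^{l} \log\left(1 + \lambda_j^2\right)$$
can be used. This is asymptotically distributed as a $\chi^2$ distribution with $(k - i)(n_g - 1 - i)$ degrees of freedom. If the test for $i = h$ is not significant, then the remaining tests for $i > h$ should be ignored.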
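A minimal sketch of the dimensionality test is given below. It assumes the statistic takes the form of a scaling factor, n − 1 − n_g − (k − n_g)/2, multiplied by the sum of log(1 + eigenvalue) over the canonical variates beyond the i-th, with (k − i)(n_g − 1 − i) degrees of freedom. The function name and argument names are hypothetical, and the significance level is not computed because that needs a chi-square distribution function.

```python
import math


def dimensionality_tests(eigenvalues, n, ng, k):
    """Return (i, statistic, degrees_of_freedom) for i = 0, 1, ..., l - 1.

    eigenvalues: the l eigenvalues, largest first.
    n: number of observations, ng: number of groups, k: rank of the data.
    """
    factor = n - 1 - ng - 0.5 * (k - ng)
    results = []
    for i in range(len(eigenvalues)):
        # Sum over canonical variates i+1, ..., l (1-based numbering).
        stat = factor * sum(math.log(1.0 + lam) for lam in eigenvalues[i:])
        dof = (k - i) * (ng - 1 - i)
        results.append((i, stat, dof))
    return results
```

The tests are read in order of increasing i; once one is not significant, the later entries would be disregarded.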
The loadings for the canonical variates are calculated from the matrix $U_x$. This matrix is scaled so that the canonical variates have unit within-group variance.
In addition to the canonical variate loadings, the means for each canonical variate are calculated for each group.
Weights can be used with the analysis, in which case the weighted means are subtracted from each column and then each row is scaled by an amount $\sqrt{w_i}$, where $w_i$ is the weight for the $i$th observation (row).
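The weighted preprocessing can be illustrated with plain Python lists. This sketch assumes the row scaling factor is the square root of the observation's weight and that the weighted mean of a column is the weight-averaged value; `center_and_scale` is a hypothetical name and the routine's internal handling may differ in detail.

```python
import math


def center_and_scale(x, wt):
    """Subtract weighted column means, then scale row i by sqrt(wt[i])."""
    total = sum(wt)
    if total <= 0.0:
        raise ValueError("the weights must have a positive sum")
    ncols = len(x[0])
    # Weighted mean of each column.
    means = [
        sum(w * row[j] for w, row in zip(wt, x)) / total
        for j in range(ncols)
    ]
    return [
        [math.sqrt(w) * (row[j] - means[j]) for j in range(ncols)]
        for w, row in zip(wt, x)
    ]
```

A row with zero weight is scaled to zero, which is consistent with that observation being left out of the analysis.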
|
|
Error Indicators and Warnings
|
|
"NE_2_INT_ARG_LT"
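On entry, the first of two integer arguments is smaller than the second, for example m = ⟨value⟩ while nx = ⟨value⟩. These arguments must satisfy m ≥ nx.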
"NE_3_INT_ARG_CONS"
On entry, n = ⟨value⟩, nx = ⟨value⟩ and ng = ⟨value⟩. These arguments must satisfy n ≥ nx + ng.
"NE_ALLOC_FAIL"
Dynamic memory allocation failed.
"NE_BAD_PARAM"
On entry, argument weight had an illegal value.
"NE_CANON_CORR_1"
A canonical correlation is equal to one. This will happen if the variables provide an exact indication as to which group every observation is allocated.
"NE_GROUPS"
Either the effective number of groups is less than two, or the effective number of groups plus the number of variables, nx, is greater than the effective number of observations.
"NE_INT_ARG_LT"
On entry, nx must not be less than 1: nx = ⟨value⟩.
"NE_INTARR_INT"
On entry, ing[⟨value⟩] = ⟨value⟩, ng = ⟨value⟩. Constraint: 1 ≤ ing[i] ≤ ng, for i = 1, 2, …, n.
"NE_INTERNAL_ERROR"
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please consult NAG for assistance.
"NE_NEG_WEIGHT_ELEMENT"
On entry, wt[⟨value⟩] = ⟨value⟩. Constraint: when referenced, all elements of wt must be non-negative.
"NE_RANK_ZERO"
The rank of the variables is zero. This will happen if all the variables are constants.
"NE_REAL_ARG_LT"
On entry, tol must not be less than 0.0: tol = ⟨value⟩.
"NE_SVD_NOT_CONV"
The singular value decomposition has failed to converge. This is an unlikely error exit.
"NE_VAR_INCL_INDICATED"
The number of variables in the analysis, nx = ⟨value⟩, while the number of variables included in the analysis via the array isx = ⟨value⟩. Constraint: these two numbers must be the same.
"NE_WT_ARGS"
The wt array argument must not be NULL when the weight argument indicates weights.
|
|
Accuracy
|
|
As the computation uses orthogonal matrices and a singular value decomposition, rather than the traditional approach of forming a sum of squares matrix and computing its eigenvalue decomposition, nag_mv_canon_var (g03acc) should be less affected by ill-conditioned problems.
|
|
|
Examples
|
|
>
|
weight := "Nag_NoWeights":
n := 9:
m := 3:
tdx := 3:
nx := 3:
ng := 3:
tdcvm := 3:
tde := 6:
tdcvx := 2:
tol := 1e-06:
x := Matrix([[13.3, 10.6, 21.2], [13.6, 10.2, 21], [14.2, 10.7, 21.1], [13.4, 9.4, 21], [13.2, 9.6, 20.1], [13.9, 10.4, 19.8], [12.9, 10, 20.5], [12.2, 9.9, 20.7], [13.9, 11, 19.1]], datatype=float[8], order='C_order'):
isx := Vector([1, 1, 1], datatype=integer[kernelopts('wordsize')/8]):
ing := Vector([1, 2, 3, 1, 2, 3, 1, 2, 3], datatype=integer[kernelopts('wordsize')/8]):
wt := Vector([0, 0, 0, 0, 0, 0, 0, 0, 0], datatype=float[8]):
nig := Vector(3, datatype=integer[kernelopts('wordsize')/8]):
cvm := Matrix(3, 3, datatype=float[8], order='C_order'):
e := Matrix(2, 6, datatype=float[8], order='C_order'):
cvx := Matrix(3, 2, datatype=float[8], order='C_order'):
NAG:-g03acc(weight, x, isx, ing, nig, cvm, e, ncv, cvx, tol, irankx, 'n' = n, 'm' = m, 'tdx' = tdx, 'nx' = nx, 'ng' = ng, 'wt' = wt, 'tdcvm' = tdcvm, 'tde' = tde, 'tdcvx' = tdcvx):
|
|
|
See Also
|
|
Chatfield C and Collins A J (1980) Introduction to Multivariate Analysis Chapman and Hall
Gnanadesikan R (1977) Methods for Statistical Data Analysis of Multivariate Observations Wiley
Hammarling S (1985) The singular value decomposition in multivariate statistics SIGNUM Newsl. 20 (3) 2–25
Kendall M G and Stuart A (1979) The Advanced Theory of Statistics (3 Volumes) (4th Edition) Griffin
g03 Chapter Introduction.
NAG Toolbox Overview.
NAG Web Site.
|
|