NAG[g03aac] NAG[nag_mv_prin_comp] - Principal component analysis

Calling Sequence
g03aac(pcmatrix, scores, x, isx, s, e, p, v, 'n'=n, 'm'=m, 'tdx'=tdx, 'wt'=wt, 'nvar'=nvar, 'tde'=tde, 'tdp'=tdp, 'tdv'=tdv, 'fail'=fail)
nag_mv_prin_comp(. . .)
Parameters

pcmatrix - String;

On entry: indicates for which type of matrix the principal component analysis is to be carried out.

pcmatrix = "Nag_MatCorrelation"
It is for the correlation matrix.

pcmatrix = "Nag_MatStandardised"
It is for the standardized matrix, with standardizations given by s.

pcmatrix = "Nag_MatSumSq"
It is for the sums of squares and cross-products matrix.

pcmatrix = "Nag_MatVarCovar"
It is for the variance-covariance matrix.

Constraint: pcmatrix = "Nag_MatCorrelation", "Nag_MatStandardised", "Nag_MatSumSq" or "Nag_MatVarCovar".

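scores - String;

On entry: specifies the type of principal component scores to be used. Here $F$ denotes the scores and $X_s$, $P$, $\Lambda$ and $V$ are as defined in the Description.

scores = "Nag_ScoresStand"
The principal component scores are standardized so that $F^{\mathrm{T}} F = I$, i.e., $F = X_s P \Lambda^{-1} = V$.

scores = "Nag_ScoresNotStand"
The principal component scores are unstandardized, i.e., $F = X_s P = V \Lambda$.

scores = "Nag_ScoresUnitVar"
The principal component scores are standardized so that they have unit variance.

scores = "Nag_ScoresEigenval"
The principal component scores are standardized so that they have variance equal to the corresponding eigenvalue.

Constraint: scores = "Nag_ScoresStand", "Nag_ScoresNotStand", "Nag_ScoresUnitVar" or "Nag_ScoresEigenval".
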
x - Matrix(1..n, 1..tdx, datatype=float[8], order=C_order);

On entry: x[i,j] must contain the ith observation for the jth variable, for i = 1..n and j = 1..m.

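isx - Vector(1..m, datatype=integer[kernelopts('wordsize')/8]);

On entry: isx[j] indicates whether or not the jth variable is to be included in the analysis. If isx[j] > 0, the variable contained in the jth column of x is included in the principal component analysis; otherwise it is excluded.

Constraint: isx[j] > 0 for nvar values of j.
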
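s - Vector(1..m, datatype=float[8]);

On entry: the standardizations to be used, if any. If pcmatrix = "Nag_MatStandardised", the first m elements of s must contain the standardization coefficients, the diagonal elements of σ.

Constraint: if pcmatrix = "Nag_MatStandardised", then s[j] > 0.0 for every j with isx[j] > 0.

On exit: if pcmatrix = "Nag_MatStandardised", then s is unchanged on exit.

If pcmatrix = "Nag_MatSumSq" or "Nag_MatVarCovar", then s is not referenced.
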
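e - Matrix(1..nvar, 1..tde, datatype=float[8], order=C_order);

On exit: the statistics of the principal component analysis, for i = 1..nvar:
e[i,1], the eigenvalue associated with the ith principal component;
e[i,2], the proportion of variation explained by the ith principal component;
e[i,3], the cumulative proportion of variation explained by the first i principal components;
e[i,4], the χ² statistic;
e[i,5], the degrees of freedom for the χ² statistic;
e[i,6], the significance level for the χ² statistic, if pcmatrix ≠ "Nag_MatCorrelation".

If pcmatrix = "Nag_MatCorrelation", then e[i,6] is returned as zero.
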
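p - Matrix(1..nvar, 1..tdp, datatype=float[8], order=C_order);

On exit: the first nvar columns of p contain the principal component loadings. The jth column of p contains the nvar coefficients for the jth principal component.
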
v - Matrix(1..n, 1..tdv, datatype=float[8], order=C_order);

On exit: the first nvar columns of v contain the principal component scores. The jth column of v contains the n scores for the jth principal component.

If weights are supplied in the array wt, then any rows for which wt[i] is zero will be set to zero.

'n'=n - integer; (optional)

Default value: the first dimension of the arrays x, wt and v.

On entry: the number of observations, n.

Constraint: n ≥ 2.

'm'=m - integer; (optional)

Default value: the dimension of the arrays isx and s, and the second dimension of the array x.

On entry: the number of variables in the data matrix, m.

Constraint: m ≥ 1.

'tdx'=tdx - integer; (optional)

On entry: the second dimension of the array x as declared in the function from which nag_mv_prin_comp (g03aac) is called.

Constraint: tdx ≥ m.

'wt'=wt - Vector(1..n, datatype=float[8]); (optional)

On entry: the elements of wt must contain the weights to be used in the principal component analysis. The effective number of observations is the sum of the weights.

Constraints: wt[i] ≥ 0.0, for i = 1..n; the sum of weights ≥ nvar + 1.

If wt[i] = 0.0, then the ith observation is not included in the analysis.

Note: If wt is set to the null pointer NULL, i.e., (double *)0, then wt is not referenced and the effective number of observations is n.

'nvar'=nvar - integer; (optional)

Default value: the first dimension of the arrays e and p, and the second dimension of the arrays p and v.

On entry: the number of variables in the principal component analysis, p.

Constraint: 1 ≤ nvar ≤ min(m, n − 1).

'tde'=tde - integer; (optional)

On entry: the second dimension of the array e as declared in the function from which nag_mv_prin_comp (g03aac) is called.

Constraint: tde ≥ 6.

'tdp'=tdp - integer; (optional)

On entry: the second dimension of the array p as declared in the function from which nag_mv_prin_comp (g03aac) is called.

Constraint: tdp ≥ nvar.

'tdv'=tdv - integer; (optional)

On entry: the second dimension of the array v as declared in the function from which nag_mv_prin_comp (g03aac) is called.

Constraint: tdv ≥ nvar.

'fail'=fail - table; (optional)

The NAG error argument, see the documentation for NagError.

Description

Purpose

nag_mv_prin_comp (g03aac) performs a principal component analysis on a data matrix; both the principal component loadings and the principal component scores are returned.

Description

Let $X$ be an $n$ by $p$ data matrix of observations on $p$ variables $x_1, x_2, \ldots, x_p$, and let the $p$ by $p$ variance-covariance matrix of $x_1, x_2, \ldots, x_p$ be $S$. A vector $a_1$ of length $p$ is found such that $a_1^{\mathrm{T}} S a_1$ is maximized subject to $a_1^{\mathrm{T}} a_1 = 1$.
The variable $z_1 = \sum_{i=1}^{p} a_{1i} x_i$ is known as the first principal component and gives the linear combination of the variables that gives the maximum variation. A second principal component, $z_2 = \sum_{i=1}^{p} a_{2i} x_i$, is found such that $a_2^{\mathrm{T}} S a_2$ is maximized subject to $a_2^{\mathrm{T}} a_2 = 1$ and $a_2^{\mathrm{T}} a_1 = 0$.
This gives the linear combination of variables that is orthogonal to the first principal component and that gives the maximum variation. Further principal components are derived in a similar way.
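The constrained maximization can be made concrete with a Lagrange multiplier. The sketch below is the standard textbook argument, not a description of the NAG code; $S$ is the $p$ by $p$ variance-covariance matrix, $a$ a candidate loading vector and $\mu$ a multiplier introduced only for this derivation.

```latex
\begin{aligned}
L(a, \mu) &= a^{\mathrm{T}} S a - \mu \left( a^{\mathrm{T}} a - 1 \right) \\
\frac{\partial L}{\partial a} &= 2 S a - 2 \mu a = 0
  \quad \Longrightarrow \quad S a = \mu a \\
a^{\mathrm{T}} S a &= \mu \, a^{\mathrm{T}} a = \mu
\end{aligned}
```

Every stationary point is therefore an eigenvector of $S$, and the variance it attains equals its eigenvalue, so the first component uses the eigenvector with the largest eigenvalue. For the second component, adding the term $-\nu\, a^{\mathrm{T}} a_1$ and premultiplying the stationarity condition by $a_1^{\mathrm{T}}$ gives $\nu = 0$, because $a_1^{\mathrm{T}} S a = \mu_1 a_1^{\mathrm{T}} a = 0$ (with $\mu_1$ the eigenvalue belonging to $a_1$); the second loading vector is then the eigenvector with the next largest eigenvalue.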
The vectors $a_1, a_2, \ldots, a_p$ are the eigenvectors of the matrix $S$, and associated with each eigenvector is the eigenvalue $\lambda_i^2$. The value of $\lambda_i^2 / \sum_{j=1}^{p} \lambda_j^2$ gives the proportion of variation explained by the $i$th principal component. Alternatively, the $a_i$ can be considered as the right singular vectors in a singular value decomposition, with singular values $\lambda_i$, of the data matrix centred about its mean and scaled by $1/\sqrt{n-1}$, denoted $X_s$. This latter approach is used in nag_mv_prin_comp (g03aac), with
$$X_s = V \Lambda P^{\mathrm{T}},$$
where $\Lambda$ is a diagonal matrix with elements $\lambda_i$, $P$ is the $p$ by $p$ matrix with columns $a_i$, and $V$ is an $n$ by $p$ matrix with $V^{\mathrm{T}} V = I$, which gives the principal component scores.
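The link between the singular value decomposition and the eigenvalue formulation takes two lines. This sketch assumes $X_s$ is the mean-centred data matrix divided by $\sqrt{n-1}$, so that $X_s^{\mathrm{T}} X_s = S$, and uses the orthogonality relations $V^{\mathrm{T}} V = I$ and $P^{\mathrm{T}} P = I$ for $X_s = V \Lambda P^{\mathrm{T}}$ with $P$ having columns $a_i$.

```latex
\begin{aligned}
S = X_s^{\mathrm{T}} X_s
  &= \left( V \Lambda P^{\mathrm{T}} \right)^{\mathrm{T}} \left( V \Lambda P^{\mathrm{T}} \right)
   = P \Lambda V^{\mathrm{T}} V \Lambda P^{\mathrm{T}}
   = P \Lambda^{2} P^{\mathrm{T}} \\
S a_i &= P \Lambda^{2} P^{\mathrm{T}} a_i = \lambda_i^{2} a_i ,
  \qquad
  \text{proportion}_i = \frac{\lambda_i^{2}}{\sum_{j=1}^{p} \lambda_j^{2}}
\end{aligned}
```

Since $P^{\mathrm{T}} a_i$ is the $i$th unit vector, the squared singular values are exactly the eigenvalues of $S$. Forming $S$ explicitly would square the condition number of $X_s$, which is why decomposing $X_s$ directly is the numerically safer route.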
Principal component analysis is often used to reduce the dimension of a data set, replacing a large number of correlated variables with a smaller number of orthogonal variables that still contain most of the information in the original data set.
The choice of the number of dimensions required is usually based on the amount of variation accounted for by the leading principal components. If $k$ principal components are selected, then a test of the equality of the remaining $p-k$ eigenvalues is
$$\left( n - \frac{2p+5}{6} \right) \left\{ -\sum_{i=k+1}^{p} \log\left(\lambda_i^2\right) + (p-k) \log\left( \sum_{i=k+1}^{p} \frac{\lambda_i^2}{p-k} \right) \right\},$$
which has, asymptotically, a $\chi^2$ distribution with $\tfrac{1}{2}(p-k-1)(p-k+2)$ degrees of freedom.
Equality of the remaining eigenvalues indicates that if any more principal components are to be considered, then they should all be considered.
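The constants of the test can be checked by hand. Assuming the statistic carries the multiplier $n - (2p+5)/6$ and has $\tfrac{1}{2}(p-k-1)(p-k+2)$ degrees of freedom when $k$ components are retained, and taking $n = 10$ and $p = 3$ as in the example data set (where nvar = 3), the values are:

```latex
\begin{aligned}
n - \frac{2p+5}{6} &= 10 - \frac{11}{6} = \frac{49}{6} \approx 8.1667 \\
\tfrac{1}{2}(p-k-1)(p-k+2) &=
\begin{cases}
\tfrac{1}{2}(2)(5) = 5, & k = 0 \\
\tfrac{1}{2}(1)(4) = 2, & k = 1 \\
\tfrac{1}{2}(0)(3) = 0, & k = 2
\end{cases}
\end{aligned}
```

With $k = p - 1$ only one eigenvalue remains, the two logarithmic terms cancel, and there is nothing left to test, hence zero degrees of freedom.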
Instead of the variance-covariance matrix, the correlation matrix, the sums of squares and cross-products matrix, or a standardized sums of squares and cross-products matrix may be used. In the last case $S$ is replaced by $\sigma^{-1/2} S \sigma^{-1/2}$ for a diagonal matrix $\sigma$ with positive elements. If the correlation matrix is used, the $\chi^2$ approximation for the statistic given above is not valid.
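The correlation matrix is itself a standardized matrix of this kind. Writing $s_{jk}$ for the entries of $S$ and choosing $\sigma$ to be the diagonal of $S$ gives the identity below; it is a standard result shown only to relate the pcmatrix options, not a claim about how the routine forms the matrix internally.

```latex
\sigma = \operatorname{diag}\left( s_{11}, s_{22}, \ldots, s_{pp} \right)
\quad \Longrightarrow \quad
\left( \sigma^{-1/2} S \sigma^{-1/2} \right)_{jk}
  = \frac{s_{jk}}{\sqrt{s_{jj}\, s_{kk}}} = r_{jk}
```

Equivalently, each centred column of the data matrix is divided by its standard deviation before the decomposition, so every variable contributes unit variance.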
The principal component scores, $F$, are the values of the principal component variables for the observations. These can be standardized so that the variance of these scores for each principal component is $1.0$ or is equal to the corresponding eigenvalue.
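The four score options can be related through the decomposition $X_s = V \Lambda P^{\mathrm{T}}$, where $X_s$ is the centred data divided by $\sqrt{n-1}$, $P$ holds the loadings, $V^{\mathrm{T}} V = I$ and $\Lambda$ is diagonal with the singular values $\lambda_i$. The derivation below assumes unweighted data and sample variances computed with divisor $n-1$; it follows from those assumptions and is not a statement of how the routine computes the scores internally.

```latex
\begin{aligned}
F &= X_s P \Lambda^{-1} = V
  && F^{\mathrm{T}} F = I \\
F &= X_s P = V \Lambda
  && \text{unstandardized} \\
F &= \sqrt{n-1}\, V
  && \tfrac{1}{n-1} F^{\mathrm{T}} F = I \quad (\text{unit variance}) \\
F &= \sqrt{n-1}\, V \Lambda
  && \tfrac{1}{n-1} F^{\mathrm{T}} F = \Lambda^{2} \quad (\text{variance } \lambda_i^{2})
\end{aligned}
```

Because the columns of $X_s$ have zero mean, so do the columns of $V$, which is why $\tfrac{1}{n-1} F^{\mathrm{T}} F$ is the sample covariance of the scores. The last form equals $X_c P$ with $X_c = \sqrt{n-1}\, X_s$, the centred data projected onto the loadings.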
Weights can be used with the analysis, in which case the matrix $X$ is first centred about the weighted means, and then each row is scaled by an amount $\sqrt{w_i}$, where $w_i$ is the weight for the $i$th observation.
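Scaling the rows by the square roots of the weights is what turns an ordinary cross-product into a weighted one. In this sketch $W = \operatorname{diag}(w_1, \ldots, w_n)$, $x_i$ is the $i$th observation as a column vector, $\bar{x}_w$ is the weighted mean and $\mathbf{1}$ is a vector of ones; this notation is introduced here for illustration.

```latex
\begin{aligned}
\bar{x}_w &= \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i},
\qquad
X_c = X - \mathbf{1} \bar{x}_w^{\mathrm{T}} \\
\left( W^{1/2} X_c \right)^{\mathrm{T}} \left( W^{1/2} X_c \right)
  &= X_c^{\mathrm{T}} W X_c
   = \sum_{i=1}^{n} w_i \left( x_i - \bar{x}_w \right) \left( x_i - \bar{x}_w \right)^{\mathrm{T}}
\end{aligned}
```

An observation with $w_i = 0$ contributes nothing to this sum, matching the statement that such observations are excluded from the analysis. The sketch stops at the weighted cross-product matrix; the divisor applied afterwards is not specified in this section.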

Error Indicators and Warnings

"NE_2_INT_ARG_GE"
On entry, nvar = <value> while n = <value>. These arguments must satisfy nvar < n.
"NE_2_INT_ARG_GT"
On entry, nvar = <value> while m = <value>. These arguments must satisfy nvar ≤ m.
"NE_ALLOC_FAIL"
Dynamic memory allocation failed.
"NE_BAD_PARAM"
On entry, argument pcmatrix had an illegal value.
"NE_INT_ARG_LT"
On entry, m must not be less than 1: m = <value>.
"NE_INTERNAL_ERROR"
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please consult NAG for assistance.
"NE_NEG_WEIGHT_ELEMENT"
On entry, wt[<value>] = <value>. Constraint: when referenced, all elements of wt must be non-negative.
"NE_OBSERV_LT_VAR"
With weighted data, the effective number of observations given by the sum of weights = <value>, while the number of variables included in the analysis, nvar = <value>. Constraint: the effective number of observations must be at least nvar + 1.
"NE_SVD_NOT_CONV"
The singular value decomposition has failed to converge. This is an unlikely error exit.
"NE_VAR_INCL_INDICATED"
The number of variables in the analysis, nvar = <value>, while the number of variables included in the analysis via array isx = <value>. Constraint: these two numbers must be the same.
"NE_VAR_INCL_STANDARD"
On entry, the standardization element s[<value>] = <value>, while the variable to be included isx[<value>] = <value>. Constraint: when a variable is to be included, the standardization element must be positive.
"NE_ZERO_EIGVALS"
All eigenvalues/singular values are zero. This occurs when all the variables are constant.

Accuracy

As nag_mv_prin_comp (g03aac) uses a singular value decomposition of the data matrix, it will be less affected by ill-conditioned problems than traditional methods using the eigenvalue decomposition of the variance-covariance matrix.

Examples

pcmatrix := "Nag_MatVarCovar":
scores := "Nag_ScoresEigenval":
n := 10:
m := 3:
tdx := 3:
nvar := 3:
tde := 6:
tdp := 3:
tdv := 3:
x := Matrix([[7, 4, 3], [4, 1, 8], [6, 3, 5], [8, 6, 1], [8, 5, 7], [7, 2, 9], [5, 3, 3], [9, 5, 8], [7, 4, 5], [8, 2, 2]], datatype=float[8], order='C_order'):
isx := Vector([1, 1, 1], datatype=integer[kernelopts('wordsize')/8]):
s := Vector([0, 0, 0], datatype=float[8]):
wt := Vector([], datatype=float[8]):
e := Matrix(3, 6, datatype=float[8], order='C_order'):
p := Matrix(3, 3, datatype=float[8], order='C_order'):
v := Matrix(10, 3, datatype=float[8], order='C_order'):
NAG:-g03aac(pcmatrix, scores, x, isx, s, e, p, v, 'n' = n, 'm' = m, 'tdx' = tdx, 'wt' = wt, 'nvar' = nvar, 'tde' = tde, 'tdp' = tdp, 'tdv' = tdv):