|
NAG[g03dcc] NAG[nag_mv_discrim_group] - Allocates observations to groups, following g03dac (nag_mv_discrim)
|
|
Calling Sequence
g03dcc(approach, equal, priors, nig, gmean, gcov, det, isx, x, prior, p, iag, atiq, ati, 'nvar'=nvar, 'ng'=ng, 'tdg'=tdg, 'nobs'=nobs, 'm'=m, 'tdx'=tdx, 'tdp'=tdp, 'fail'=fail)
nag_mv_discrim_group(. . .)
Parameters
|
approach - String;
|
|
|
On entry: indicates whether the estimative or predictive approach is to be used.
|
|
approach = "Nag_DiscrimEstimate": the estimative approach is used.
|
|
approach = "Nag_DiscrimPredict": the predictive approach is used.
|
|
Constraint: approach = "Nag_DiscrimEstimate" or "Nag_DiscrimPredict".
|
|
|
equal - String;
|
|
|
On entry: indicates whether or not the within-group variance-covariance matrices are assumed to be equal and the pooled variance-covariance matrix used.
|
|
Constraint: equal = "Nag_EqualCovar" or "Nag_NotEqualCovar".
|
|
|
priors - String;
|
|
|
On entry: indicates the form of the prior probabilities to be used.
|
|
priors = "Nag_EqualPrior": equal prior probabilities are used.
|
|
priors = "Nag_GroupSizePrior": prior probabilities proportional to the group sizes in the training set, $n_j$, are used.
|
|
priors = "Nag_UserPrior": the prior probabilities are input in prior.
|
|
Constraint: priors = "Nag_EqualPrior", "Nag_GroupSizePrior" or "Nag_UserPrior".
|
|
|
nig - Vector(1..ng, datatype=integer[kernelopts('wordsize')/8]);
|
|
|
On entry: the number of observations in each group training set, $n_j$.
|
|
|
gmean - Matrix(1..ng, 1..tdg, datatype=float[8], order=C_order);
|
|
|
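On entry: the $j$th row of gmean contains the means of the $p$ selected variables for the $j$th group, as returned by g03dac (nag_mv_discrim).
|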
gcov - Vector(1.., datatype=float[8]);
|
|
|
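Note: the dimension, dim, of the array gcov must be at least (ng + 1) × nvar × (nvar + 1)/2.
|
|
On entry: the upper triangular matrices $R$ and $R_j$, for $j = 1, 2, \dots, n_g$, as returned by g03dac (nag_mv_discrim).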
|
|
|
det - Vector(1..ng, datatype=float[8]);
|
|
|
On entry: if equal = "Nag_NotEqualCovar", the logarithms of the determinants of the within-group variance-covariance matrices as returned by g03dac (nag_mv_discrim). Otherwise det is not referenced.
|
|
|
isx - Vector(1..m, datatype=integer[kernelopts('wordsize')/8]);
|
|
|
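On entry: indicates which variables in x are included in the analysis; the $l$th variable is included if isx[l] > 0.
|
|
Constraint: isx[l] > 0 for nvar values of $l$.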
|
|
|
x - Matrix(1..nobs, 1..tdx, datatype=float[8], order=C_order);
|
|
|
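On entry: the observations to be allocated; row $k$ of x holds the $k$th observation and column $l$ holds the $l$th variable.
|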
prior - Vector(1..ng, datatype=float[8]);
|
|
|
On entry: if priors = "Nag_UserPrior", the prior probabilities for the $n_g$ groups.
|
|
On exit: if priors = "Nag_GroupSizePrior", the computed prior probabilities in proportion to group sizes for the $n_g$ groups.
|
|
If priors = "Nag_UserPrior", the input prior probabilities will be unchanged.
|
|
If priors = "Nag_EqualPrior", prior is not set.
|
|
|
p - Matrix(1..nobs, 1..tdp, datatype=float[8], order=C_order);
|
|
|
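On exit: the posterior probabilities; element $(k, j)$ of p is the posterior probability of allocating the $k$th observation to the $j$th group.
|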
iag - Vector(1..nobs, datatype=integer[kernelopts('wordsize')/8]);
|
|
|
On exit: the groups to which the observations have been allocated.
|
|
|
atiq - boolean;
|
|
|
On entry: atiq must be true if atypicality indices are required. If atiq is false, the array ati is not set.
|
|
|
ati - Matrix(1..nobs, 1..tdp, datatype=float[8], order=C_order);
|
|
|
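On exit: if atiq is true, element $(k, j)$ of ati is the atypicality index for the $k$th observation with respect to the $j$th group. If atiq is false, ati is not set.
|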
'nvar'=nvar - integer; (optional)
|
|
|
Default value: the second dimension of the array gmean.
|
|
On entry: the number of variables, $p$, in the analysis.
|
|
Constraint: nvar ≥ 1.
|
|
|
'ng'=ng - integer; (optional)
|
|
|
Default value: the first dimension of the arrays nig, gmean, det, prior and the second dimension of the arrays p, ati.
|
|
On entry: the number of groups, $n_g$.
|
|
Constraint: ng ≥ 2.
|
|
|
'tdg'=tdg - integer; (optional)
|
|
|
On entry: the second dimension of the array gmean as declared in the function from which nag_mv_discrim_group (g03dcc) is called.
|
|
Constraint: tdg ≥ nvar.
|
|
|
'nobs'=nobs - integer; (optional)
|
|
|
Default value: the first dimension of the arrays x, p, iag, ati.
|
|
On entry: the number of observations in x which are to be allocated.
|
|
Constraint: nobs ≥ 1.
|
|
|
'm'=m - integer; (optional)
|
|
|
Default value: the first dimension of the array isx and the second dimension of the array x.
|
|
On entry: the number of variables in the data array x.
|
|
Constraint: m ≥ nvar.
|
|
|
'tdx'=tdx - integer; (optional)
|
|
|
On entry: the second dimension of the array x as declared in the function from which nag_mv_discrim_group (g03dcc) is called.
|
|
Constraint: tdx ≥ m.
|
|
|
'tdp'=tdp - integer; (optional)
|
|
|
On entry: the second dimension of the array p as declared in the function from which nag_mv_discrim_group (g03dcc) is called.
|
|
Constraint: tdp ≥ ng.
|
|
|
'fail'=fail - table; (optional)
|
|
|
The NAG error argument, see the documentation for NagError.
|
|
|
|
Description
|
|
|
Purpose
|
|
nag_mv_discrim_group (g03dcc) allocates observations to groups according to selected rules. It is intended for use after g03dac (nag_mv_discrim).
|
|
Description
|
|
Discriminant analysis is concerned with the allocation of observations to groups using information from other observations whose group membership is known, $X_t$; these are called the training set. Consider $p$ variables observed on $n_g$ populations or groups. Let $\bar{x}_j$ be the sample mean and $S_j$ the within-group variance-covariance matrix for the $j$th group; these are calculated from a training set of $n$ observations with $n_j$ observations in the $j$th group, and let $x_k$ be the $k$th observation from the set of observations to be allocated to the $n_g$ groups. The observation can be allocated to a group according to a selected rule. The allocation rule or discriminant function will be based on the distance of the observation from an estimate of the location of the groups, usually the group means. A measure of the distance of the observation from the $j$th group mean is given by the Mahalanobis distance, $D_{kj}$:
$$D_{kj}^2 = (x_k - \bar{x}_j)^T S_j^{-1} (x_k - \bar{x}_j) \qquad (1)$$
If the pooled estimate of the variance-covariance matrix, $S$, is used rather than the within-group variance-covariance matrices, then the distance is:
$$D_{kj}^2 = (x_k - \bar{x}_j)^T S^{-1} (x_k - \bar{x}_j) \qquad (2)$$
Instead of using the variance-covariance matrices $S$ and $S_j$, nag_mv_discrim_group (g03dcc) uses the upper triangular matrices $R$ and $R_j$ supplied by g03dac (nag_mv_discrim) such that $S = R^T R$ and $S_j = R_j^T R_j$. $D_{kj}^2$ can then be calculated as $z^T z$ where $R_j^T z = (x_k - \bar{x}_j)$ or $R^T z = (x_k - \bar{x}_j)$ as appropriate.
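The triangular-factor form of the distance can be illustrated with a minimal Maple sketch. It assumes only the factorization $S = R^T R$ stated above; the matrix R, the observation xk and the mean xbar are made-up illustrative values, not output from g03dac, and the sketch is not a description of the routine's internals.

```maple
with(LinearAlgebra):

# Hypothetical upper triangular factor R, with S = R^T . R (illustrative values only)
R    := Matrix([[2.0, 1.0], [0.0, 1.5]]):
xk   := Vector([3.0, 1.0]):     # observation to be allocated (hypothetical)
xbar := Vector([1.0, 0.5]):     # group mean (hypothetical)

# Solve the lower triangular system R^T z = xk - xbar, then D^2 = z^T z
z  := LinearSolve(Transpose(R), xk - xbar):
D2 := DotProduct(z, z);

# Cross-check against the direct form (xk - xbar)^T S^(-1) (xk - xbar)
S        := Transpose(R) . R:
D2direct := DotProduct(xk - xbar, LinearSolve(S, xk - xbar));
```

For these values both expressions evaluate to 10/9 (about 1.1111). Working with the triangular factor avoids forming and inverting $S$ explicitly.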
In addition to the distances, a set of prior probabilities of group membership, $\pi_j$, for $j = 1, 2, \dots, n_g$, may be used, with $\sum_{j=1}^{n_g} \pi_j = 1$. The prior probabilities reflect the user's view as to the likelihood of the observations coming from the different groups. Two common cases for prior probabilities are $\pi_1 = \pi_2 = \dots = \pi_{n_g}$, that is, equal prior probabilities, and $\pi_j = n_j / n$, for $j = 1, 2, \dots, n_g$, that is, prior probabilities proportional to the number of observations in the groups in the training set.
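As a small illustration of the two common choices of prior, the following Maple sketch computes equal priors and priors proportional to group size for the group sizes used in the example on this page (6, 10 and 5). The names priorEqual and priorSize are hypothetical and are not arguments of g03dcc.

```maple
nig := [6, 10, 5]:                        # training-set group sizes
ng  := nops(nig):
n   := add(nig[j], j = 1 .. ng):          # total training-set size

priorEqual := [seq(1/ng, j = 1 .. ng)];        # equal prior probabilities
priorSize  := [seq(nig[j]/n, j = 1 .. ng)];    # proportional to group size

add(priorSize[j], j = 1 .. ng);           # the priors sum to 1
```

Here n = 21, so the size-proportional priors are 2/7, 10/21 and 5/21.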
nag_mv_discrim_group (g03dcc) uses one of four allocation rules. In all four rules the $p$ variables are assumed to follow a multivariate Normal distribution with mean $\mu_j$ and variance-covariance matrix $\Sigma_j$ if the observation comes from the $j$th group. The different rules depend on whether or not the within-group variance-covariance matrices are assumed equal, i.e., $\Sigma_1 = \Sigma_2 = \dots = \Sigma_{n_g}$, and whether a predictive or estimative approach is used. If $p(x_k \mid \mu_j, \Sigma_j)$ is the probability of observing the observation $x_k$ from group $j$, then the posterior probability of belonging to group $j$ is:
$$p(j \mid x_k, \mu_j, \Sigma_j) \propto p(x_k \mid \mu_j, \Sigma_j)\, \pi_j \qquad (3)$$
In the estimative approach, the parameters $\mu_j$ and $\Sigma_j$ in (3) are replaced by their estimates calculated from $X_t$. In the predictive approach, a non-informative prior distribution is used for the parameters and a posterior distribution for the parameters, $p(\mu_j, \Sigma_j \mid X_t)$, is found. A predictive distribution is then obtained by integrating $p(x_k \mid \mu_j, \Sigma_j)\, p(\mu_j, \Sigma_j \mid X_t)$ over the parameter space. This predictive distribution then replaces $p(x_k \mid \mu_j, \Sigma_j)$ in (3). See Aitchison and Dunsmore (1975), Aitchison et al. (1977) and Moran and Murphy (1979) for further details.
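The observation is allocated to the group with the highest posterior probability. Denoting the posterior probabilities, $p(j \mid x_k, X_t)$, by $q_j$, the four allocation rules are:
|
Estimative with equal variance-covariance matrices – Linear Discrimination:
$$\log q_j = -\tfrac{1}{2} D_{kj}^2 + \log \pi_j + \text{constant}$$
|
|
Estimative with unequal variance-covariance matrices – Quadratic Discrimination:
$$\log q_j = -\tfrac{1}{2} D_{kj}^2 + \log \pi_j - \tfrac{1}{2} \log |S_j| + \text{constant}$$
|
|
Predictive with equal variance-covariance matrices:
$$q_j^{-1} \propto \pi_j^{-1} \left( \frac{n_j + 1}{n_j} \right)^{p/2} \left\{ 1 + \frac{n_j}{(n - n_g)(n_j + 1)} D_{kj}^2 \right\}^{(n + 1 - n_g)/2}$$
|
|
Predictive with unequal variance-covariance matrices:
$$q_j^{-1} \propto \pi_j^{-1}\, C \left( \frac{n_j^2 - 1}{n_j} \right)^{p/2} |S_j|^{1/2} \left\{ 1 + \frac{n_j}{n_j^2 - 1} D_{kj}^2 \right\}^{n_j/2}, \qquad C = \frac{\Gamma\left(\tfrac{1}{2}(n_j - p)\right)}{\Gamma\left(\tfrac{1}{2} n_j\right)}$$
|
|
In the above the appropriate value of $D_{kj}^2$ from (1) or (2) is used. The values of the $q_j$ are standardized so that
$$\sum_{j=1}^{n_g} q_j = 1.$$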
|
|
Moran and Murphy (1979) show the similarity between the predictive methods and methods based upon likelihood ratio tests.
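The allocation step and the standardization of the posterior probabilities can be sketched in Maple for the estimative rule with equal variance-covariance matrices. The sketch assumes the usual Normal-theory form $\log q_j = -\tfrac{1}{2} D_{kj}^2 + \log \pi_j$ up to an additive constant; the squared distances are made-up values and the names logq, w, q and kbest are hypothetical. It is an illustration of the rule, not the routine's own code.

```maple
# Hypothetical squared distances D_kj^2 of one observation from three groups
D2    := [1.2, 4.7, 9.3]:
prior := [1/3, 1/3, 1/3]:          # equal prior probabilities
ng    := nops(D2):

# Unnormalised log posterior: -D_kj^2/2 + log(prior_j)
logq := [seq(evalf(-D2[j]/2 + ln(prior[j])), j = 1 .. ng)]:

# Subtract the maximum before exponentiating (guards against underflow),
# then standardize so that the q_j sum to 1
mx := max(op(logq)):
w  := [seq(exp(logq[j] - mx), j = 1 .. ng)]:
q  := [seq(w[j]/add(w[i], i = 1 .. ng), j = 1 .. ng)];

# Allocate to the group with the highest posterior probability
member(max(op(q)), q, 'kbest'):
kbest;
```

With equal priors the observation goes to the nearest group, here group 1.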
|
|
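In addition to allocating the observation to a group, nag_mv_discrim_group (g03dcc) can compute an atypicality index, $I_j(x_k)$, for each observation and group (see atiq and ati). The index is the lower tail probability, evaluated at $z$, from a beta distribution where, for unequal within-group variance-covariance matrices,
$$z = \frac{D_{kj}^2}{D_{kj}^2 + (n_j^2 - 1)/n_j}$$
|
|
and for equal within-group variance-covariance matrices,
$$z = \frac{D_{kj}^2}{D_{kj}^2 + (n - n_g)(n_j + 1)/n_j}.$$
|
|
If $I_j(x_k)$ is close to 1 for all groups it indicates that the observation may come from a grouping not represented in the training set. Moran and Murphy (1979) provide a frequentist interpretation of $I_j(x_k)$.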
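A rough Maple sketch of an atypicality-style index for the unequal-covariance case follows. Everything in it is an assumption made for illustration: the transformation $z = D_{kj}^2 / (D_{kj}^2 + (n_j^2 - 1)/n_j)$, the beta shape parameters $p/2$ and $(n_j - p)/2$ (taken from standard Normal theory, not confirmed by this page), and the input values. It should not be read as the exact computation performed by g03dcc.

```maple
with(Statistics):

pvars := 2:       # number of variables p (hypothetical)
nj    := 6:       # training-set size of group j (hypothetical)
D2    := 35/6:    # squared Mahalanobis distance, chosen so that z = 1/2

# Map the squared distance into (0, 1)
z := D2/(D2 + (nj^2 - 1)/nj);

# ASSUMPTION: shape parameters pvars/2 and (nj - pvars)/2
atyp := evalf(CDF(BetaDistribution(pvars/2, (nj - pvars)/2), z));
```

For these values z = 1/2 and the lower tail of a beta distribution with shape parameters 1 and 2 at 1/2 is 0.75. A value near 1 would flag the observation as atypical of the group.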
|
|
|
Error Indicators and Warnings
|
|
"NE_2_INT_ARG_LT"
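On entry, ⟨argument 1⟩ = ⟨value⟩ while ⟨argument 2⟩ = ⟨value⟩. These arguments must satisfy ⟨argument 1⟩ ≥ ⟨argument 2⟩; this applies to the pairs tdg ≥ nvar, m ≥ nvar, tdx ≥ m and tdp ≥ ng.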
"NE_ALLOC_FAIL"
Dynamic memory allocation failed.
"NE_BAD_PARAM"
On entry, argument approach had an illegal value.
"NE_DIAG_0_COND"
A diagonal element of $R$ is zero when equal = "Nag_EqualCovar".
"NE_DIAG_0_J_COND"
A diagonal element of $R_j$ is zero for some $j$, when equal = "Nag_NotEqualCovar".
"NE_GROUP_SUM"
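On entry, the sum of the elements of nig = ⟨value⟩, ng = ⟨value⟩, nvar = ⟨value⟩. Constraint: the sum of the elements of nig > ng + nvar when equal = "Nag_EqualCovar".
"NE_INT_ARG_LT"
On entry, nvar must not be less than 1: nvar = ⟨value⟩.
"NE_INTARR"
On entry, nig[⟨value⟩] = ⟨value⟩. Constraint: nig[j] > 0, when equal = "Nag_EqualCovar".
"NE_INTARR_INT"
On entry, nig[⟨value⟩] = ⟨value⟩, nvar = ⟨value⟩. Constraint: nig[j] > nvar, when equal = "Nag_NotEqualCovar".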
"NE_INTERNAL_ERROR"
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please consult NAG for assistance.
"NE_PRIOR_SUM"
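On entry, the sum of the elements of prior = ⟨value⟩. Constraint: the sum of the elements of prior must be within machine precision of 1 when priors = "Nag_UserPrior".
"NE_REALARR"
On entry, prior[⟨value⟩] = ⟨value⟩. Constraint: prior[j] > 0, when priors = "Nag_UserPrior".
"NE_VAR_INCL_INDICATED"
The number of variables in the analysis, nvar = ⟨value⟩, while the number of variables included in the analysis via array isx = ⟨value⟩. Constraint: these two numbers must be the same.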
|
|
Accuracy
|
|
The accuracy of the returned posterior probabilities will depend on the accuracy of the input $R$ or $R_j$ matrices. The atypicality index should be accurate to four significant figures.
|
|
|
Examples
|
|
>
|
# Allocation options: predictive approach, separate covariance matrices, equal priors
approach := "Nag_DiscrimPredict":
equal := "Nag_NotEqualCovar":
priors := "Nag_EqualPrior":
# Problem sizes: 2 variables, 3 groups, 6 observations to allocate
nvar := 2:
ng := 3:
tdg := 2:
nobs := 6:
m := 2:
tdx := 2:
tdp := 3:
# Request atypicality indices in ati
atiq := true:
# Training-set summaries in the form returned by g03dac (nag_mv_discrim);
# gcov has (ng+1)*nvar*(nvar+1)/2 = 12 elements
nig := Vector([6, 10, 5], datatype=integer[kernelopts('wordsize')/8]):
gmean := Matrix([[1.0433, -0.6034166666666667], [2.00727, -0.20604], [2.70974, 1.5998]], datatype=float[8], order='C_order'):
gcov := Vector([-0.5099642881287538, -0.2797054723861329, -1.217327847040481, -0.3326727521153483, -0.3723518779712079, -1.987589395382754, -0.4603014906920608, -0.7041634974247671, 0.4737334252803499, 0.7451327720614629, -0.3251057349548681, -0.4275545007358186], datatype=float[8]):
det := Vector([-0.8273469064608425, -3.045968198109008, -2.287732741158105], datatype=float[8]):
# Both variables in x are included in the analysis
isx := Vector([1, 1], datatype=integer[kernelopts('wordsize')/8]):
# Observations to be allocated, one per row
x := Matrix([[1.6292, -0.9163], [2.5572, 1.6094], [2.5649, -0.2231], [0.9555, -2.3026], [3.4012, -2.3026], [3.0204, -0.2231]], datatype=float[8], order='C_order'):
# prior is only read on entry when priors = "Nag_UserPrior"
prior := Vector([0, 0, 0], datatype=float[8]):
# Output arrays: posterior probabilities, allocated groups, atypicality indices
p := Matrix(6, 3, datatype=float[8], order='C_order'):
iag := Vector(6, datatype=integer[kernelopts('wordsize')/8]):
ati := Matrix(6, 3, datatype=float[8], order='C_order'):
NAG:-g03dcc(approach, equal, priors, nig, gmean, gcov, det, isx, x, prior, p, iag, atiq, ati, 'nvar' = nvar, 'ng' = ng, 'tdg' = tdg, 'nobs' = nobs, 'm' = m, 'tdx' = tdx, 'tdp' = tdp):
|
|
|
See Also
|
|
Aitchison J and Dunsmore I R (1975) Statistical Prediction Analysis Cambridge
Aitchison J, Habbema J D F and Kay J W (1977) A critical comparison of two methods of statistical discrimination Appl. Statist. 26 15–25
Kendall M G and Stuart A (1976) The Advanced Theory of Statistics (Volume 3) (3rd Edition) Griffin
Krzanowski W J (1990) Principles of Multivariate Analysis Oxford University Press
Moran M A and Murphy B J (1979) A closer look at two alternative methods of statistical discrimination Appl. Statist. 28 223–232
Morrison D F (1967) Multivariate Statistical Methods McGraw–Hill
g03 Chapter Introduction.
NAG Toolbox Overview.
NAG Web Site.
|
|