|
NAG[g03ecc] NAG[nag_mv_hierar_cluster_analysis] - Hierarchical cluster analysis
|
|
Calling Sequence
g03ecc(method, d, ilc, iuc, cd, iord, dord, 'n'=n, 'fail'=fail)
nag_mv_hierar_cluster_analysis(. . .)
Parameters
|
method - String;
|
|
|
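On entry: indicates which clustering method is used.
method = "Nag_SingleLink": single link.
method = "Nag_CompleteLink": complete link.
method = "Nag_GroupAverage": group average.
method = "Nag_Centroid": centroid.
method = "Nag_Median": median.
method = "Nag_MinVariance": minimum variance.
Constraint: method = "Nag_SingleLink", "Nag_CompleteLink", "Nag_GroupAverage", "Nag_Centroid", "Nag_Median" or "Nag_MinVariance".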
|
|
|
d - Vector(1.., datatype=float[8]);
|
|
|
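Note: the dimension, dim, of the array d must be at least $n(n-1)/2$.
On entry: the strictly lower triangle of the distance matrix, packed by rows, so that the distance $d_{ij}$ between objects $i$ and $j$, $i > j$, is held in element $(i-1)(i-2)/2 + j$ of d.
Constraint: d[$i$] $\geq 0.0$, for $i = 1, 2, \ldots, n(n-1)/2$.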
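As a sketch of how d might be filled from a full symmetric matrix, assuming the strictly lower triangle is packed by rows so that the distance between objects $i$ and $j$, $i > j$, sits at position $(i-1)(i-2)/2 + j$. The helper PackedIndex, the matrix Dfull and its values are hypothetical names and data introduced for illustration; they are not part of the NAG package.

```maple
# Hypothetical helper (not a NAG routine): 1-based position in d of the
# distance between objects i and j, assuming row-packed storage of the
# strictly lower triangle.
PackedIndex := proc(i::posint, j::posint)
    local hi, lo;
    if i = j then
        error "diagonal entries are not stored";
    end if;
    hi := max(i, j);
    lo := min(i, j);
    return (hi - 1)*(hi - 2)/2 + lo;
end proc:

# Illustrative 3 x 3 symmetric distance matrix (made-up values).
Dfull := Matrix([[0, 4, 7], [4, 0, 2], [7, 2, 0]]):
n := 3:
d := Vector(n*(n - 1)/2, datatype = float[8]):
for i from 2 to n do
    for j to i - 1 do
        d[PackedIndex(i, j)] := Dfull[i, j];
    end do;
end do:
```

After the loop, d holds the entries in the order (2,1), (3,1), (3,2), which is the order in which the ten values of the example at the end of this page would be read for $n = 5$ under the same assumption.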
|
|
|
ilc - Vector(1.., datatype=integer[kernelopts('wordsize')/8]);
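Note: the dimension, dim, of the array ilc must be at least $n-1$.
On exit: ilc[$l$] contains the number of one of the two clusters merged at step $l$, for $l = 1, 2, \ldots, n-1$; the number of the other cluster is returned in iuc[$l$].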
|
iuc - Vector(1.., datatype=integer[kernelopts('wordsize')/8]);
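Note: the dimension, dim, of the array iuc must be at least $n-1$.
On exit: iuc[$l$] contains the number of the cluster merged with the cluster given in ilc[$l$] at step $l$, for $l = 1, 2, \ldots, n-1$.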
|
cd - Vector(1.., datatype=float[8]);
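Note: the dimension, dim, of the array cd must be at least $n-1$.
On exit: cd[$l$] contains the distance between the two clusters merged at step $l$, for $l = 1, 2, \ldots, n-1$.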
|
iord - Vector(1..n, datatype=integer[kernelopts('wordsize')/8]);
|
|
|
On exit: the objects in dendrogram order.
|
|
|
dord - Vector(1..n, datatype=float[8]);
On exit: the clustering distances corresponding to the order in iord.
|
'n'=n - integer; (optional)
|
|
|
Default value: the first dimension of the arrays iord, dord.
|
|
On entry: the number of objects, $n$.
Constraint: $n \geq 2$.
|
|
|
'fail'=fail - table; (optional)
|
|
|
The NAG error argument, see the documentation for NagError.
|
|
|
|
Description
|
|
|
Purpose
|
|
nag_mv_hierar_cluster_analysis (g03ecc) performs hierarchical cluster analysis.
|
|
Description
|
|
Given a distance or dissimilarity matrix for $n$ objects (see g03eac (nag_mv_distance_mat)), cluster analysis aims to group the objects into a number of more or less homogeneous groups or clusters. With agglomerative clustering methods, a hierarchical tree is produced by starting with $n$ clusters, each containing a single object, and then, at each of $n-1$ stages, merging two clusters to form a larger cluster, until all objects are in a single cluster. This process may be represented by a dendrogram (see g03ehc (nag_mv_dendrogram)).
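At each stage, the two nearest clusters are merged; methods differ in how the distance between the new cluster and the other clusters is computed. For three clusters $i$, $j$ and $k$, let $n_i$, $n_j$ and $n_k$ be the number of objects in each cluster and let $d_{ij}$, $d_{ik}$ and $d_{jk}$ be the distances between the clusters. Let clusters $j$ and $k$ be merged to give cluster $jk$; then the distance from cluster $i$ to cluster $jk$, $d_{i.jk}$, can be computed in the following ways:
Single link or nearest neighbour: $d_{i.jk} = \min(d_{ij}, d_{ik})$.
Complete link or furthest neighbour: $d_{i.jk} = \max(d_{ij}, d_{ik})$.
Group average: $d_{i.jk} = \frac{n_j}{n_j+n_k} d_{ij} + \frac{n_k}{n_j+n_k} d_{ik}$.
Centroid: $d_{i.jk} = \frac{n_j}{n_j+n_k} d_{ij} + \frac{n_k}{n_j+n_k} d_{ik} - \frac{n_j n_k}{(n_j+n_k)^2} d_{jk}$.
Median: $d_{i.jk} = \frac{1}{2} d_{ij} + \frac{1}{2} d_{ik} - \frac{1}{4} d_{jk}$.
Minimum variance: $d_{i.jk} = \frac{(n_i+n_j) d_{ij} + (n_i+n_k) d_{ik} - n_i d_{jk}}{n_i+n_j+n_k}$.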
For further details see Everitt (1974) or Krzanowski (1990).
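The six update rules can be sketched in a few lines of Maple. This assumes the standard Lance-Williams forms of the updates; the procedure UpdateDistance and its argument names are hypothetical and written only for illustration, and whether g03ecc evaluates exactly these expressions internally is an assumption.

```maple
# Hypothetical helper (not a NAG routine): distance from cluster i to the
# cluster formed by merging clusters j and k.
#   dij, dik, djk : current distances between the three clusters
#   ni, nj, nk    : number of objects in each cluster
UpdateDistance := proc(method::string, dij, dik, djk, ni, nj, nk)
    if method = "Nag_SingleLink" then
        return min(dij, dik);
    elif method = "Nag_CompleteLink" then
        return max(dij, dik);
    elif method = "Nag_GroupAverage" then
        return (nj*dij + nk*dik)/(nj + nk);
    elif method = "Nag_Centroid" then
        return (nj*dij + nk*dik)/(nj + nk) - nj*nk*djk/(nj + nk)^2;
    elif method = "Nag_Median" then
        return dij/2 + dik/2 - djk/4;
    elif method = "Nag_MinVariance" then
        return ((ni + nj)*dij + (ni + nk)*dik - ni*djk)/(ni + nj + nk);
    else
        error "unknown method %1", method;
    end if;
end proc:

# Three singleton clusters with made-up distances: 10/2 + 6/2 - 4/4 = 7.
UpdateDistance("Nag_Median", 10, 6, 4, 1, 1, 1);
```

For singleton clusters the centroid and median rules coincide, since the weights $n_j/(n_j+n_k)$ and $n_k/(n_j+n_k)$ both reduce to $1/2$; they differ once the merged clusters have unequal sizes.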
If the clusters are numbered $1, 2, \ldots, n$ then, for convenience, if clusters $j$ and $k$, $j < k$, merge then the new cluster will be referred to as cluster $j$. Information on the clustering history is given by the values of $j$, $k$ and $d_{jk}$ for each of the $n-1$ clustering steps. To produce a dendrogram, an ordering of the objects is required in which the clusters that merge are adjacent. This ordering is computed so that the first element is 1. The distances associated with this ordering are also computed.
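The numbering convention makes it straightforward to replay a clustering history and recover cluster membership after a given number of steps. The following is a minimal sketch: ReplayMerges is a hypothetical helper, the merge pairs passed to it are made up, and it relies only on the rule that a merged cluster keeps the smaller of the two labels.

```maple
# Hypothetical helper (not a NAG routine): replay the first `steps` merges.
#   a, b  : lists or Vectors holding the two cluster numbers merged at each step
#   nobj  : number of objects
# Returns the list of non-empty clusters, each as a set of object numbers.
ReplayMerges := proc(a, b, nobj::posint, steps::nonnegint)
    local members, l, lo, hi, c;
    # Start with one singleton cluster per object.
    members := table([seq(c = {c}, c = 1 .. nobj)]);
    for l to steps do
        lo := min(a[l], b[l]);
        hi := max(a[l], b[l]);
        # The merged cluster keeps the smaller label.
        members[lo] := members[lo] union members[hi];
        members[hi] := {};
    end do;
    return [seq(`if`(members[c] = {}, NULL, members[c]), c = 1 .. nobj)];
end proc:

# Made-up history for 5 objects: merge (2,3), then (4,5), then (1,2).
ReplayMerges([2, 4, 1], [3, 5, 2], 5, 2);
```

Stopping after two steps leaves three clusters: object 1 alone, objects 2 and 3 together, and objects 4 and 5 together. Passing the ilc and iuc Vectors returned by g03ecc in place of the two lists should work in the same way, since the helper does not depend on which of the two arrays holds the smaller label.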
|
|
Error Indicators and Warnings
|
|
"NE_ALLOC_FAIL"
Dynamic memory allocation failed.
"NE_BAD_PARAM"
On entry, argument method had an illegal value.
"NE_DENDROGRAM"
A true dendrogram cannot be formed because the distances at which clusters have merged are not increasing for all steps, i.e., cd[$l$] < cd[$l-1$] for some $l = 2, 3, \ldots, n-1$. This can occur for the "Nag_Median" and "Nag_Centroid" methods.
"NE_INT_ARG_LT"
On entry, n must not be less than 2: n = <value>.
"NE_INTERNAL_ERROR"
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please consult NAG for assistance.
"NE_REALARR"
On entry, d[<value>] = <value>. Constraint: d[$i$] $\geq 0$, for $i = 1, 2, \ldots, n(n-1)/2$.
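The condition behind "NE_DENDROGRAM" can also be checked directly on a vector of merge distances. The sketch below uses a hypothetical helper, IsNonDecreasing, with made-up distances; it is not part of the NAG package and assumes a Maple version that provides numelems.

```maple
# Hypothetical helper (not a NAG routine): true if the merge distances
# never decrease from one step to the next.
IsNonDecreasing := proc(cd)
    local l;
    for l from 2 to numelems(cd) do
        if cd[l] < cd[l - 1] then
            return false;
        end if;
    end do;
    return true;
end proc:

IsNonDecreasing([1.0, 2.0, 6.5, 9.0]);   # distances rise at every step
IsNonDecreasing([1.0, 3.0, 2.5]);        # step 3 merges at a smaller distance
```

The first call returns true and the second returns false; a history like the second is the kind of inversion that the median and centroid updates can produce.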
|
|
|
Examples
|
|
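# Cluster five objects with the median method.
method := "Nag_Median":
n := 5:
# Strictly lower triangle of the 5 x 5 distance matrix: n*(n-1)/2 = 10 entries.
d := Vector([17, 2, 13, 16, 1, 10, 4, 17, 10, 20], datatype=float[8]):
# Output arrays for the n-1 = 4 merge steps.
ilc := Vector(4, datatype=integer[kernelopts('wordsize')/8]):
iuc := Vector(4, datatype=integer[kernelopts('wordsize')/8]):
cd := Vector(4, datatype=float[8]):
# Output arrays for the dendrogram ordering of the n objects.
iord := Vector(5, datatype=integer[kernelopts('wordsize')/8]):
dord := Vector(5, datatype=float[8]):
NAG:-g03ecc(method, d, ilc, iuc, cd, iord, dord, 'n' = n):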
|
|
|