DSC 410/510 - Multivariate Statistical Methods
Chapter 9
Suggested Solutions
HOW DOES THE RESEARCHER KNOW WHETHER TO USE HIERARCHICAL
OR NONHIERARCHICAL CLUSTER TECHNIQUES? UNDER WHICH CONDITIONS
WOULD EACH APPROACH BE USED?
The choice of a hierarchical or nonhierarchical technique often
depends on the research problem at hand. In the past, hierarchical
clustering techniques were more popular, with Ward's method and average
linkage probably the best available. Hierarchical procedures do
have the advantage of being fast and taking less computer time, but
they can be misleading because undesirable early combinations may
persist throughout the analysis and lead to artificial results. To
reduce this possibility, the analyst may wish to cluster analyze the
data several times after deleting problem observations or outliers.
However, the K-means (nonhierarchical) procedure appears to be more
robust than any of the hierarchical methods with respect to the
presence of outliers, error disturbances of the distance measure, and
the choice of a distance measure. The choice of the clustering
algorithm and solution characteristics appears to be critical to the
successful use of cluster analysis.
If a practical, objective, and theoretically sound approach can be
developed to select the cluster seeds (the initial starting points),
then a nonhierarchical method can be used. If the analyst is concerned
with the cost of the analysis and has no a priori knowledge of the
initial starting values or the number of clusters, then a hierarchical
method should be employed.
Punj and Stewart (1983) suggest a two-stage procedure to deal with the
problem of selecting initial starting values and the number of
clusters. The first stage uses one of the hierarchical methods to
obtain a first approximation of a solution. The analyst then selects a
candidate number of clusters based on this initial solution, obtains
the cluster centroids, and eliminates outliers. In the second stage, an
iterative partitioning algorithm, started from the centroids of the
preliminary analysis and run with the outliers excluded, produces the
final solution.
Punj, Girish and David Stewart, "Cluster Analysis in Marketing
Research: Review and Suggestions for Application," Journal of
Marketing Research, 20 (May 1983), pp. 134-148.
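
The two-stage procedure described above can be sketched in a few lines
of Python. This is only an illustration under stated assumptions, not
the authors' implementation: average linkage is used for the
hierarchical stage, a basic K-means loop for the partitioning stage,
and any preliminary cluster with fewer than min_size members is treated
as an outlier. The function names, the min_size rule, and the sample
data are all hypothetical.

```python
import math

def centroid(pts):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(p[i] for p in pts) / len(pts) for i in range(len(pts[0])))

def average_linkage(points, k):
    """Stage 1: merge the two closest clusters until k clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        _, a, b = min(
            (sum(math.dist(points[i], points[j])
                 for i in clusters[a] for j in clusters[b])
             / (len(clusters[a]) * len(clusters[b])), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

def k_means(points, seeds, max_iter=100):
    """Stage 2: iterative partitioning started from the given seeds."""
    centers, labels = list(seeds), None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)),
                          key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = centroid(members)
    return labels, centers

def two_stage(points, k, min_size=2):
    """Hierarchical first pass, outlier removal, then seeded K-means."""
    prelim = average_linkage(points, k)
    # Assumption: very small preliminary clusters are treated as outliers.
    outliers = sorted(i for c in prelim if len(c) < min_size for i in c)
    seeds = [centroid([points[i] for i in c])
             for c in prelim if len(c) >= min_size]
    kept = [i for i in range(len(points)) if i not in outliers]
    labels, centers = k_means([points[i] for i in kept], seeds)
    return dict(zip(kept, labels)), centers, outliers

# Hypothetical data: two tight groups plus one distant observation.
data = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3),
        (5.0, 5.0), (5.2, 4.8), (4.9, 5.3), (5.1, 5.1),
        (20.0, -10.0)]
assignment, final_centers, dropped = two_stage(data, k=3)
print(assignment, final_centers, dropped)
```

Here the candidate number of clusters (k=3) is supplied by the analyst.
In practice it would come from inspecting the hierarchical solution.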
HOW CAN YOU DECIDE HOW MANY CLUSTERS TO HAVE IN YOUR
SOLUTION?
Although no standard objective selection procedure exists for
determining the number of clusters, the analyst may use the distances
between clusters at successive steps as a guideline. In using this
method, the analyst may choose to stop when this distance exceeds a
specified value or when the successive distances between steps make a
sudden jump. Also, some intuitive conceptual or theoretical
relationship may suggest a natural number of clusters. In the final
analysis, however, it is probably best to compute solutions for
several different numbers of clusters and then to decide among the
alternative solutions based upon a priori criteria, practical
judgment, common sense, or theoretical foundation.
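
The "sudden jump" guideline above can be illustrated with a short
Python sketch. It assumes average linkage and Euclidean distance, and
the function names and sample data are hypothetical. It records the
distance at which each merge occurs and suggests stopping just before
the merge with the largest increase. This is one heuristic among
several, not a standard objective rule.

```python
import math

def merge_distances(points):
    """Average-linkage agglomeration; returns the distance at each merge."""
    clusters = [[p] for p in points]
    heights = []
    while len(clusters) > 1:
        d, a, b = min(
            (sum(math.dist(p, q) for p in clusters[a] for q in clusters[b])
             / (len(clusters[a]) * len(clusters[b])), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        heights.append(d)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return heights

def suggest_k(heights):
    """Number of clusters remaining just before the largest jump."""
    n = len(heights) + 1                       # number of observations
    jumps = [heights[i] - heights[i - 1] for i in range(1, len(heights))]
    step = jumps.index(max(jumps)) + 1         # merge step that jumps
    return n - step                            # clusters before that merge

# Hypothetical data: three well-separated groups of three observations.
data = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.6),
        (10.0, 0.0), (10.4, 0.3), (9.8, 0.5),
        (0.0, 10.0), (0.3, 10.4), (0.6, 9.7)]
heights = merge_distances(data)
print([round(h, 2) for h in heights])
print("suggested number of clusters:", suggest_k(heights))
```

The suggestion should still be weighed against a priori criteria and
practical judgment, as noted above.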
HOW DO RESEARCHERS USE THE GRAPHICAL PORTRAYALS OF THE
CLUSTER PROCEDURE?
The hierarchical clustering process may be represented graphically in
several ways: nested groupings, a vertical icicle diagram, or a
dendrogram. The researcher would use these graphical portrayals to
better understand the nature of the clustering process. Specifically,
the graphics might provide additional information about the number of
clusters that should be formed as well as information about outlier
values that resist joining a group.
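
As a rough text-only stand-in for these graphics, the following Python
sketch prints the nested groupings after each merge step. It assumes
average linkage and Euclidean distance, and the function name, labels,
and data are hypothetical. An observation that stays outside the
parentheses until the last step is a candidate outlier.

```python
import math

def nested_groupings(labels, points):
    """Return (merge distance, groupings) after each agglomeration step."""
    clusters = [[i] for i in range(len(points))]
    names = list(labels)          # text form of each current cluster
    history = []
    while len(clusters) > 1:
        d, a, b = min(
            (sum(math.dist(points[i], points[j])
                 for i in clusters[a] for j in clusters[b])
             / (len(clusters[a]) * len(clusters[b])), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        clusters[a] = clusters[a] + clusters[b]
        names[a] = "(" + names[a] + " " + names[b] + ")"
        del clusters[b]
        del names[b]
        history.append((d, list(names)))
    return history

# Hypothetical data: two groups and one distant observation X.
labels = ["A", "B", "C", "D", "E", "X"]
data = [(1.0, 1.0), (1.5, 1.2), (1.2, 1.8),
        (6.0, 6.0), (6.4, 5.7), (15.0, 0.0)]
for d, groups in nested_groupings(labels, data):
    print(f"d = {d:6.2f}   " + "  ".join(groups))
```

In this example X remains ungrouped until the final merge, which is the
pattern a dendrogram would show for an observation that resists joining
a group.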
© 2003, Iain Pardoe, Lundquist College of Business,
University of Oregon
Last updated September 26, 2003