DSC 410/510 - Multivariate Statistical Methods

Chapter 9

Suggested Solutions

  1. HOW DOES THE RESEARCHER KNOW WHETHER TO USE HIERARCHICAL OR NONHIERARCHICAL CLUSTER TECHNIQUES? UNDER WHICH CONDITIONS WOULD EACH APPROACH BE USED?

    The choice of a hierarchical or nonhierarchical technique often depends on the research problem at hand. In the past, hierarchical clustering techniques were more popular, with Ward's method and average linkage probably being the best available. Hierarchical procedures do have the advantage of being fast and taking less computer time, but they can be misleading because undesirable early combinations may persist throughout the analysis and lead to artificial results. To reduce this possibility, the analyst may wish to cluster analyze the data several times after deleting problem observations or outliers.
    However, the K-means (nonhierarchical) procedure appears to be more robust than any of the hierarchical methods with respect to the presence of outliers, error disturbances of the distance measure, and the choice of a distance measure. The choice of the clustering algorithm and solution characteristics appears to be critical to the successful use of cluster analysis.
    If a practical, objective, and theoretically sound approach can be developed to select the seeds, then a nonhierarchical method can be used. If the analyst is concerned with the cost of the analysis and has no a priori knowledge as to initial starting values or number of clusters, then a hierarchical method should be employed.
    Punj and Stewart (1983) suggest a two-stage procedure to deal with the problem of selecting initial starting values and the number of clusters. The first stage uses one of the hierarchical methods to obtain a first approximation of a solution. From this initial solution, the analyst selects a candidate number of clusters, obtains the cluster centroids, and eliminates outliers. The second stage applies an iterative partitioning algorithm, with the centroids from the preliminary analysis as starting points and the outliers excluded, to obtain a final solution.
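    The two-stage idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure as published: average linkage stands in for "one of the hierarchical methods", a plain K-means loop stands in for the iterative partitioning algorithm, and the analyst is assumed to supply both the candidate number of clusters (k) and the indices of the observations judged to be outliers. The function names and the example data are hypothetical.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def average_linkage(points, k):
    """Stage 1: merge the closest pair of clusters until k remain (k >= 1).

    Returns a list of clusters, each a list of indices into points.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None  # (average distance, position a, position b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(math.dist(points[i], points[j])
                            for i in clusters[a] for j in clusters[b])
                avg = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] += clusters[b]
        del clusters[b]
    return clusters


def k_means(points, seeds, max_iter=100):
    """Stage 2: iterative partitioning that starts from the given seeds."""
    centers = list(seeds)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = centroid(members)
    return labels, centers


def two_stage(points, k, outliers=()):
    """Hierarchical first pass, then K-means seeded with its centroids.

    outliers: indices the analyst has chosen to exclude (an assumption of
    this sketch; no automatic outlier rule is applied).
    """
    drop = set(outliers)
    preliminary = average_linkage(points, k)
    trimmed = [[i for i in cluster if i not in drop] for cluster in preliminary]
    seeds = [centroid([points[i] for i in c]) for c in trimmed if c]
    kept = [i for i in range(len(points)) if i not in drop]
    labels, centers = k_means([points[i] for i in kept], seeds)
    return dict(zip(kept, labels)), centers


# Hypothetical data: two tight groups plus one distant observation.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
assignments, centers = two_stage(points, k=3, outliers=[6])
print(assignments)
print(centers)
```

    In this example the distant observation forms its own preliminary cluster, so removing it leaves two seeds, and the final partition assigns the remaining six observations to two groups of three.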

    Punj, Girish and David Stewart, "Cluster Analysis in Marketing Research: Review and Suggestions for Application," Journal of Marketing Research, 20 (May 1983), pp. 134-148.

  2. HOW CAN YOU DECIDE HOW MANY CLUSTERS TO HAVE IN YOUR SOLUTION?

    Although no standard objective selection procedure exists for determining the number of clusters, the analyst may use the distances between clusters at successive steps as a guideline. In using this method, the analyst may choose to stop when this distance exceeds a specified value or when the successive distances between steps make a sudden jump. Also, some intuitive conceptual or theoretical relationship may suggest a natural number of clusters. In the final analysis, however, it is probably best to compute solutions for several different numbers of clusters and then to decide among the alternative solutions based upon a priori criteria, practical judgment, common sense, or theoretical foundation.
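    The two distance-based stopping rules mentioned above can be written down compactly. The sketch below assumes the merge distances from a hierarchical run are already available as a list in the order the merges occurred, and that they never decrease from one step to the next (true for methods such as average linkage and Ward's method, but not for every method). The function names and the distances are made up for illustration.

```python
def clusters_at_threshold(merge_distances, threshold):
    """Stop merging once the merge distance exceeds a specified value.

    With n observations there are n - 1 merges, and each merge performed
    reduces the number of clusters by one.
    """
    n = len(merge_distances) + 1
    merges_done = sum(1 for d in merge_distances if d <= threshold)
    return n - merges_done


def clusters_before_largest_jump(merge_distances):
    """Stop just before the merge that follows the largest jump in distance."""
    if len(merge_distances) < 2:
        raise ValueError("need at least two merges to compare distances")
    n = len(merge_distances) + 1
    jumps = [later - earlier
             for earlier, later in zip(merge_distances, merge_distances[1:])]
    i = max(range(len(jumps)), key=jumps.__getitem__)
    # jumps[i] separates merge i from merge i + 1 (0-indexed), so stopping
    # there means i + 1 merges have been carried out.
    return n - (i + 1)


# Hypothetical merge distances for 7 observations (6 merges).
distances = [0.5, 0.7, 0.9, 1.1, 6.0, 6.5]
print(clusters_at_threshold(distances, threshold=2.0))
print(clusters_before_largest_jump(distances))
```

    Both rules point to three clusters for these distances, but neither replaces the judgment described above: a single outlier joining at the last step can produce the largest jump and make a two-cluster solution look better than it is.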

  3. HOW DO RESEARCHERS USE THE GRAPHICAL PORTRAYALS OF THE CLUSTER PROCEDURE?

    The hierarchical clustering process may be represented graphically in several ways: nested groupings, a vertical icicle diagram, or a dendrogram. The researcher would use these graphical portrayals to better understand the nature of the clustering process. Specifically, the graphics might provide additional information about the number of clusters that should be formed as well as information about outlier values that resist joining a group.
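    As a rough text-only illustration of what such a portrayal conveys, the sketch below prints the merge history as nested groupings and lists the observations that join late as singletons. It assumes a merge list in which observations are numbered 0 to n - 1 and the cluster created by the i-th merge (counting from 0) gets the identifier n + i. The labels, merge distances, and function names are hypothetical.

```python
def nested_grouping(labels, merges):
    """Render the merge history as nested parentheses.

    merges: list of (id_a, id_b, distance) in the order the merges occurred.
    """
    nodes = list(labels)
    for a, b, _ in merges:
        nodes.append(f"({nodes[a]} {nodes[b]})")
    return nodes[-1]


def late_singletons(labels, merges, threshold):
    """Observations that are still on their own when they join above threshold."""
    n = len(labels)
    return [labels[c]
            for a, b, d in merges
            for c in (a, b)
            if c < n and d > threshold]


# Hypothetical history for five observations.
labels = ["A", "B", "C", "D", "E"]
merges = [(0, 1, 1.0), (2, 3, 1.5), (5, 6, 4.0), (4, 7, 20.0)]
print(nested_grouping(labels, merges))
print(late_singletons(labels, merges, threshold=10.0))
```

    Here observation E joins only at the final, much larger distance, which is the pattern a dendrogram or icicle diagram would show for an outlier that resists joining a group.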

© 2003, Iain Pardoe, Lundquist College of Business, University of Oregon
Last updated September 26, 2003