/* File "hatcocluster.sas": Illustrative example from textbook p502-515.
   Note that comments in SAS are enclosed by "slash-star" and "star-slash."

   First import the hatco.xls data as usual, then highlight the following code
   (from "proc cluster" to the first "run;") and select Run > Submit from the
   main menu. This runs Ward's Minimum Variance Cluster Analysis, and you should
   see output similar to Table 9.5 on p504-505. Note that the book reports the
   "within-cluster sum of squares", whereas SAS reports the "between-cluster
   sum of squares" (BSS). */
proc cluster method=ward noeigen nonorm data=work.hatco outtree=work.htree;
   var X1 X2 X3 X4 X5 X6 X7;
   copy X8 X11 X12 X13 X14;
   id id;
run;

/* If you click on the Explorer tab in the Results window, you should see the
   file "Htree" in the Work library. This file can be used to construct a
   dendrogram using proc tree (highlight from "proc tree" to "run;" and
   Submit): */
proc tree data=work.htree;
run;

/* The dendrogram is equivalent to Figure 9.11 on p507. Details of cluster
   membership for the 2-cluster solution can be saved in a data table using
   the following code: */
proc tree data=work.htree nclusters=2 noprint out=work.h2cl;
   copy X1 X2 X3 X4 X5 X6 X7 X8 X11 X12 X13 X14;
run;

/* To obtain the information in Table 9.7 on p508, select
   Solutions > Analysis > Analyst with SAS file "H2cl" to profile the
   2-cluster solution and assess significant differences. First use
   File > Open By SAS Name to open "H2cl" (which should be in the Work
   library). Then use Statistics > ANOVA > One-Way ANOVA, and put X1, X2, ...,
   X7 as the dependent variables and "CLUSTER" as the independent variable;
   this will provide Analysis of Variance F-tests (amongst other things).
   Use Statistics > Table Analysis to perform chi-square tests for the
   qualitative variables (X8, X11, X12, X13, and X14).

   Details of cluster membership for the 4-cluster solution can be saved in a
   data table using the following code: */
proc tree data=work.htree nclusters=4 noprint out=work.h4cl;
   copy X1 X2 X3 X4 X5 X6 X7 X8 X11 X12 X13 X14;
run;

/* To obtain the information in Table 9.7 on p508, select
   Solutions > Analysis > Analyst with SAS file "H4cl" to profile the
   4-cluster solution and assess significant differences.

   Use the following code to create a data table with the cluster means for
   the 2-cluster solution. The NWAY option keeps only the per-cluster rows.
   Without it, "proc means" also writes an overall-mean row (_TYPE_=0), and
   "proc fastclus" below would then read that row as an extra candidate
   seed. */
proc means data=work.h2cl nway;
   class CLUSTER;
   output out=work.h2clmeans mean(X1 X2 X3 X4 X5 X6 X7)=X1 X2 X3 X4 X5 X6 X7;
run;

/* Similarly, use the following code for the 4-cluster solution: */
proc means data=work.h4cl nway;
   class CLUSTER;
   output out=work.h4clmeans mean(X1 X2 X3 X4 X5 X6 X7)=X1 X2 X3 X4 X5 X6 X7;
run;

/* For nonhierarchical analysis, use the following code, which takes the
   hierarchical cluster means as initial seeds. (Again select
   Solutions > Analysis > Analyst to profile the solution and assess
   significant differences.) */
proc fastclus maxclusters=2 data=work.hatco seed=work.h2clmeans out=work.nh2cl;
   var X1 X2 X3 X4 X5 X6 X7;
   id id;
run;

/* Similarly for a 4-cluster solution: */
proc fastclus maxclusters=4 data=work.hatco seed=work.h4clmeans out=work.nh4cl;
   var X1 X2 X3 X4 X5 X6 X7;
   id id;
run;

/* Use "proc means" to create data table "nh2clmeans" (which can be exported
   to Excel and used to create the plots in Fig. 9.12 on p511). NWAY again
   keeps only the per-cluster rows: */
proc means data=work.nh2cl nway;
   class CLUSTER;
   output out=work.nh2clmeans mean(X1 X2 X3 X4 X5 X6 X7)=X1 X2 X3 X4 X5 X6 X7;
run;

/* Similarly for the 4-cluster solution: */
proc means data=work.nh4cl nway;
   class CLUSTER;
   output out=work.nh4clmeans mean(X1 X2 X3 X4 X5 X6 X7)=X1 X2 X3 X4 X5 X6 X7;
run;

/* If you don't specify seeds in "proc fastclus", the initial seeds are
   selected from the data (compare with the results in Table 9.9 on p513): */
proc fastclus maxclusters=2 data=work.hatco;
   var X1 X2 X3 X4 X5 X6 X7;
   id id;
run;

/* Similarly for the 4-cluster solution: */
proc fastclus maxclusters=4 data=work.hatco;
   var X1 X2 X3 X4 X5 X6 X7;
   id id;
run;

/* Finally, select Solutions > Analysis > Analyst with "Nh2cl" or "Nh4cl" to
   obtain the results in Table 9.10. In particular, use
   Statistics > Table Analysis, select X8, X11, X12, X13, and X14 as row
   variables and CLUSTER as the column variable, click the Statistics button,
   and check "Chi-square statistics." */
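
/* Code alternative to the Analyst menus. This is a minimal sketch that
   assumes the work.h2cl data set created by "proc tree" earlier in this file.
   To profile the other solutions, replace data=work.h2cl with work.h4cl,
   work.nh2cl, or work.nh4cl. PROC GLM gives a one-way ANOVA F-test for each
   of X1-X7 across clusters, and the CHISQ option of PROC FREQ gives
   chi-square tests for the qualitative variables. The output layout will
   differ from the Analyst windows and from the tables in the book. */
proc glm data=work.h2cl;
   class CLUSTER;
   model X1 X2 X3 X4 X5 X6 X7 = CLUSTER;   /* one F-test per variable */
   means CLUSTER;                          /* cluster means of X1-X7 */
run;
quit;

proc freq data=work.h2cl;
   /* one crosstab with chi-square tests per qualitative variable */
   tables (X8 X11 X12 X13 X14)*CLUSTER / chisq;
run;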