/* File "jobspw3.sas": Jobs example continued.
   Note that comments in SAS are enclosed by "slash-star" and "star-slash".
   First import the jobspw3.xls data as usual, then highlight the following
   code (from "proc cluster" to the first "run;") and select Run > Submit
   from the main menu. This should run Ward's Minimum Variance Cluster
   Analysis. */
proc cluster method=ward noeigen nonorm data=work.jobspw3 outtree=work.htree;
  var Eugene Portland Seattle Denver S50K S60K S70K S80K;
  id id;
run;

/* If you click on the Explorer tab in the Results window, you should see the
   file "Htree" in the Work library. This file can be used to construct a
   dendrogram using proc tree (highlight from "proc tree" to "run;" and
   select Run > Submit): */
proc tree data=work.htree;
run;

/* Details of cluster membership for the 2-cluster solution can be saved in a
   data table using the following code: */
proc tree data=work.htree nclusters=2 noprint out=work.h2cl;
  copy Eugene Portland Seattle Denver S50K S60K S70K S80K;
run;

/* Use the following code to create a data table with cluster means: */
proc means data=work.h2cl;
  class CLUSTER;
  output out=work.h2clmeans
    mean(Eugene Portland Seattle Denver S50K S60K S70K S80K)=
         Eugene Portland Seattle Denver S50K S60K S70K S80K;
run;

/* For nonhierarchical analysis starting from the hierarchical cluster means,
   use the following code (which uses the means as seeds): */
proc fastclus maxclusters=2 data=work.jobspw3 seed=work.h2clmeans out=work.nh2cl;
  var Eugene Portland Seattle Denver S50K S60K S70K S80K;
  id id;
run;

/* Select Solutions > Analysis > Analyst to profile the 2-cluster solution and
   assess significant differences. First use File > Open By SAS Name to open
   "Nh2cl" (which should be in the Work library). Then use Statistics > ANOVA >
   One-Way ANOVA, and enter all the clustering variables (Denver, Eugene, etc.)
   as the dependent variables and "CLUSTER" as the independent variable: this
   will provide Analysis of Variance F-tests (among other things).
   Do the same but with Age, Exp, and Sal as the dependent variables. Then use
   Statistics > Table Analysis, with Status as the Row variable and Cluster as
   the Column variable; click on the Statistics button, check "Chi-square
   statistics", and click OK: this will provide a chi-square test of
   association for the Status-by-Cluster contingency table. Next, use
   "proc means" to create the data table "Nh2clmeans" (which can be exported
   to Excel and used to create profile plots): */
proc means data=work.nh2cl;
  class CLUSTER;
  output out=work.nh2clmeans
    mean(Eugene Portland Seattle Denver S50K S60K S70K S80K)=
         Eugene Portland Seattle Denver S50K S60K S70K S80K;
run;

/* In particular, click on the Explorer tab in the Results window and
   right-click on the file "Nh2clmeans" in the Work library. Then select
   Export to export the data to an Excel spreadsheet. If you open this
   spreadsheet in Excel, you should be able to figure out how to create the
   profile line plots (see p. 483, for example).
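
   The menu-driven steps described in these comments can also be scripted.
   The code below is a minimal sketch and is not part of the original
   example. It assumes that Age, Exp, Sal, and Status are present in
   work.nh2cl, that your SAS installation supports dbms=xlsx in proc export
   (older releases may need a different dbms value), and it writes to a
   hypothetical file name, "nh2clmeans.xlsx", in the current directory. */

/* One-way ANOVA F-tests for the clustering variables by cluster */
proc glm data=work.nh2cl;
  class CLUSTER;
  model Eugene Portland Seattle Denver S50K S60K S70K S80K = CLUSTER;
run;
quit;

/* One-way ANOVA F-tests for Age, Exp, and Sal by cluster */
proc glm data=work.nh2cl;
  class CLUSTER;
  model Age Exp Sal = CLUSTER;
run;
quit;

/* Status-by-Cluster contingency table with chi-square statistics */
proc freq data=work.nh2cl;
  tables Status*CLUSTER / chisq;
run;

/* Export the cluster means for plotting in Excel (hypothetical file name) */
proc export data=work.nh2clmeans outfile="nh2clmeans.xlsx" dbms=xlsx replace;
run;

/* The scripted steps above are optional and should produce output comparable
   to the Analyst menus and the right-click Export described earlier.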