/* File "hatco2group.sas": two-group illustrative example from book pp. 281-296.
   Note that comments in SAS are enclosed by "slash-star" and "star-slash."

   First import the hatco.xls data as usual (use "hatco" for the member name).
   Then highlight the following code (from "proc stepdisc" to the first "run;"),
   and select Run > Submit from the main menu. */

proc stepdisc method=forward slentry=0.05 data=work.hatco;
  class x11;
  var x1 x2 x3 x4 x5 x6 x7;
  freq analysis;
run;

/* The above analysis produces most of the output in Tables 5.5-5.8.
   To produce the output in Tables 5.9-5.11 run the following: */

proc discrim canonical distance data=work.hatco testdata=work.hatco
             out=work.results list;
  class x11;
  var x1 x3 x7;
  freq analysis;
  testfreq holdout;
  priors proportional;
run;

/* In particular, you should be able to locate the following output:
   - eigenvalue = 2.019
   - canonical correlation = 0.818
   - Wilks' lambda (called "likelihood ratio") = 0.331
   - standardized canonical coefficients (labeled "pooled within-class")
   - unstandardized canonical coefficients (labeled "raw")
   - function loadings (labeled "pooled within canonical structure")
   - linear discriminant (classification) function coefficients
   - group (class) means on canonical variables (centroids)
   - classification matrices for analysis (calibration) and holdout (test)
     samples (results differ slightly for analysis sample - see below)
   - classification results for individuals in analysis sample

   Note the following differences between SAS and the book:
   - an F-test is used for Wilks' lambda rather than a chi-squared test
   - SAS does not compute the discriminant function loadings in Table 5.9 for
     variables not used in the analysis; these are not too difficult to
     compute by exporting the results file to Excel and calculating
     loading = pooled correlation between a variable and a canonical
     discriminant function - see "loadings2.xls" for exact details
   - SAS computes the classification results in
     Tables 5.10-5.11 using the posterior probability method rather than the
     cutting score method.

   Next, to profile correctly classified and misclassified observations, run
   the following code to produce results for x11=0 in Table 5.12. */

data work.results;
  set work.results;
  x11_0 = 1 - x11;  /* frequency indicator: 1 when x11=0, 0 when x11=1 */
run;

proc ttest ci=none data=work.results;
  class _into_;
  var x1 x2 x3 x4 x5 x6 x7;
  freq x11_0;
run;

/* Run the following code to produce results for x11=1 similar to those in
   Table 5.12 (they don't match exactly since SAS classified some of these
   cases differently, as mentioned above). */

proc ttest ci=none data=work.results;
  class _into_;
  var x1 x2 x3 x4 x5 x6 x7;
  freq x11;
run;
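
/* Optional sketch (not part of the book's example): the loadings for
   variables left out of the discriminant function, which the notes above
   compute in Excel, can probably also be obtained within SAS. This assumes
   that work.results (written by proc discrim with the canonical option)
   holds the canonical scores in a variable named Can1, the default name,
   and that "analysis" is a 0/1 indicator. The data set names work.sorted
   and work.centered are made up for this illustration. Centering each
   variable on its x11 group mean and then correlating gives the pooled
   within-class correlation; check the values for x1, x3 and x7 against the
   "pooled within canonical structure" output before relying on the rest. */

proc sort data=work.results out=work.sorted;
  by x11;
run;

proc standard data=work.sorted mean=0 out=work.centered;
  by x11;
  var x1 x2 x3 x4 x5 x6 x7 Can1;
  freq analysis;  /* group means are computed from analysis-sample cases */
run;

proc corr data=work.centered;
  var Can1;
  with x1 x2 x3 x4 x5 x6 x7;
  freq analysis;  /* correlate over analysis-sample cases only */
run;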