SAS Analyst instructions
These instructions accompany Applied Regression Modeling by Iain Pardoe, 2nd edition
published by Wiley in 2012. The numbered items cross-reference with the "computer help" references
in the book. These instructions are based on the Analyst Application for SAS 9 for Windows, but they (or something similar)
should also work for other versions.
Getting started and summarizing univariate data
 If desired, change SAS Analyst's default options by selecting
Tools > Options > Preferences.
 To open a SAS data file, select File > Open.
 SAS Analyst doesn't appear to offer a way to recall a previously used dialog box.
 Output appears in a separate window each
time you run an analysis. Select
Edit > Copy to Program Editor to copy the output to a
Program Editor Window. From there, output can be copied
and pasted from SAS to a word processor like OpenOffice Writer or Microsoft Word. Graphs
appear in separate windows and can also easily be copied and pasted
to other applications. If you misplace any output, you can easily
retrieve it by clicking on the Analyst Window and using
the left-hand Outline Pane.
 You can access help by selecting Help > Using This Window or
clicking the Analyst Help tool.
For example, to find out about "boxplots" find Box
Plots (under Creating Graphs in the main pane of the
Help Window).
 To transform data or compute a new variable,
first select Edit > Mode > Edit to change the
dataset from "browse" mode to "edit" mode. Then select
Data > Transform > Compute.
Type a name (with no spaces) for the new variable in the top-left
box, and type a mathematical expression for the variable in the
large box just below this. Current variables in the dataset can be
moved into this box, while the keypad and list of functions can be
used to create the expression. Examples are log(X) for
the natural logarithm of X and X**2 for
X^{2}. Click OK to create the new variable, which will be
added to the dataset (check it looks correct in the spreadsheet); it
can now be used just like any other variable. If you get the error
message "Unable to add a new column as specified," this means
there is a syntax error in your expression; a common mistake is to
forget the multiplication symbol (*) between a
number and a variable (e.g., 2*X represents 2X).
 To create indicator (dummy) variables from a qualitative variable, first
select Edit > Mode > Edit to change the dataset from
"browse" mode to "edit" mode. Then select Data >
Transform > Recode Values.
Select the qualitative variable in the Column to recode
box, type a name for the first indicator variable in the
New column name box, make sure New column type is
Numeric, and press OK. In the subsequent
Recode Values dialog box, type 1 into the
box next to the appropriate level, and type 0 into the
boxes next to each of the other levels. Click OK and check
that the correct indicator variable has been added to your
spreadsheet. Repeat for other indicator variables (if necessary).
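As a cross-check outside SAS Analyst, the recoding described above can be sketched in a few lines of Python. This is only an illustration of the 0/1 coding, not part of the Analyst procedure; the function name make_indicator, the variable region, and its levels are hypothetical. Leaving one level without an indicator (a reference level) is a common convention and is assumed here.

```python
def make_indicator(values, level):
    """Return a 0/1 list: 1 where the value equals `level`, 0 otherwise."""
    return [1 if v == level else 0 for v in values]


# Hypothetical qualitative variable with three levels.
region = ["north", "south", "west", "south", "north"]

# One indicator per level, except an assumed reference level ("west").
d_north = make_indicator(region, "north")
d_south = make_indicator(region, "south")
```

Comparing these lists with the new spreadsheet columns is a quick way to confirm that the right level was coded as 1.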

 To find a percentile (critical value) for a t-distribution, [?].
 To find a percentile (critical value) for an F-distribution, [?].
 To find a percentile (critical value) for a chi-squared distribution, [?].

 To find an upper-tail area (one-tail p-value) for a t-distribution, [?].
 To find an upper-tail area (p-value) for an F-distribution, [?].
 To find an upper-tail area (p-value) for a chi-squared distribution, [?].
 Calculate descriptive statistics for quantitative variables by selecting
Statistics > Descriptive > Summary Statistics.
Move the variable(s) into the Analysis list. Click
Statistics to select the summaries, such as the
Mean, that you would like.
 Create contingency tables or crosstabulations for qualitative
variables by selecting Statistics > Table Analysis.
Move one qualitative variable into the Row list and
another into the Column list. Cell percentages
(within rows, columns, or the whole table) can be calculated by
clicking Tables.
 If you have a quantitative variable and a qualitative variable, you can calculate
descriptive statistics for cases grouped in different categories by selecting Statistics > Descriptive > Summary Statistics.
Move the quantitative variable(s) into the Analysis list
and the qualitative variable(s) into the Class list.
Click Statistics to select the summaries that you would
like.
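The grouped summary described above amounts to splitting the quantitative variable by category and summarizing each piece. A minimal Python sketch of that idea follows, using only the standard library; it is not what SAS Analyst runs internally, and the names group_means, salary, and dept are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def group_means(values, groups):
    """Mean of `values` within each category of `groups`."""
    buckets = defaultdict(list)
    for value, group in zip(values, groups):
        buckets[group].append(value)
    return {group: mean(vals) for group, vals in buckets.items()}


# Hypothetical data: one quantitative and one qualitative variable.
salary = [50.0, 60.0, 55.0, 70.0]
dept = ["a", "b", "a", "b"]
means = group_means(salary, dept)
```

The same pattern works for other summaries by swapping mean for another function such as statistics.median.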
 SAS Analyst does not appear to offer an automatic way to make a
stem-and-leaf plot for a quantitative variable.
 To make a histogram for a quantitative variable, select Graphs > Histogram.
Move the variable into the Analysis box.
 To make a scatterplot with two quantitative variables, select
Graphs > Scatter Plot > Two-Dimensional.
Move the vertical axis variable into the Y Axis box and
the horizontal axis variable into the X Axis box.
 SAS Analyst does not appear to offer an automatic way to make a
scatterplot matrix.
 You can mark or label cases in a scatterplot with different colors/symbols
according to categories in a qualitative variable by moving the variable into the Class box
in the Scatterplot dialog.
SAS Analyst does not appear to offer an automatic way to change the
colors/symbols used in a scatterplot.
 SAS Analyst does not appear to offer an automatic way to identify individual
cases in a scatterplot.
 To remove one or more observations from a dataset, [?].
 To make a bar chart for cases in different categories, select
Graphs > Bar Chart > Vertical.
 For frequency bar charts of one qualitative variable, move the
variable into the Chart box.
 For frequency bar charts of two qualitative variables, move one variable into the
Chart box and the other into the Group By box.
 The bars can also represent various summary functions for a quantitative variable.
For example, to represent Means, select
Options, click the Bar Values tab, move the
quantitative variable into the Analysis box, and select
Average for Statistic to chart.
 To make boxplots for cases in different categories, select
Graphs > Box Plot.
 For just one qualitative variable, move it into the Class box and move
the quantitative variable into the Analysis box.
 SAS Analyst does not appear to offer an automatic way to create
clustered boxplots for two qualitative variables.
 To make a QQ-plot (also known as a normal probability plot) for a
quantitative variable, select Graphs > Probability Plot.
Move the variable into the Analysis box and leave the
Distribution as Normal to assess normality
of the variable.
 To compute a confidence interval for a univariate population mean, select Statistics > Hypothesis Tests > One-Sample t-test for a Mean.
Move the variable for which you want to calculate the confidence
interval into the Variable box and click the
Tests button to bring up another dialog box in which you can select
Interval under Confidence intervals and
specify the confidence level for the interval. OK will
take you back to the previous dialog box, where you can now click
OK.
 To do a hypothesis test for a univariate population mean, select Statistics > Hypothesis Tests > One-Sample t-test for a Mean.
Move the variable for which you want to do the test into the
Variable box and type the (null) hypothesized value into
the Mean = box. Specify a lower-tailed ("less than"),
upper-tailed ("greater than"), or two-tailed ("not equal")
alternative hypothesis. OK will take you back to the
previous dialog box, where you can now click OK.
Simple linear regression
 To fit a simple linear regression model (i.e., find a least squares line),
select Statistics > Regression > Simple. Move the response variable into the
Dependent box and the predictor variable into the Explanatory box. Just
click OK for now; the other items in the dialog box are addressed below. In the rare
circumstance that you wish to fit a model without an intercept term (regression through the origin),
[?].
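To see what the least squares line is computing, the slope and intercept can be reproduced by hand from the usual formulas: slope = Sxy / Sxx and intercept = mean(Y) - slope * mean(X). The Python sketch below is an illustration under those textbook formulas only; the function name least_squares_line and the small dataset are hypothetical, and the result is meant as a check against the "Parameter Estimates" output rather than a replacement for it.

```python
def least_squares_line(x, y):
    """Return (intercept, slope) of the least squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
b0, b1 = least_squares_line(x, y)
```

For this toy dataset the fitted line is approximately Y = 2.2 + 0.6 X.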
 To add a regression line or least squares line to a scatterplot,
select Statistics > Regression > Simple. Move the response variable into the
Dependent box and the predictor variable into the Explanatory box. Before
clicking OK, click the Plots button, and check
Plot observed vs independent under Scatterplots. Click OK to return to
the main Simple Linear Regression dialog box, and then click OK. Click on the
Analyst Window, and double-click on Scatter plot under
Simple Linear Regression in the left-hand Outline Pane to find the resulting
graph.
 To find 95% confidence intervals for the regression parameters in a simple
linear regression model, select Statistics > Regression > Simple. Move the response
variable into the Dependent box and the predictor variable into the Explanatory
box. Before clicking OK, click the Statistics button, and check
Confidence limits for estimates under Parameter estimates. Click OK to
return to the main Simple Linear Regression dialog box, and then click OK. The
confidence intervals are displayed as the final two columns of the "Parameter Estimates" output.
This applies more generally to multiple linear regression also.
 To find a fitted value or predicted value of Y (the response
variable) at a particular value of X (the predictor variable) in a simple linear regression model, select Statistics > Regression > Simple. Move the response variable into the
Dependent box and the predictor variable into the Explanatory box. Before
clicking OK, click the Save Data button and add ? to the empty box in the
subsequent Simple Linear Regression: Save Data dialog box. Check the Create and save
diagnostics data box, click OK to return to the main Simple Linear Regression
dialog box, and then click OK. Click on the Analyst Window, and double-click on
Diagnostics Table under Simple Linear Regression > Diagnostics in the left-hand
Outline Pane to find the results. The fitted (or predicted) values of Y at each of the
X-values in the dataset are displayed in the column headed ?. You can also obtain a fitted (or predicted) value of Y at an X-value that is not in the dataset by doing the following. Before
fitting the regression model, create a dataset containing (just) the X-value in question (with the
same variable name as in the original dataset), and save this dataset. Then fit the regression
model and follow the steps above, but before clicking OK, click the Prediction
button, click Predict additional data under Prediction input and locate the
dataset you just saved under Data set name. Then check List predictions under
Prediction output. Click OK to return to the main
Simple Linear Regression dialog box, and then click OK. Click on the
Analyst Window, and double-click on Predictions under
Simple Linear Regression in the left-hand Outline Pane to find the results.
This applies more generally to multiple linear regression also.
 To find a 95% confidence interval for the mean of Y at a particular value
of X in a simple linear regression model, select Statistics > Regression > Simple. Move
the response variable into the Dependent box and the predictor variable into the
Explanatory box. Before clicking OK, click the Save Data button and
add L95M and U95M to the empty box in the subsequent
Simple Linear Regression: Save Data dialog box. Check the
Create and save diagnostics data box, click OK to return to the main
Simple Linear Regression dialog box, and then click OK. Click on the
Analyst Window, and double-click on Diagnostics Table under
Simple Linear Regression > Diagnostics in the left-hand Outline Pane to find the
results. The confidence intervals for the mean of Y at each of the X-values in the dataset are
displayed as two columns headed _L95M and _U95M. You can also obtain a confidence
interval for the mean of Y at an X-value that is not in the dataset by doing the following. Before
fitting the regression model, create a dataset containing (just) the X-value in question (with the
same variable name as in the original dataset), and save this dataset. Then fit the regression
model and follow the steps above, but before clicking OK, click the Prediction
button, click Predict additional data under Prediction input and locate the
dataset you just saved under Data set name. Then check List predictions and
Add prediction limits under Prediction output. Click OK to return to the
main Simple Linear Regression dialog box, and then click OK. Click on the
Analyst Window, and double-click on Predictions under
Simple Linear Regression in the left-hand Outline Pane to find the results.
This applies more generally to multiple linear regression also.
 To find a prediction interval for an individual value of Y at a particular
value of X in a simple linear regression model, select Statistics > Regression > Simple.
Move the response variable into the Dependent box and the predictor variable into the
Explanatory box. Before clicking OK, click the Save Data button and add
L95 and U95 to the empty box in the subsequent
Simple Linear Regression: Save Data dialog box. Check the
Create and save diagnostics data box, click OK to return to the main
Simple Linear Regression dialog box, and then click OK. Click on the
Analyst Window, and double-click on Diagnostics Table under
Simple Linear Regression > Diagnostics in the left-hand Outline Pane to find the
results. The prediction intervals for an individual value of Y at each of the X-values in the
dataset are displayed as two columns headed _L95 and _U95. This applies more
generally to multiple linear regression also. SAS Analyst does not appear to offer an
automatic way to create a prediction interval for an individual Y-value at an X-value that is not in
the dataset.
Multiple linear regression
 To fit a multiple linear regression model, select
Statistics > Regression > Linear. Move the response variable into the Dependent
box and the predictor variables into the Explanatory box. In the rare circumstance that
you wish to fit a model without an intercept term (regression through the origin), [?].
 To add a quadratic regression line to a scatterplot, select
Statistics > Regression > Simple. Move the response variable into the Dependent
box and the predictor variable into the Explanatory box, and change Model from
Linear to Quadratic. Before clicking OK, click the Plots
button, and check Plot observed vs independent under Scatterplots. Click
OK to return to the main Simple Linear Regression dialog box, and then click
OK. Click on the Analyst Window, and double-click on Scatter plot under
Simple Linear Regression in the left-hand Outline Pane to find the resulting
graph.
 SAS Analyst does not appear to offer an automatic way to create a scatterplot with
separate regression lines for subsets of the sample.
 SAS Analyst does not appear to offer an automatic way to find the F-statistic and
associated p-value for a nested model F-test in multiple linear regression. It is possible
to calculate these quantities by hand using SAS Analyst regression output and appropriate
percentiles from an F-distribution.
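The hand calculation mentioned above uses the standard nested model formula: F = [(RSS_R - RSS_C) / (df_R - df_C)] / (RSS_C / df_C), where RSS is the residual (error) sum of squares, df is the error degrees of freedom, R is the reduced model, and C is the complete model. The Python sketch below simply evaluates that formula; the function name nested_f_statistic and the numbers plugged in are hypothetical and would be replaced by values read from the two regression outputs.

```python
def nested_f_statistic(rss_reduced, rss_complete, df_reduced, df_complete):
    """F-statistic for comparing a reduced model with a complete model.

    rss_* are residual sums of squares; df_* are error degrees of freedom.
    """
    numerator = (rss_reduced - rss_complete) / (df_reduced - df_complete)
    denominator = rss_complete / df_complete
    return numerator / denominator


# Hypothetical values, as if read from two sets of regression output.
f_stat = nested_f_statistic(
    rss_reduced=120.0, rss_complete=80.0, df_reduced=27, df_complete=25
)
```

The resulting statistic is compared with an F-distribution with (df_R - df_C) numerator and df_C denominator degrees of freedom; here that would be 2 and 25.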
 To save residuals in a multiple linear regression model, select
Statistics > Regression > Linear. Move the response variable into the Dependent
box and the predictor variables into the Explanatory box. Before clicking OK,
click the Save Data button and add ? to the empty box in the subsequent
Linear Regression: Save Data dialog box. Check the
Create and save diagnostics data box, click OK to return to the main
Linear Regression dialog box, and then click OK. Click on the
Analyst Window, and double-click on Diagnostics Table under
Linear Regression > Diagnostics in the left-hand Outline Pane to find the results.
The residuals are displayed as ?. To save what Pardoe (2012) calls
standardized residuals, add STUDENT to the empty box in the
Linear Regression: Save Data dialog box—they will be displayed as _STUDENT.
To save what Pardoe (2012) calls studentized residuals, add RSTUDENT to the empty
box in the Linear Regression: Save Data dialog box—they will be displayed as _RSTUDENT.
 SAS Analyst does not appear to offer an automatic way to add a
loess fitted line to a scatterplot.
 To save leverages in a multiple linear regression model, select
Statistics > Regression > Linear. Move the response variable into the Dependent
box and the predictor variables into the Explanatory box. Before clicking OK,
click the Save Data button and add H to the empty box in the subsequent
Linear Regression: Save Data dialog box. Check the
Create and save diagnostics data box, click OK to return to the main
Linear Regression dialog box, and then click OK. Click on the
Analyst Window, and double-click on Diagnostics Table under
Linear Regression > Diagnostics in the left-hand Outline Pane to find the results.
The leverages are displayed as _H.
 To save Cook's distances in a multiple linear regression model, select
Statistics > Regression > Linear. Move the response variable into the Dependent
box and the predictor variables into the Explanatory box. Before clicking OK,
click the Save Data button and add COOKD to the empty box in the subsequent
Linear Regression: Save Data dialog box. Check the
Create and save diagnostics data box, click OK to return to the main
Linear Regression dialog box, and then click OK. Click on the
Analyst Window, and double-click on Diagnostics Table under
Linear Regression > Diagnostics in the left-hand Outline Pane to find the results.
Cook's distances are displayed as _COOKD.
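To make the saved diagnostics less of a black box, the sketch below computes leverages, standardized residuals, and Cook's distances by hand for the special case of one predictor, using the textbook formulas h = 1/n + (x - mean(x))^2 / Sxx, r = e / (s * sqrt(1 - h)), and D = r^2 * h / (p * (1 - h)) with p = 2 regression parameters. It is an illustration only: the function name slr_diagnostics and the data are hypothetical, and the correspondence of these quantities to the _H, _STUDENT, and _COOKD columns is an assumption that should be confirmed against your own output.

```python
import math


def slr_diagnostics(x, y):
    """Leverages, standardized residuals, and Cook's distances
    for a simple linear regression of y on x."""
    n = len(x)
    p = 2  # number of regression parameters: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in residuals) / (n - p)  # residual mean square

    leverages = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    standardized = [
        e / math.sqrt(s2 * (1 - h)) for e, h in zip(residuals, leverages)
    ]
    cooks = [
        r * r * h / (p * (1 - h)) for r, h in zip(standardized, leverages)
    ]
    return leverages, standardized, cooks


# Hypothetical data for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
leverages, standardized, cooks = slr_diagnostics(x, y)
```

A useful sanity check is that the leverages sum to the number of regression parameters (2 here).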
 To create some residual plots automatically in a multiple linear regression
model, select Statistics > Regression > Linear. Move the response variable into the
Dependent box and the predictor variables into the Explanatory box. Before
clicking OK, click the Plots button, and click the Residual tab in the
subsequent Linear Regression: Plots dialog box. Check Plot residuals vs variables
under Residual plots, and select Standardized for Residuals and
Predicted Y for Variables to create a scatterplot of the standardized residuals on
the vertical axis versus the standardized predicted values on the horizontal axis. You could also
check Independents for Variables to create residual plots with each predictor
variable on the horizontal axis. Click OK to return to the main Linear Regression
dialog box, and then click OK. Click on the Analyst Window, and double-click on
the resulting graphs under Linear Regression > Residual Plots in the left-hand
Outline Pane. To create residual plots manually, first create studentized residuals (see
help #35), and then construct scatterplots with these studentized residuals on the vertical
axis.
 To create a correlation matrix of quantitative variables (useful for
checking potential multicollinearity problems), select
Statistics > Descriptive > Correlations. Move the variables into the Correlate
box and click OK.
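Each entry of the correlation matrix is a Pearson correlation between a pair of variables. The Python sketch below computes such a matrix from scratch so the numbers in the SAS Analyst output can be spot-checked; the function names pearson and correlation_matrix, and the small dataset, are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


def correlation_matrix(columns):
    """Nested dict of pairwise correlations for a dict of named columns."""
    names = list(columns)
    return {
        r: {c: pearson(columns[r], columns[c]) for c in names} for r in names
    }


# Hypothetical data for illustration.
data = {
    "X1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "X2": [2.0, 4.0, 5.0, 4.0, 5.0],
}
corr = correlation_matrix(data)
```

Correlations between predictors that are close to 1 or -1 in absolute value are the ones worth a closer look when checking for multicollinearity.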
 To find variance inflation factors in multiple linear regression, select
Statistics > Regression > Linear. Move the response variable into the Dependent
box and the predictor variables into the Explanatory box. Before clicking OK,
click the Statistics button, and the Tests tab in the resulting
Linear Regression: Statistics dialog box. Check Variance inflation factors under
Collinearity, click OK to return to the main Linear Regression dialog
box, and then click OK. The variance inflation factors are in the last column of the
"Parameter Estimates" output under "Variance Inflation."
 To draw a predictor effect plot for graphically displaying the effects of
transformed quantitative predictors and/or interactions between quantitative and qualitative
predictors in multiple linear regression, first create a variable representing the effect, say,
"X1effect" (see computer help #6).
 This variable must just involve X1 (e.g., 1 + 3X1 + 4X1^{2}). Then
select Graphs > Scatter Plot > Two-Dimensional. Move the "X1effect" variable into the
Y Axis box and X1 into the X Axis box. Before clicking OK, click on
Display and select Connect points with straight lines in the resulting
2D Scatter Plot: Display dialog box. Click OK to return to the main
2D Scatter Plot dialog box, and then click OK.
 SAS Analyst does not appear to offer an automatic way to create more complex
predictor effect plots (say, with separate lines representing different subsets of the sample).
See Section 5.5 in Pardoe (2012) for an example.
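The values that go into a simple predictor effect plot are just the effect expression evaluated over a range of X1. The Python sketch below does this for the example expression 1 + 3X1 + 4X1^2 given above; the function name x1_effect and the grid of X1 values are hypothetical, and the plotting itself is left to SAS Analyst as described.

```python
def x1_effect(x1):
    """Example predictor effect from the text: 1 + 3*X1 + 4*X1^2."""
    return 1 + 3 * x1 + 4 * x1 ** 2


# Hypothetical grid of X1 values spanning the range of interest.
x1_values = [0.0, 0.5, 1.0, 1.5, 2.0]
effect_values = [x1_effect(v) for v in x1_values]
```

Pairs of (X1, X1effect) values like these are what the connected points in the scatterplot trace out.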
Last updated: May, 2012
© 2012, Iain Pardoe