SPSS instructions
These instructions accompany Applied Regression Modeling (2nd edition) by Iain Pardoe,
published by Wiley in 2012. The numbered items cross-reference with the "computer help" references
in the book. These instructions are based on SPSS 20 for Windows, but they (or something similar)
should also work for other versions. Find instructions for other statistical software packages
here.
Getting started and summarizing univariate data
 If desired, change SPSS's default options by selecting
Edit > Options. For example, to display variable names (in alphabetical order) rather than
labels in dialog boxes, click the General tab; in the Variable Lists group select
Display names and select Alphabetical. To show variable names rather than labels
in output tables, click the Output Labels tab; under Pivot Table Labeling change
Variables in labels shown as to Names. To display small numbers in tables without
using scientific notation (which can make reading the numbers more difficult), click the
General tab; under Output check No scientific notation for small numbers in
tables.
 To open an SPSS data file, select File > Open > Data.
 To recall a previously used dialog box, click the Dialog Recall
tool (fourth button from the left in the Data Editor Window, sixth button from the left in
the Viewer Window).
 Output can be edited in the Viewer Window. Individual pieces of
output (including tables and graphs) can be selected, edited, moved, deleted, and so on using both
the Outline Pane (on the left) and the Display Pane (on the right). Text and
headings can be entered using the Insert menu. Alternatively, copy and paste pieces of
output from SPSS to a word processor like OpenOffice Writer or Microsoft Word.
 You can access help by selecting Help > Topics. For example, to
find out about "boxplots" click the Index tab, type boxplots in the first box, and
select the index entry you want in the second box.
 To transform data or compute a new variable, select
Transform > Compute Variable. Type a name (with no spaces) for the new variable in the
Target Variable box, and type a mathematical expression for the variable in the
Numeric Expression box. Current variables in the dataset can be moved into the
Numeric Expression box, while the keypad and list of functions can be used to create the
expression. Examples are LN(X) for the natural logarithm of X and X**2 for
X². Click OK to create the new variable, which will be added to the dataset (check
that it looks correct in the Data Editor Window); it can now be used just like any other
variable. If you get the error message "expression ends unexpectedly," this means there is a syntax
error in your Numeric Expression—a common mistake is to forget the multiplication
symbol (*) between a number and a variable (e.g., 2*X represents 2X).
 To create indicator (dummy) variables from a qualitative variable, select
Transform > Recode into Different Variables. Move the qualitative variable into the
Input Variable -> Output Variable box, type a name for the first indicator variable in the
Output Variable Name box, and press Change (the name should replace the question
mark in the Input Variable -> Output Variable box). Next, press
Old and New Values, type the appropriate category name/number into the Old Value
box, type 1 into the New Value box, and press Add. Then select
All other values, type 0 into the New Value box, and press Add.
Click Continue to return to the previous dialog box, and click OK (check that the
correct indicator variable has been added to your spreadsheet in the Data Editor Window).
Repeat for other indicator variables (if necessary).
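
The recoding rule above (the chosen category becomes 1, all other values become 0) can be sketched outside SPSS. The Python fragment below only illustrates the logic; the variable `region`, its categories, and the helper `make_indicator` are hypothetical and do not come from the book.

```python
# Hypothetical qualitative variable with three categories (illustration only).
region = ["north", "south", "east", "south", "north"]

def make_indicator(values, category):
    """Return 1 where the value equals `category` and 0 for all other values."""
    return [1 if v == category else 0 for v in values]

# One indicator per category of interest, as in "Repeat for other indicator variables".
d_north = make_indicator(region, "north")
d_south = make_indicator(region, "south")
print(d_north)  # [1, 0, 0, 0, 1]
print(d_south)  # [0, 1, 0, 1, 0]
```

With three categories, two indicators are enough for a regression model; the third category ("east" here) serves as the reference level.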

 To find a percentile (critical value) for a t-distribution, select
Transform > Compute Variable. Type a name (with no spaces) in the Target Variable
box (e.g., "cvt"). Then type IDF.T(p, df) into the Numeric Expression box. Here
p is the lower-tail area (i.e., one minus the one-tail significance level) and df
is the degrees of freedom. Click OK to see the result in the Data Editor Window,
where it will appear in a new column. You may need to click Variable View (at the bottom
of the window) to change the number of decimal places displayed. For example,
IDF.T(0.95, 29) returns the 95th percentile of the t-distribution with 29 degrees of
freedom (1.699), which is the critical value for an upper-tail test with a 5% significance level. By
contrast, IDF.T(0.975, 29) returns the 97.5th percentile of the t-distribution with 29
degrees of freedom (2.045), which is the critical value for a two-tail test with a 5% significance
level.
 To find a percentile (critical value) for an F-distribution, select
Transform > Compute Variable. Type a name (with no spaces) in the Target Variable
box (e.g., "cvf"). Then type IDF.F(p, df1, df2) into the Numeric Expression box.
Here p is the lower-tail area (i.e., one minus the significance level), df1 is the
numerator degrees of freedom, and df2 is the denominator degrees of freedom. For example,
IDF.F(0.95, 2, 3) returns the 95th percentile of the F-distribution with 2 numerator
degrees of freedom and 3 denominator degrees of freedom (9.552).
 To find a percentile (critical value) for a chi-squared distribution,
select Transform > Compute Variable. Type a name (with no spaces) in the
Target Variable box (e.g., "cvchisq"). Then type IDF.CHISQ(p, df) into the
Numeric Expression box. Here p is the lower-tail area (i.e., one minus the
significance level) and df is the degrees of freedom. For example,
IDF.CHISQ(0.95, 2) returns the 95th percentile of the chi-squared distribution with 2
degrees of freedom (5.991).
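
For the special cases used in the last two examples (2 numerator degrees of freedom for F, and 2 degrees of freedom for chi-squared), the percentiles have simple closed forms, so the quoted values can be cross-checked without SPSS. The Python sketch below is an independent check, not an SPSS feature; the function names are made up for illustration, and the formulas apply only to these special cases.

```python
import math

def chisq2_percentile(p):
    """Percentile with lower-tail area p for chi-squared with 2 df.

    Uses the closed form x = -2 * ln(1 - p), valid only for 2 df.
    """
    return -2.0 * math.log(1.0 - p)

def f2_percentile(p, df2):
    """Percentile with lower-tail area p for F with 2 numerator df.

    Uses the closed form f = (df2 / 2) * ((1 - p) ** (-2 / df2) - 1),
    valid only when the numerator degrees of freedom equal 2.
    """
    return (df2 / 2.0) * ((1.0 - p) ** (-2.0 / df2) - 1.0)

cvf = f2_percentile(0.95, 3)       # compare with IDF.F(0.95, 2, 3)
cvchisq = chisq2_percentile(0.95)  # compare with IDF.CHISQ(0.95, 2)
print(round(cvf, 3), round(cvchisq, 3))  # 9.552 5.991
```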

 To find an upper-tail area (one-tail p-value) for a t-distribution, select
Transform > Compute Variable. Type a name (with no spaces) in the Target Variable
box (e.g., "pt"). Then type 1 - CDF.T(t, df) into the Numeric Expression box.
Here t is the value of the t-statistic and df is the degrees of freedom. For
example, 1 - CDF.T(2.40, 29) returns the upper-tail area for a t-statistic of 2.40 from the
t-distribution with 29 degrees of freedom (0.012), which is the p-value for an upper-tail test. By
contrast, 2*(1 - CDF.T(2.40, 29)) returns the two-tail area for a t-statistic of
2.40 from the t-distribution with 29 degrees of freedom (0.023), which is the p-value for a two-tail test.
 To find an upper-tail area (p-value) for an F-distribution, select
Transform > Compute Variable. Type a name (with no spaces) in the Target Variable
box (e.g., "pf"). Then type SIG.F(f, df1, df2) into the Numeric Expression box.
Here f is the value of the F-statistic, df1 is the numerator degrees of freedom,
and df2 is the denominator degrees of freedom. For example, SIG.F(51.4, 2, 3)
returns the upper-tail area (p-value) for an F-statistic of 51.4 for the F-distribution with 2
numerator degrees of freedom and 3 denominator degrees of freedom (0.005).
 To find an upper-tail area (p-value) for a chi-squared distribution, select
Transform > Compute Variable. Type a name (with no spaces) in the Target Variable
box (e.g., "pchisq"). Then type SIG.CHISQ(chisq, df) into the Numeric Expression
box. Here chisq is the value of the chi-squared statistic and df is the degrees of
freedom. For example, SIG.CHISQ(0.38, 2) returns the upper-tail area (p-value) for a
chi-squared statistic of 0.38 for the chi-squared distribution with 2 degrees of freedom
(0.827).
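
The tail areas quoted in these examples can be cross-checked outside SPSS. The Python sketch below is an independent check under stated assumptions, not a description of how SPSS computes its results: the t-distribution area is obtained by numerically integrating the density with Simpson's rule, and the F and chi-squared areas use closed forms that hold only for 2 numerator degrees of freedom and 2 degrees of freedom, respectively. All function names are made up for illustration.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log(1 + x * x / df))

def t_upper_tail(t, df, steps=2000):
    """Area above t (for t >= 0), using Simpson's rule on [0, t].

    `steps` must be even. By symmetry the area above 0 is 0.5, so the
    integral from 0 to t is subtracted from 0.5.
    """
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def f2_upper_tail(f, df2):
    """Area above f for F with 2 numerator df (special case only)."""
    return (1 + 2 * f / df2) ** (-df2 / 2)

def chisq2_upper_tail(chisq):
    """Area above chisq for chi-squared with 2 df (special case only)."""
    return math.exp(-chisq / 2)

pt = t_upper_tail(2.40, 29)       # compare with 1 - CDF.T(2.40, 29)
pf = f2_upper_tail(51.4, 3)       # compare with SIG.F(51.4, 2, 3)
pchisq = chisq2_upper_tail(0.38)  # compare with SIG.CHISQ(0.38, 2)
print(pt, 2 * pt, pf, pchisq)
```

The same `t_upper_tail` function gives areas close to 0.05 and 0.025 at the critical values 1.699 and 2.045, which ties the percentile and tail-area calculations together.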
 Calculate descriptive statistics for quantitative variables by selecting
Analyze > Descriptive Statistics > Frequencies. Move the variable(s) into the
Variable(s) list. Click Statistics to select the summaries, such as the
Mean, that you would like. To avoid superfluous output uncheck
Display frequency tables.
 Create contingency tables or cross-tabulations for qualitative
variables by selecting Analyze > Descriptive Statistics > Crosstabs. Move one qualitative
variable into the Row(s) list and another into the Column(s) list. Cell
percentages (within rows, columns, or the whole table) can be calculated by clicking
Cells.
 If you have a quantitative variable and a qualitative variable, you can calculate
descriptive statistics for cases grouped in different categories by selecting
Analyze > Reports > Case Summaries. Move the quantitative variable(s) into the
Variables list and the qualitative variable(s) into the Grouping Variable(s) list.
Click Statistics to select the summaries that you would like; the default is
Number of Cases, but other statistics such as the Mean and
Standard Deviation can also be selected. To avoid superfluous output uncheck
Display cases.
 To make a stem-and-leaf plot for a quantitative variable, select
Analyze > Descriptive Statistics > Explore. Move the variable into the
Dependent List box. You can alter the statistics that are calculated and the plots that
are constructed by clicking Statistics and Plots.
 To make a histogram for a quantitative variable, select
Graphs > Legacy Dialogs > Histogram. Move the variable into the Variable
box.
 To make a scatterplot with two quantitative variables, select
Graphs > Legacy Dialogs > Scatter/Dot. Choose Simple Scatter and move the
vertical axis variable into the Y Axis box and the horizontal axis variable into the
X Axis box.
 All possible scatterplots for more than two variables can be drawn simultaneously
(called a scatterplot matrix) by selecting Graphs > Legacy Dialogs > Scatter/Dot,
then choosing Matrix Scatter and moving the variables into the Matrix Variables
list.
 You can mark or label cases in a scatterplot with different colors/symbols
according to categories in a qualitative variable by moving the variable into the
Set Markers by box in the Scatterplot dialog. To change the colors/symbols used,
edit the plot (double-click it in the Viewer Window) to bring up a
Chart Editor Window, select the symbol you want to change by clicking on it in the legend
at the right of the plot (the data points corresponding to this symbol should become highlighted
when you do this), and select Edit > Properties. Select the color/symbol you want and click
Apply to see the effect. Click Close to return to the plot; close the plot to
return to the Viewer Window.
 You can identify individual cases in a scatterplot using labels by moving a
qualitative text variable into the Label Cases by box in the Scatterplot dialog.
This has no apparent effect on the plot when it is first drawn, but if you subsequently edit the
plot (double-click it in the Viewer Window) to bring up a Chart Editor Window, you
can then use the Point Identification tool (under Elements > Data Label Mode) to
click on a point and the label for that point will be displayed.
 To remove one or more observations from a dataset, select
Data > Select Cases and choose an appropriate selection criterion.
 To make a bar chart for cases in different categories, select
Graphs > Legacy Dialogs > Bar.
 For frequency bar charts of one qualitative variable, choose Simple and
move the variable into the Category Axis box.
 For frequency bar charts of two qualitative variables, choose Clustered
and move one variable into the Category Axis box and the other into the
Define Clusters by box.
 The bars can also represent various summary functions for a quantitative variable.
For example, to produce a bar chart of means, select Other statistic (e.g., mean) and move
the quantitative variable into the Variable box.
 To make boxplots for cases in different categories, select
Graphs > Legacy Dialogs > Boxplot.
 For just one qualitative variable, choose Simple and move the
qualitative variable into the Category Axis box. Move the quantitative variable into the
Variable box.
 For two qualitative variables, choose Clustered and move one qualitative
variable into the Category Axis box and the other into the Define Clusters by box.
Move the quantitative variable into the Variable box.
 To make a QQ-plot (also known as a normal probability plot) for a
quantitative variable, select Analyze > Descriptive Statistics > Q-Q Plots. Move the
variable into the Variables box and leave the Test Distribution as
Normal to assess normality of the variable. This procedure produces a regular QQ-plot
(described in Section 1.2 of Pardoe, 2012) as well as a "detrended" one.
 To compute a confidence interval for a univariate population mean, select
Analyze > Descriptive Statistics > Explore. Move the variable for which you want to
calculate the confidence interval into the Dependent List box and select
Statistics for Display. Then click the Statistics button to bring up
another dialog box in which you can specify the confidence level for the interval (among other
things). Clicking Continue will take you back to the previous dialog box, where you can now
click OK.
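
As a rough cross-check on the interval reported by Explore, the usual t-based interval for a population mean is the sample mean plus or minus a t critical value times the standard error. The Python sketch below assumes that formula and uses hypothetical summary statistics (a mean of 50 and a standard deviation of 10 from 30 observations); the critical value 2.045 is the IDF.T(0.975, 29) result quoted earlier, and the function name is made up for illustration.

```python
import math

def mean_confidence_interval(mean, sd, n, t_crit):
    """Return (lower, upper) for mean +/- t_crit * sd / sqrt(n)."""
    margin = t_crit * sd / math.sqrt(n)
    return mean - margin, mean + margin

# Hypothetical summary statistics; 2.045 is IDF.T(0.975, 29) for n = 30.
lower, upper = mean_confidence_interval(mean=50.0, sd=10.0, n=30, t_crit=2.045)
print(round(lower, 2), round(upper, 2))  # 46.27 53.73
```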
 To do a hypothesis test for a univariate population mean, select
Analyze > Compare Means > One-Sample T Test. Move the variable for which you want to do
the test into the Test Variable(s) box and type the (null) hypothesized value into the
Test Value box. The p-value calculated (displayed as "Sig.") is a two-tailed p-value; to
obtain a one-tailed p-value you will need either to divide this value by two or to divide it by
two and then subtract the result from one (draw a picture to figure out which).
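
The conversion from a two-tailed to a one-tailed p-value depends on whether the sign of the t-statistic agrees with the direction of the alternative hypothesis: if it does, the one-tailed p-value is half the two-tailed value; if it does not, it is one minus that half. The Python sketch below encodes this rule; the function name and the labels "greater" and "less" are illustrative choices, not SPSS terminology.

```python
def one_tailed_p(two_tailed_p, t_stat, alternative):
    """Convert a two-tailed p-value to a one-tailed p-value.

    `alternative` is "greater" for an upper-tail test or "less" for a
    lower-tail test; `t_stat` is the observed t-statistic.
    """
    if alternative not in ("greater", "less"):
        raise ValueError('alternative must be "greater" or "less"')
    half = two_tailed_p / 2
    # Does the sign of the statistic point in the direction of the alternative?
    agrees = t_stat > 0 if alternative == "greater" else t_stat < 0
    return half if agrees else 1 - half

# Hypothetical output: "Sig." of 0.04 with a positive t-statistic.
p_upper = one_tailed_p(0.04, 2.15, "greater")  # half of 0.04
p_lower = one_tailed_p(0.04, 2.15, "less")     # one minus half of 0.04
print(p_upper, p_lower)
```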
Simple linear regression
 To fit a simple linear regression model (i.e., find a least squares line),
select Analyze > Regression > Linear. Move the response variable into the
Dependent box and the predictor variable into the Independent(s) box. Just
click OK for now—the other items in the dialog box are addressed below. In the
output, ignore the column headed "Standardized Coefficients." In the rare circumstance that you
wish to fit a model without an intercept term (regression through the origin), click
Options before clicking OK and uncheck Include constant in equation.
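
To see what the "Unstandardized Coefficients" in the output represent, the least squares estimates for a model with an intercept can be computed directly from the standard textbook formulas. The Python sketch below uses a small made-up dataset; the data values and the function name are hypothetical.

```python
def least_squares_line(x, y):
    """Return (intercept, slope) of the least squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data (not from the book).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
b0, b1 = least_squares_line(x, y)
print(round(b0, 3), round(b1, 3))  # 2.2 0.6
```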
 To add a regression line or least squares line to a scatterplot,
edit the plot (double-click it in the Viewer Window) to bring up a
Chart Editor Window and select Elements > Fit Line at Total. This brings up
another dialog in which you need to make sure Linear is selected under
Fit Method. Click Close to add the least squares line and return to the plot;
close the plot to return to the Viewer Window.
 To find 95% confidence intervals for the regression parameters in a simple
linear regression model, select Analyze > Regression > Linear. Move the response variable
into the Dependent box and the predictor variable into the Independent(s) box.
Before clicking OK, click the Statistics button and check
Confidence intervals (under Regression Coefficient) in the subsequent
Linear Regression: Statistics dialog box. Click Continue to return to the main
Linear Regression dialog box, and then click OK. The confidence intervals are
displayed as the final two columns of the "Coefficients" output. This applies more generally
to multiple linear regression also.

 To find a fitted value or predicted value of Y (the response
variable) at a particular value of X (the predictor variable), select
Analyze > Regression > Linear. Move the response variable into the Dependent box
and the predictor variable into the Independent(s) box. Before clicking OK, click
the Save button and check Unstandardized under Predicted Values in the
subsequent Linear Regression: Save dialog box. Click Continue to return to the
main Linear Regression dialog box, and then click OK. The fitted or predicted
values of Y at each of the X-values in the dataset are displayed in the column headed PRE_1
in the Data Editor Window (not in the Viewer Window). Each time you ask SPSS to
calculate fitted or predicted values like this it will add a new column to the dataset and
increment the end digit by one; for example, the second time you calculate fitted or predicted
values they will be called PRE_2.
 You can also obtain a fitted or predicted value of Y
at an X-value that is not in the dataset by doing the following. Before fitting the regression
model, add the X-value to the dataset in the Data Editor Window (go down to the bottom of
the spreadsheet, and type the X-value in the appropriate cell of the next blank row). Then fit the
regression model and follow the steps above. SPSS will ignore the X-value you typed when fitting the
model (since there is no corresponding Y-value), so all the regression output (such as the estimated
regression parameters) will be the same. But SPSS will calculate a fitted or predicted value of Y
at this new X-value based on the results of the regression. Again, look for it in the dataset; it
will be displayed in the column headed PRE in the Data Editor Window (not in the
Viewer Window).
 This applies more generally to multiple linear regression also.

 To find a confidence interval for the mean of Y at a particular value of
X, select Analyze > Regression > Linear. Move the response variable into the
Dependent box and the predictor variable into the Independent(s) box. Before
clicking OK, click the Save button and check Mean (under
Prediction Intervals) in the subsequent Linear Regression: Save dialog box. Type
the value of the confidence level that you want in the Confidence Interval box (the default
is 95%), click Continue to return to the main Linear Regression dialog box, and
then click OK. The confidence intervals for the mean of Y at each of the X-values in the
dataset are displayed as two columns headed LMCI_1 and UMCI_1
in the Data Editor Window (not in the Viewer Window). The "LMCI" stands for "lower
mean confidence interval," while the "UMCI" stands for "upper mean confidence interval." Each time
you ask SPSS to calculate confidence intervals like this it will add new columns to the dataset and
increment the end digit by one; for example, the second time you calculate confidence intervals for
the mean of Y the end points will be called LMCI_2 and UMCI_2.
 You can also obtain a confidence interval for the mean of Y at an X-value that is
not in the dataset by doing the following. Before fitting the regression model, add the X-value to
the dataset in the Data Editor Window (go down to the bottom of the spreadsheet, and type
the X-value in the appropriate cell of the next blank row). Then fit the regression model and
follow the steps above. SPSS will ignore the X-value you typed when fitting the model (since there
is no corresponding Y-value), so all the regression output (such as the estimated regression
parameters) will be the same. But SPSS will calculate a confidence interval for the mean of Y at
this new X-value based on the results of the regression. Again, look for it in the dataset; it will
be displayed in the two columns headed LMCI and UMCI in the Data Editor
Window (not in the Viewer Window).
 This applies more generally to multiple linear regression also.

 To find a prediction interval for an individual value of Y at a particular
value of X, select Analyze > Regression > Linear. Move the response variable into the
Dependent box and the predictor variable into the Independent(s) box. Before
clicking OK, click the Save button and check Individual (under
Prediction Intervals) in the subsequent Linear Regression: Save dialog box. Type
the value of the confidence level that you want in the Confidence Interval box (the default
is 95%), click Continue to return to the main Linear Regression dialog box, and
then click OK. The prediction intervals for an individual Y-value at each of the X-values
in the dataset are displayed as two columns headed LICI_1 and UICI_1
in the Data Editor Window (not in the Viewer Window). The "LICI" stands for "lower
individual confidence interval," while the "UICI" stands for "upper individual confidence interval."
Each time you ask SPSS to calculate prediction intervals like this it will add new columns to the
dataset and increment the end digit by one; for example, the second time you calculate prediction
intervals for an individual Y-value the end points will be called LICI_2 and UICI_2.
 You can also obtain a prediction interval for an individual Y-value at an X-value
that is not in the dataset by doing the following. Before fitting the regression model, add the
X-value to the dataset in the Data Editor Window (go down to the bottom of the spreadsheet,
and type the X-value in the appropriate cell of the next blank row). Then fit the regression model
and follow the steps above. SPSS will ignore the X-value you typed when fitting the model (since
there is no corresponding Y-value), so all the regression output (such as the estimated regression
parameters) will be the same. But SPSS will calculate a prediction interval for an individual
Y-value at this new X-value based on the results of the regression. Again, look for it in the dataset;
it will be displayed in the two columns headed LICI and UICI in the
Data Editor Window (not in the Viewer Window).
 This applies more generally to multiple linear regression also.
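
The difference between the two kinds of interval saved above can be seen from the standard simple linear regression formulas: both are centered at the fitted value, but the interval for an individual Y-value includes an extra "1 +" term under the square root and is therefore always wider. The Python sketch below assumes those textbook formulas; the data are made up, the function name is hypothetical, and the critical value 3.182 is the 97.5th percentile of the t-distribution with 3 degrees of freedom (what IDF.T(0.975, 3) would return for n = 5).

```python
import math

def intervals_at(x, y, x0, t_crit):
    """Return (fit, mean interval, individual interval) at x0.

    Simple linear regression with an intercept; t_crit is the t critical
    value with n - 2 degrees of freedom for the desired confidence level.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(rss / (n - 2))                 # regression standard error
    fit = b0 + b1 * x0
    core = 1 / n + (x0 - x_bar) ** 2 / sxx
    half_mean = t_crit * s * math.sqrt(core)     # half-width for the mean of Y
    half_ind = t_crit * s * math.sqrt(1 + core)  # half-width for an individual Y
    return fit, (fit - half_mean, fit + half_mean), (fit - half_ind, fit + half_ind)

# Hypothetical data; x0 = 6 is an X-value that is not in the dataset.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
fit, mean_ci, pred_int = intervals_at(x, y, x0=6.0, t_crit=3.182)
print(fit, mean_ci, pred_int)
```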
Multiple linear regression
 To fit a multiple linear regression model, select
Analyze > Regression > Linear. Move the response variable into the Dependent box and the predictor variables into the Independent(s) box. In the rare
circumstance that you wish to fit a model without an intercept term (regression through the origin),
click Options before clicking OK and uncheck
Include constant in equation.
 To add a quadratic regression line to a scatterplot, edit the plot
(double-click it in the Viewer Window) to bring up a Chart Editor Window and
select Elements > Fit Line at Total. This brings up another dialog in which you need to
check the Quadratic option under Fit Method. Click Apply and
Close to add the quadratic regression line and return to the plot; close the plot to return
to the Viewer Window.
 Categories of a qualitative variable can be thought of as defining subsets
of the sample. If there is also a quantitative response and a quantitative predictor variable in
the dataset, a regression model can be fit to the data to represent separate regression lines for
each subset. First use help #15 and #17 to make a scatterplot with the response variable on the
vertical axis, the quantitative predictor variable on the horizontal axis, and the cases marked with
different colors/symbols according to the categories in the qualitative predictor variable. To add
a regression line for each subset to this scatterplot, edit the plot (double-click it in the
Viewer Window) to bring up a Chart Editor Window and select
Elements > Fit Line at Subgroups. This brings up another dialog in which you need to make
sure Linear is selected under Fit Method. Click Close to add the least
squares lines for each subset of selected points and return to the plot. Close the plot to return
to the Viewer Window.
 To find the F-statistic and associated p-value for a nested model F-test in
multiple linear regression, select Analyze > Regression > Linear. Move the response
variable into the Dependent box and the predictor variables in the reduced model
into the Independent(s) box. Click the Next button to the right of where it says
Block 1 of 1; it should now say Block 2 of 2 and the Independent(s) box
should have been cleared. Move the additional predictors in the complete model (i.e.,
the predictors whose usefulness you are assessing) into this Block 2 Independent(s) box.
You should now have the predictors that are in both the reduced and complete models in Block
1, and the predictors that are only in the complete model in Block 2. Then click
Statistics and check R squared change. Finally click Continue to return
to the Regression dialog and OK to obtain the results. The F-statistic is in the
second row of the "Model Summary" in the column headed F Change, while the associated
p-value is in the column headed Sig. F Change. (Ignore the numbers in the first rows of these
columns.)
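
The "F Change" value can be reproduced by hand from the residual sums of squares of the two models, using the standard nested model F-statistic: the drop in residual sum of squares per additional predictor, divided by the residual mean square of the complete model. The Python sketch below assumes that formula; the input numbers are hypothetical and the function name is made up for illustration.

```python
def nested_f_statistic(rss_reduced, rss_complete, df_reduced, df_complete):
    """F-statistic for comparing a reduced model with a complete model.

    rss_* are residual sums of squares and df_* are the corresponding
    residual degrees of freedom (df_reduced > df_complete).
    """
    numerator = (rss_reduced - rss_complete) / (df_reduced - df_complete)
    denominator = rss_complete / df_complete
    return numerator / denominator

# Hypothetical values: two extra predictors reduce the RSS from 120 to 80.
f_stat = nested_f_statistic(120.0, 80.0, df_reduced=27, df_complete=25)
print(round(f_stat, 2))  # 6.25
```

The associated p-value would then be the upper-tail area of the F-distribution with 2 and 25 degrees of freedom, for example via SIG.F(6.25, 2, 25).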
 To save residuals in a multiple linear regression model, select
Analyze > Regression > Linear. Move the response variable into the Dependent box
and the predictor variables into the Independent(s) box. Before clicking OK,
click the Save button and check Unstandardized under Residuals in the
subsequent Linear Regression: Save dialog box. Click Continue to return to the
main Linear Regression dialog box, and then click OK. The residuals are saved as a
variable called RES_1 in the Data Editor Window; they can now be used just like
any other variable, for example, to construct residual plots. Each time you ask SPSS to save residuals like this it will add a new variable to the dataset and increment the end digit by one;
for example, the second time you save residuals they will be called RES_2. To save
what Pardoe (2012) calls standardized residuals, check Studentized under
Residuals in the Linear Regression: Save dialog box—they will be saved as a
variable called SRE in the Data Editor Window. To save what Pardoe (2012) calls
studentized residuals, check Studentized deleted under Residuals in the
Linear Regression: Save dialog box—they will be saved as a variable called
SDR in the Data Editor Window.
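
To make the naming difference concrete, the sketch below computes both quantities from a raw residual using the standard textbook formulas, which are assumed here to match the definitions in Pardoe (2012): the standardized residual (SPSS "Studentized") divides the raw residual by its estimated standard deviation, and the studentized residual (SPSS "Studentized deleted") rescales it as if the case had been left out of the fit. The input values are hypothetical, the function names are made up, and h denotes the ordinary (not centered) leverage.

```python
import math

def standardized_residual(e, s, h):
    """Raw residual e divided by s * sqrt(1 - h).

    s is the regression standard error and h the ordinary leverage.
    """
    return e / (s * math.sqrt(1 - h))

def studentized_residual(r, n, k):
    """Deleted version of the standardized residual r.

    n is the sample size and k the number of predictor terms.
    """
    return r * math.sqrt((n - k - 2) / (n - k - 1 - r ** 2))

# Hypothetical case: residual 1.0, s^2 = 0.8, leverage 0.2, n = 5, k = 1.
r = standardized_residual(e=1.0, s=math.sqrt(0.8), h=0.2)
t = studentized_residual(r, n=5, k=1)
print(round(r, 3), round(t, 3))  # 1.25 1.474
```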
 To add a loess fitted line to a scatterplot (useful for checking the zero
mean regression assumption in a residual plot), edit the plot (double-click it in the
Viewer Window) to bring up a Chart Editor Window and select
Elements > Fit Line at Total. This brings up another dialog in which you need to check the
Loess option under Fit Method. The default value of 50 for
% of points to fit tends to be a little on the low side: I would change it to 75.
Click Apply and Close to add the loess fitted line and to return to the plot;
close the plot to return to the Viewer Window.
 To save leverages in a multiple linear regression model, select
Analyze > Regression > Linear. Move the response variable into the Dependent box
and the predictor variables into the Independent(s) box. Before clicking OK,
click the Save button and check Leverage values under Distances in the
subsequent Linear Regression: Save dialog box. Click Continue to return to the
main Linear Regression dialog box, and then click OK. This results in "centered"
leverages being saved as a variable called LEV_1 in the Data Editor Window; they
can now be used just like any other variable, for example, to construct scatterplots. Each time you
save leverages like this, SPSS will add a new variable to the dataset and increment the end digit by
one; for example, the second set of leverages will be called LEV_2. Centered leverage =
ordinary leverage − 1/n, where ordinary leverage is defined in Section 5.1.2 of Pardoe (2012) and n is the sample size.
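
The relationship between centered and ordinary leverage can be applied directly to the saved LEV_1 values. The Python sketch below uses hypothetical centered leverages (those of a simple linear regression on X = 1, 2, 3, 4, 5) and a made-up function name; it also illustrates the general fact that, for a model with an intercept, the ordinary leverages sum to the number of regression parameters (here 2).

```python
def ordinary_leverages(centered, n):
    """Convert SPSS-style centered leverages to ordinary leverages."""
    return [c + 1 / n for c in centered]

# Hypothetical centered leverages for n = 5 cases and one predictor.
centered = [0.4, 0.1, 0.0, 0.1, 0.4]
ordinary = ordinary_leverages(centered, n=5)
print([round(h, 2) for h in ordinary])  # [0.6, 0.3, 0.2, 0.3, 0.6]
print(round(sum(ordinary), 2))          # 2.0
```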
 To save Cook's distances in a multiple linear regression model, select
Analyze > Regression > Linear. Move the response variable into the Dependent box
and the predictor variables into the Independent(s) box. Before clicking OK,
click the Save button and check Cook's under Distances in the subsequent Linear Regression: Save dialog box. Click Continue to return to the main
Linear Regression dialog box, and then click OK. Cook's distances are saved as a
variable called COO_1 in the Data Editor Window; they can now be used just like
any other variable, for example, to construct scatterplots. Each time you save Cook's distances like
this, SPSS will add a new variable to the dataset and increment the end digit by one; for example,
the second set of Cook's distances will be called COO_2.
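
Cook's distance combines the size of a residual with the leverage of the case. The Python sketch below uses the standard textbook formula in terms of the standardized residual r (SPSS "Studentized") and the ordinary leverage h (the saved centered leverage plus 1/n); it is assumed, not confirmed from the source, that this matches the values SPSS saves. The inputs are hypothetical and the function name is made up.

```python
def cooks_distance(r, h, k):
    """Cook's distance from standardized residual r and ordinary leverage h.

    k is the number of predictor terms, so k + 1 is the number of
    regression parameters including the intercept.
    """
    return r ** 2 * h / ((k + 1) * (1 - h))

# Hypothetical case: standardized residual 1.25, leverage 0.2, one predictor.
d = cooks_distance(r=1.25, h=0.2, k=1)
print(round(d, 4))  # 0.1953
```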
 To create some residual plots automatically in a multiple linear regression
model, select Analyze > Regression > Linear. Move the response variable into the
Dependent box and the predictor variables into the Independent(s) box. Before
clicking OK, click the Plots button and move *SRESID into the Y
box and *ZPRED into the X box to create a scatterplot of the standardized
residuals on the vertical axis versus the standardized predicted values on the horizontal axis.
Click Continue to return to the main Linear Regression dialog box, and then hit
OK. To create residual plots manually, first create studentized residuals (see help #35),
and then construct scatterplots with these studentized residuals on the vertical axis.
 To create a correlation matrix of quantitative variables (useful for
checking potential multicollinearity problems), select
Analyze > Correlate > Bivariate. Move the variables into the Variables box and
click OK.
 To find variance inflation factors in multiple linear regression, select
Analyze > Regression > Linear. Move the response variable into the Dependent box
and the predictor variables into the Independent(s) box. Before clicking OK,
click Statistics and check Collinearity diagnostics. Click Continue to
return to the Regression dialog and OK to obtain the results. The variance
inflation factors are in the last column of the "Coefficients" output under "VIF."
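
A variance inflation factor is 1/(1 − R²), where R² comes from regressing one predictor on all the others. With only two predictors that R² is just their squared correlation, which makes a small hand check possible. The Python sketch below covers this two-predictor special case only; the data and function names are hypothetical.

```python
import math

def correlation(u, v):
    """Pearson correlation between two equal-length lists."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    suv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    suu = sum((a - u_bar) ** 2 for a in u)
    svv = sum((b - v_bar) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when the model has exactly two."""
    r = correlation(x1, x2)
    return 1 / (1 - r ** 2)

# Hypothetical predictors with correlation 0.8.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
vif = vif_two_predictors(x1, x2)
print(round(vif, 2))  # 2.78
```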
 To draw a predictor effect plot for graphically displaying the effects of
transformed quantitative predictors and/or interactions between quantitative and qualitative
predictors in multiple linear regression, first create a variable representing the effect, say,
"X1effect" (see computer help #6). Then select
Graphs > Legacy Dialogs > Interactive > Line. Move the "X1effect" variable into the
vertical axis box and X1 into the horizontal axis box.
 If the "X1effect" variable just involves X1 (e.g., 1 + 3X1 + 4X1²),
you can click OK at this point.
 If the "X1effect" variable also involves a qualitative variable (e.g.,
1 − 2X1 + 3D2X1, where D2 is an indicator variable), you should move the qualitative variable
into the Legend Variables Color or Style box before clicking OK.
See Section 5.5 in Pardoe (2012) for an example.
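
The two example effects above can be tabulated to see what the plotted lines should look like. In SPSS the corresponding Numeric Expression entries would be 1 + 3*X1 + 4*X1**2 and 1 - 2*X1 + 3*D2*X1. The Python sketch below simply evaluates those expressions on a few hypothetical X1 values; the function names are made up for illustration.

```python
def x1_effect_quadratic(x1):
    """Effect 1 + 3*X1 + 4*X1^2 (involves X1 only)."""
    return 1 + 3 * x1 + 4 * x1 ** 2

def x1_effect_interaction(x1, d2):
    """Effect 1 - 2*X1 + 3*D2*X1, where d2 is a 0/1 indicator."""
    return 1 - 2 * x1 + 3 * d2 * x1

# One row per X1 value: quadratic effect, then the interaction effect
# for D2 = 0 and D2 = 1 (two separate lines in the plot).
rows = [(x1, x1_effect_quadratic(x1),
         x1_effect_interaction(x1, 0), x1_effect_interaction(x1, 1))
        for x1 in (0, 1, 2)]
for row in rows:
    print(row)
```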
Last updated: June, 2012
© 2012, Iain Pardoe