SPSS instructions

These instructions accompany Applied Regression Modeling by Iain Pardoe, 2nd edition published by Wiley in 2012. The numbered items cross-reference with the "computer help" references in the book. These instructions are based on SPSS 20 for Windows, but they (or something similar) should also work for other versions. Find instructions for other statistical software packages here.

Getting started and summarizing univariate data

  1. If desired, change SPSS's default options by selecting Edit > Options. For example, to display variable names (in alphabetical order) rather than labels in dialog boxes, click the General tab; in the Variable Lists group select Display names and select Alphabetical. To show variable names rather than labels in output tables, click the Output Labels tab; under Pivot Table Labeling change Variables in labels shown as to Names. To display small numbers in tables without using scientific notation (which can make reading the numbers more difficult), click the General tab; under Output check No scientific notation for small numbers in tables.
  2. To open an SPSS data file, select File > Open > Data.
  3. To recall a previously used dialog box, click the Dialog Recall tool (fourth button from the left in the Data Editor Window, sixth button from the left in the Viewer Window).
  4. Output can be edited in the Viewer Window. Individual pieces of output (including tables and graphs) can be selected, edited, moved, deleted, and so on using both the Outline Pane (on the left) and the Display Pane (on the right). Text and headings can be entered using the Insert menu. Alternatively, copy and paste pieces of output from SPSS to a word processor like OpenOffice Writer or Microsoft Word.
  5. You can access help by selecting Help > Topics. For example, to find out about "boxplots" click the Index tab, type boxplots in the first box, and select the index entry you want in the second box.
  6. To transform data or compute a new variable, select Transform > Compute Variable. Type a name (with no spaces) for the new variable in the Target Variable box, and type a mathematical expression for the variable in the Numeric Expression box. Current variables in the dataset can be moved into the Numeric Expression box, while the keypad and list of functions can be used to create the expression. Examples are LN(X) for the natural logarithm of X and X**2 for X squared (X^2). Click OK to create the new variable, which will be added to the dataset (check that it looks correct in the Data Editor Window); it can now be used just like any other variable. If you get the error message "expression ends unexpectedly," this means there is a syntax error in your Numeric Expression; a common mistake is to forget the multiplication symbol (*) between a number and a variable (e.g., 2*X represents 2X).
  7. To create indicator (dummy) variables from a qualitative variable, select Transform > Recode into Different Variables. Move the qualitative variable into the Input Variable -> Output Variable box, type a name for the first indicator variable in the Output Variable Name box, and press Change (the name should replace the question mark in the Input Variable -> Output Variable box). Next, press Old and New Values, type the appropriate category name/number into the Old Value box, type 1 into the New Value box, and press Add. Then select All other values, type 0 into the New Value box, and press Add. Click Continue to return to the previous dialog box, and click OK (check that the correct indicator variable has been added to your spreadsheet in the Data Editor Window). Repeat for other indicator variables (if necessary).
  8. Calculate descriptive statistics for quantitative variables by selecting Analyze > Descriptive Statistics > Frequencies. Move the variable(s) into the Variable(s) list. Click Statistics to select the summaries, such as the Mean, that you would like. To avoid superfluous output uncheck Display frequency tables.
  9. Create contingency tables or cross-tabulations for qualitative variables by selecting Analyze > Descriptive Statistics > Crosstabs. Move one qualitative variable into the Row(s) list and another into the Column(s) list. Cell percentages (within rows, columns, or the whole table) can be calculated by clicking Cells.
  10. If you have a quantitative variable and a qualitative variable, you can calculate descriptive statistics for cases grouped in different categories by selecting Analyze > Reports > Case Summaries. Move the quantitative variable(s) into the Variables list and the qualitative variable(s) into the Grouping Variable(s) list. Click Statistics to select the summaries that you would like; the default is Number of Cases, but other statistics such as the Mean and Standard Deviation can also be selected. To avoid superfluous output uncheck Display cases.
  11. To make a stem-and-leaf plot for a quantitative variable, select Analyze > Descriptive Statistics > Explore. Move the variable into the Dependent List box. You can alter the statistics that are calculated and the plots that are constructed by clicking Statistics and Plots.
  12. To make a histogram for a quantitative variable, select Graphs > Legacy Dialogs > Histogram. Move the variable into the Variable box.
  13. To make a scatterplot with two quantitative variables, select Graphs > Legacy Dialogs > Scatter/Dot. Choose Simple Scatter and move the vertical axis variable into the Y Axis box and the horizontal axis variable into the X Axis box.
  14. All possible scatterplots for more than two variables can be drawn simultaneously (a display called a scatterplot matrix) by selecting Graphs > Legacy Dialogs > Scatter/Dot, then choosing Matrix Scatter and moving the variables into the Matrix Variables list.
  15. You can mark or label cases in a scatterplot with different colors/symbols according to categories in a qualitative variable by moving the variable into the Set Markers by box in the Scatterplot dialog. To change the colors/symbols used, edit the plot (double-click it in the Viewer Window) to bring up a Chart Editor Window, select the symbol you want to change by clicking on it in the legend at the right of the plot (the data points corresponding to this symbol should become highlighted when you do this), and select Edit > Properties. Select the color/symbol you want and click Apply to see the effect. Click Close to return to the plot; close the plot to return to the Viewer Window.
  16. You can identify individual cases in a scatterplot using labels by moving a qualitative text variable into the Label Cases by box in the Scatterplot dialog. This has no apparent effect on the plot when it is first drawn, but if you subsequently edit the plot (double-click it in the Viewer Window) to bring up a Chart Editor Window, you can then use the Point Identification tool (under Elements > Data Label Mode) to click on a point and the label for that point will be displayed.
  17. To remove one or more observations from a dataset, select Data > Select Cases and choose an appropriate selection criterion.
  18. To make a bar chart for cases in different categories, select Graphs > Legacy Dialogs > Bar.
  19. To make boxplots for cases in different categories, select Graphs > Legacy Dialogs > Boxplot.
  20. To make a QQ-plot (also known as a normal probability plot) for a quantitative variable, select Analyze > Descriptive Statistics > Q-Q Plots. Move the variable into the Variables box and leave the Test Distribution as Normal to assess normality of the variable. This procedure produces a regular QQ-plot (described in Section 1.2 of Pardoe, 2012) as well as a "detrended" one.
  21. To compute a confidence interval for a univariate population mean, select Analyze > Descriptive Statistics > Explore. Move the variable for which you want to calculate the confidence interval into the Dependent List box and select Statistics for Display. Then click the Statistics button to bring up another dialog box in which you can specify the confidence level for the interval (among other things). Clicking Continue will take you back to the previous dialog box, where you can now click OK.
  22. To do a hypothesis test for a univariate population mean, select Analyze > Compare Means > One-Sample T Test. Move the variable for which you want to do the test into the Test Variable(s) box and type the (null) hypothesized value into the Test Value box. The p-value calculated (displayed as "Sig.") is a two-tailed p-value; to obtain a one-tailed p-value you will either need to divide this value by two (if the t-statistic lies on the side of zero specified by the alternative hypothesis) or divide it by two and subtract the result from one (if it does not); draw a picture to figure out which.
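The Recode into Different Variables steps above turn one category into 1 and all other values into 0, repeated once per indicator. As a minimal sketch of that logic (in Python, not SPSS syntax), the hypothetical function make_indicators below builds one indicator for every category except a chosen baseline (reference) category, so a qualitative variable with k categories yields k − 1 indicators. The region data are made up for illustration.

```python
def make_indicators(values, baseline):
    """Map each non-baseline category to a 0/1 indicator list.

    Mirrors the recode steps: the chosen category becomes 1 and
    "All other values" become 0.
    """
    categories = sorted(set(values) - {baseline})  # sorted only for a predictable order
    return {
        cat: [1 if v == cat else 0 for v in values]
        for cat in categories
    }


region = ["north", "south", "west", "south", "north"]  # hypothetical data
indicators = make_indicators(region, baseline="north")
for name, column in indicators.items():
    print(name, column)
```

Each resulting list plays the role of one new indicator column in the Data Editor Window; cases in the baseline category are 0 on every indicator.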
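The confidence interval and hypothesis test for a univariate mean described above rest on the sample mean, the sample standard deviation, and the t-distribution with n − 1 degrees of freedom. The Python sketch below (hypothetical function names and made-up data; not SPSS output) computes the t-statistic and a confidence interval, and converts a two-tailed "Sig." value into a one-tailed p-value. Python's standard library has no t-distribution, so the critical value t_crit is assumed to be looked up separately, for example from a t table.

```python
import math
import statistics


def one_sample_t(data, mu0):
    """t-statistic for H0: population mean = mu0 (as in One-Sample T Test)."""
    se = statistics.stdev(data) / math.sqrt(len(data))  # standard error of the mean
    return (statistics.mean(data) - mu0) / se


def mean_ci(data, t_crit):
    """Sample mean +/- t_crit * standard error.

    t_crit is the t-distribution percentile with n - 1 degrees of freedom
    (about 2.365 for a 95% interval with n = 8); it must be supplied.
    """
    se = statistics.stdev(data) / math.sqrt(len(data))
    m = statistics.mean(data)
    return m - t_crit * se, m + t_crit * se


def one_tailed_p(two_tailed_p, t_stat, upper=True):
    """Convert a two-tailed "Sig." value into a one-tailed p-value.

    upper=True tests the alternative "mean > mu0"; upper=False tests
    "mean < mu0".
    """
    in_direction = t_stat > 0 if upper else t_stat < 0
    # Same side as the alternative: half the two-tailed value.
    # Opposite side: one minus that half.
    return two_tailed_p / 2 if in_direction else 1 - two_tailed_p / 2


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, n = 8
print(one_sample_t(data, mu0=4))
print(mean_ci(data, t_crit=2.365))
print(one_tailed_p(0.2, t_stat=1.5, upper=True))
```

The one_tailed_p function encodes the "draw a picture" rule: only when the t-statistic points the way the alternative hypothesis does is the one-tailed p-value smaller than the two-tailed one.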
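For the QQ-plot described above, each sorted observation is plotted against the quantile a normal distribution would predict for its rank. The Python sketch below computes those pairs using Blom's plotting positions, (i − 3/8)/(n + 1/4). That choice is an assumption: SPSS's Q-Q Plots dialog offers several proportion estimation formulas, and they give very similar plots. The function name normal_qq_points and the sample are hypothetical; statistics.NormalDist requires Python 3.8 or later.

```python
from statistics import NormalDist


def normal_qq_points(data):
    """Return (expected standard normal quantile, observed value) pairs.

    Uses Blom's plotting positions (i - 3/8) / (n + 1/4) for the i-th
    smallest of n observations.
    """
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [
        (std_normal.inv_cdf((i - 0.375) / (n + 0.25)), x)
        for i, x in enumerate(ordered, start=1)
    ]


sample = [3.1, 2.4, 5.0, 4.2, 3.8]  # hypothetical data
for expected, observed in normal_qq_points(sample):
    print(f"{expected:6.3f}  {observed}")
```

Points close to a straight line suggest the variable is approximately normal. Rescaling the expected quantiles linearly (for example, to the variable's mean and standard deviation) changes the axis labels but not the shape of the pattern.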

Simple linear regression

  1. To fit a simple linear regression model (i.e., find a least squares line), select Analyze > Regression > Linear. Move the response variable into the Dependent box and the predictor variable into the Independent(s) box. Just click OK for now—the other items in the dialog box are addressed below. In the output, ignore the column headed "Standardized Coefficients." In the rare circumstance that you wish to fit a model without an intercept term (regression through the origin), click Options before clicking OK and uncheck Include constant in equation.
  2. To add a regression line or least squares line to a scatterplot, edit the plot (double-click it in the Viewer Window) to bring up a Chart Editor Window and select Elements > Fit Line at Total. This brings up another dialog in which you need to make sure Linear is selected under Fit Method. Click Close to add the least squares line and return to the plot; close the plot to return to the Viewer Window.
  3. To find 95% confidence intervals for the regression parameters in a simple linear regression model, select Analyze > Regression > Linear. Move the response variable into the Dependent box and the predictor variable into the Independent(s) box. Before clicking OK, click the Statistics button and check Confidence intervals (under Regression Coefficients) in the subsequent Linear Regression: Statistics dialog box. Click Continue to return to the main Linear Regression dialog box, and then click OK. The confidence intervals are displayed as the final two columns of the "Coefficients" output. The same steps also apply to multiple linear regression.
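The estimates in the "Coefficients" output and the confidence intervals described above can be reproduced by hand for a simple linear regression. The Python sketch below (hypothetical function names and made-up data) uses the least squares formulas b1 = Sxy/Sxx and b0 = ybar − b1*xbar, the regression standard error s with n − 2 degrees of freedom, and se(b1) = s/sqrt(Sxx). As with any t-based interval, the t-percentile t_crit with n − 2 degrees of freedom must be supplied, since it is not computed here.

```python
import math


def simple_linear_regression(x, y):
    """Least squares intercept, slope, slope standard error, and s."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                 # slope
    b0 = y_bar - b1 * x_bar        # intercept
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))  # regression standard error
    se_b1 = s / math.sqrt(sxx)     # standard error of the slope
    return b0, b1, se_b1, s


def slope_ci(b1, se_b1, t_crit):
    """b1 +/- t_crit * se(b1); t_crit is the t percentile with n - 2 df."""
    return b1 - t_crit * se_b1, b1 + t_crit * se_b1


x = [1, 2, 3, 4, 5]  # hypothetical predictor values
y = [2, 4, 5, 4, 5]  # hypothetical response values
b0, b1, se_b1, s = simple_linear_regression(x, y)
print(b0, b1, se_b1, s)
print(slope_ci(b1, se_b1, t_crit=3.182))  # about 3.182 for 95% with n - 2 = 3 df
```

The residual degrees of freedom are n − 2 because two parameters (intercept and slope) are estimated; with several predictors the same idea gives n − p − 1.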

Multiple linear regression

  1. To fit a multiple linear regression model, select Analyze > Regression > Linear. Move the response variable into the Dependent box and the predictor variables into the Independent(s) box. In the rare circumstance that you wish to fit a model without an intercept term (regression through the origin), click Options before clicking OK and uncheck Include constant in equation.
  2. To add a quadratic regression line to a scatterplot, edit the plot (double-click it in the Viewer Window) to bring up a Chart Editor Window and select Elements > Fit Line at Total. This brings up another dialog in which you need to check the Quadratic option under Fit Method. Click Apply and Close to add the quadratic regression line and return to the plot; close the plot to return to the Viewer Window.
  3. Categories of a qualitative variable can be thought of as defining subsets of the sample. If there is also a quantitative response and a quantitative predictor variable in the dataset, a regression model can be fit to the data to represent separate regression lines for each subset. First use help #15 and #17 to make a scatterplot with the response variable on the vertical axis, the quantitative predictor variable on the horizontal axis, and the cases marked with different colors/symbols according to the categories in the qualitative predictor variable. To add a regression line for each subset to this scatterplot, edit the plot (double-click it in the Viewer Window) to bring up a Chart Editor Window and select Elements > Fit Line at Subgroups. This brings up another dialog in which you need to make sure Linear is selected under Fit Method. Click Close to add the least squares lines for each subset of selected points and return to the plot. Close the plot to return to the Viewer Window.
  4. To find the F-statistic and associated p-value for a nested model F-test in multiple linear regression, select Analyze > Regression > Linear. Move the response variable into the Dependent box and the predictor variables in the reduced model into the Independent(s) box. Click the Next button to the right of where it says Block 1 of 1; it should now say Block 2 of 2 and the Independent(s) box should have been cleared. Move the additional predictors in the complete model (i.e., the predictors whose usefulness you are assessing) into this Block 2 Independent(s) box. You should now have the predictors that are in both the reduced and complete models in Block 1, and the predictors that are only in the complete model in Block 2. Then click Statistics and check R squared change. Finally click Continue to return to the Regression dialog and OK to obtain the results. The F-statistic is in the second row of the "Model Summary" in the column headed F Change, while the associated p-value is in the column headed Sig. (Ignore the numbers in the first rows of these columns.)
  5. To save residuals in a multiple linear regression model, select Analyze > Regression > Linear. Move the response variable into the Dependent box and the predictor variables into the Independent(s) box. Before clicking OK, click the Save button and check Unstandardized under Residuals in the subsequent Linear Regression: Save dialog box. Click Continue to return to the main Linear Regression dialog box, and then click OK. The residuals are saved as a variable called RES_1 in the Data Editor Window; they can now be used just like any other variable, for example, to construct residual plots. Each time you ask SPSS to save residuals like this, it will add a new variable to the dataset and increment the end digit by one; for example, the second time you save residuals they will be called RES_2. To save what Pardoe (2012) calls standardized residuals, check Studentized under Residuals in the Linear Regression: Save dialog box; they will be saved as a variable called SRE_1 in the Data Editor Window. To save what Pardoe (2012) calls studentized residuals, check Studentized deleted under Residuals in the Linear Regression: Save dialog box; they will be saved as a variable called SDR_1 in the Data Editor Window.
  6. To add a loess fitted line to a scatterplot (useful for checking the zero mean regression assumption in a residual plot), edit the plot (double-click it in the Viewer Window) to bring up a Chart Editor Window and select Elements > Fit Line at Total. This brings up another dialog in which you need to check the Loess option under Fit Method. The default value of 50 for % of points to fit tends to be a little on the low side: I would change it to 75. Click Apply and Close to add the loess fitted line and to return to the plot; close the plot to return to the Viewer Window.
  7. To save leverages in a multiple linear regression model, select Analyze > Regression > Linear. Move the response variable into the Dependent box and the predictor variables into the Independent(s) box. Before clicking OK, click the Save button and check Leverage values under Distances in the subsequent Linear Regression: Save dialog box. Click Continue to return to the main Linear Regression dialog box, and then click OK. This results in "centered" leverages being saved as a variable called LEV_1 in the Data Editor Window; they can now be used just like any other variable, for example, to construct scatterplots. Each time you save leverages like this, SPSS will add a new variable to the dataset and increment the end digit by one; for example, the second set of leverages will be called LEV_2. Centered leverage = ordinary leverage − 1/n, where ordinary leverage is defined in Section 5.1.2 of Pardoe (2012) and n is the sample size.
  8. To save Cook's distances in a multiple linear regression model, select Analyze > Regression > Linear. Move the response variable into the Dependent box and the predictor variables into the Independent(s) box. Before clicking OK, click the Save button and check Cook's under Distances in the subsequent Linear Regression: Save dialog box. Click Continue to return to the main Linear Regression dialog box, and then click OK. Cook's distances are saved as a variable called COO_1 in the Data Editor Window; they can now be used just like any other variable, for example, to construct scatterplots. Each time you save Cook's distances like this, SPSS will add a new variable to the dataset and increment the end digit by one; for example, the second set of Cook's distances will be called COO_2.
  9. To create some residual plots automatically in a multiple linear regression model, select Analyze > Regression > Linear. Move the response variable into the Dependent box and the predictor variables into the Independent(s) box. Before clicking OK, click the Plots button and move *SRESID into the Y box and *ZPRED into the X box to create a scatterplot of the standardized residuals on the vertical axis versus the standardized predicted values on the horizontal axis. Click Continue to return to the main Linear Regression dialog box, and then click OK. To create residual plots manually, first create studentized residuals (see help #35), and then construct scatterplots with these studentized residuals on the vertical axis.
  10. To create a correlation matrix of quantitative variables (useful for checking potential multicollinearity problems), select Analyze > Correlate > Bivariate. Move the variables into the Variables box and click OK.
  11. To find variance inflation factors in multiple linear regression, select Analyze > Regression > Linear. Move the response variable into the Dependent box and the predictor variables into the Independent(s) box. Before clicking OK, click Statistics and check Collinearity diagnostics. Click Continue to return to the Regression dialog and OK to obtain the results. The variance inflation factors are in the last column of the "Coefficients" output under "VIF."
  12. To draw a predictor effect plot for graphically displaying the effects of transformed quantitative predictors and/or interactions between quantitative and qualitative predictors in multiple linear regression, first create a variable representing the effect, say, "X1effect" (see computer help #6). Then select Graphs > Legacy Dialogs > Interactive > Line. Move the "X1effect" variable into the vertical axis box and X1 into the horizontal axis box. See Section 5.5 in Pardoe (2012) for an example.
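The F Change value in the nested model F-test described above compares the R-squared values of the reduced and complete models. The Python sketch below (hypothetical function names) computes it two equivalent ways: from the R-squared values, and from the residual sums of squares, F = ((RSS_R − RSS_C)/k_added) / (RSS_C/(n − p_complete − 1)), where k_added is the number of predictor terms being tested and p_complete is the number of predictor terms in the complete model. The p-value additionally requires the F-distribution with k_added and n − p_complete − 1 degrees of freedom, which is not computed here.

```python
def nested_f_from_r2(r2_reduced, r2_complete, n, p_complete, k_added):
    """Nested model F-statistic from R-squared values (SPSS's "F Change").

    n: sample size
    p_complete: number of predictor terms in the complete model
    k_added: number of predictor terms in the complete model only
    """
    df_resid = n - p_complete - 1  # residual degrees of freedom, complete model
    return ((r2_complete - r2_reduced) / k_added) / ((1 - r2_complete) / df_resid)


def nested_f_from_rss(rss_reduced, rss_complete, n, p_complete, k_added):
    """The same statistic written with residual sums of squares."""
    df_resid = n - p_complete - 1
    return ((rss_reduced - rss_complete) / k_added) / (rss_complete / df_resid)


# Hypothetical numbers: n = 53, complete model has 4 predictors, 2 of them tested.
print(nested_f_from_r2(0.50, 0.60, n=53, p_complete=4, k_added=2))
print(nested_f_from_rss(50.0, 40.0, n=53, p_complete=4, k_added=2))
```

With a total sum of squares of 100, these R-squared and RSS values describe the same pair of models, so both calls give the same F-statistic; the two forms agree because both models share the same total sum of squares.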
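For a model with a single predictor, the saved diagnostics described above (residuals, leverages, Cook's distances) have closed-form expressions, and the Python sketch below uses them to show how the quantities relate. It assumes simple linear regression only; with several predictors the ordinary leverages are instead the diagonal elements of the hat matrix. Comments note which SPSS saved variable each list corresponds to, following the terminology above: SPSS's Studentized residuals are what Pardoe (2012) calls standardized, and its Studentized deleted residuals are what Pardoe calls studentized. The function name and data are hypothetical.

```python
import math


def regression_diagnostics(x, y):
    """Residuals, leverages, and Cook's distances for simple linear regression.

    Written for one predictor (p = 1), so the model has p + 1 = 2 parameters.
    """
    n = len(x)
    p = 1
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]       # like RES_1
    s = math.sqrt(sum(ei ** 2 for ei in e) / (n - p - 1))    # regression standard error

    h = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]        # ordinary leverage
    centered = [hi - 1 / n for hi in h]                      # like LEV_1 (centered)
    r = [ei / (s * math.sqrt(1 - hi)) for ei, hi in zip(e, h)]  # like SRE_1
    t = [ri * math.sqrt((n - p - 2) / (n - p - 1 - ri ** 2)) for ri in r]  # like SDR_1
    cook = [ri ** 2 * hi / ((p + 1) * (1 - hi)) for ri, hi in zip(r, h)]   # like COO_1
    return {"residual": e, "leverage": h, "centered_leverage": centered,
            "standardized": r, "studentized": t, "cooks": cook}


diag = regression_diagnostics([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # hypothetical data
for name, values in diag.items():
    print(name, [round(v, 3) for v in values])
```

The ordinary leverages sum to the number of regression parameters (2 here), while the centered leverages sum to the number of predictors (1 here), which is a quick way to check which version you have.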
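The correlation matrix and variance inflation factors described above are closely linked: the VIF for predictor j is 1/(1 − R_j^2), where R_j^2 is the R-squared from regressing predictor j on the other predictors. With exactly two predictors, R_j^2 is simply their squared correlation, so both predictors have the same VIF. The Python sketch below (hypothetical function names and made-up data) covers only that two-predictor case.

```python
import math


def correlation(u, v):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    suv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    suu = sum((a - u_bar) ** 2 for a in u)
    svv = sum((b - v_bar) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R_j^2); with two predictors R_j^2 is their squared correlation."""
    r = correlation(x1, x2)
    return 1 / (1 - r ** 2)


x1 = [1, 2, 3, 4, 5]  # hypothetical predictor values
x2 = [2, 4, 5, 4, 5]
print(correlation(x1, x2), vif_two_predictors(x1, x2))
```

A moderate correlation between two predictors can already inflate the variance of each estimated coefficient noticeably, which is why the correlation matrix is a useful first check before looking at the VIFs.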

Last updated: June, 2012

© 2012, Iain Pardoe