Excel instructions

These instructions accompany Applied Regression Modeling by Iain Pardoe, 2nd edition published by Wiley in 2012. The numbered items cross-reference with the "computer help" references in the book. These instructions are based on Microsoft Office Excel 2003 for Windows (with the Analysis ToolPak activated), but they (or something similar) should also work for other versions. These instructions contains less material than the instructions for statistical software packages (available here) because it is not possible to easily carry out many of the regression techniques using Excel (without a statistics add-on module such as StatTools or Lumenaut).

Getting started and summarizing univariate data

  1. If desired, change Excel's default options by selecting Tools > Options.
  2. To open an Excel data file, select File > Open. You can also use File > Import to open text data files.
  3. Excel does not appear to offer a way to edit the last dialog box.
  4. Output can be copied and pasted from Excel to a word processor like OpenOffice Writer or Microsoft Word. Graphs can also easily be copied and pasted to other applications.
  5. You can access help by selecting Help > Microsoft Excel Help. For example, to find out about "scatterplots," type scatter plot in the search box.
  6. To transform data or compute a new variable, type, for example, =LN(X) for the natural logarithm of X and =X^2 for X2.
  7. To create indicator (dummy) variables from a qualitative variable, type, for example, =IF(X="level", 1, 0), where X is the qualitative variable and "level" is the name of one of the categories in X. Repeat for other indicator variables (if necessary).
  8. Calculate descriptive statistics for quantitative variables by selecting Tools > Data Analysis > Descriptive Statistics. Select the Input Range to include the variable(s) of interest, check Labels in first row if appropriate, and check Summary statistics.
  9. Create contingency tables or cross-tabulations for qualitative variables by selecting Data > PivotTable and PivotChart Report. Select Microsoft Office Excel list or database as the data to be analyzed and PivotTable as the report to be created. Next, select an appropriate data range and put the PivotTable report in a new worksheet. Drag one qualitative variable to the Column Fields space, another qualitative variable to the Row Fields space, and drag some other convenient variable to the Data Items space. The resulting table should show sums of the variable in theData Items for different combinations of the qualitative variable categories. To change the sums to frequencies, double-click on cell A3 and change Sum to Count. To calculate row and column percentages, click theOptions button.
  10. If you have quantitative variables and qualitative variables, you can calculate descriptive statistics for cases grouped in different categories by creating a PivotTable (see computer help #11) and double-clicking on cell A3 to select different summary functions.
  11. Excel does not appear to offer an automatic way to make a stem-and-leaf plot for a quantitative variable.
  12. To make a histogram for a quantitative variable, select Tools > Data Analysis > Histogram. Select the Input Range to include the variable of interest, check Labels if appropriate, and check Chart Output.
  13. To make a scatterplot with two quantitative variables, select Insert > Chart > XY (Scatter). In Step 2 of the Chart Wizard click on the Series tab, select the appropriate data ranges for the X Values, Y Values, and Name boxes, and click Finish.
  14. Excel does not appear to offer an automatic way to create a scatterplot matrix.
  15. Excel does not appear to offer an automatic way to create a scatterplot with different colors/symbols marking the categories in a qualitative variable.
  16. You can identify individual cases in a scatterplot by hovering over them.
  17. To remove one of more observations from a dataset, click on the row number, right-click, and select Delete.
  18. To make a bar chart for cases in different categories, first create a PivotTable (see computer help #11) of cell frequencies. Then select Insert > Chart to create a bar chart.
  19. Excel does not appear to offer an automatic way to create boxplots.
  20. Excel can make a QQ-plot for a quantitative variable automatically, but only though the Regression tool. For example, select Tools > Data Analysis > Regression, then select the Input Y Range to include the response variable, select the Input X Range to include the predictor variable(s), and check Labels if appropriate. The predictor variables should be in adjacent columns in the spreadsheet for this to work. Finally, check Normal Probability Plots before clicking OK to produce a QQ-plot for the response variable in the regression.
  21. To compute a confidence interval for a univariate population mean, select Tools > Data Analysis > Descriptive Statistics. Select the Input Range to include the variable of interest, check Labels in first row if appropriate, check Summary statistics, check Confidence Level for Mean, and type the confidence level into the box. The resulting Confidence Level value in the output represents the "uncertainty" in the intervals. In other words, the interval goes from the sample mean minus this uncertainty to the sample mean plus this uncertainty.
  22. Excel does not appear to offer an automatic way to do a hypothesis test for a univariate population mean. It is possible to do the test by hand calculation using Excel descriptive statistics output and appropriate percentiles from a t-distribution.

Simple linear regression

  1. To fit a simple linear regression model (i.e., find a least squares line), select Tools > Data Analysis > Regression. Select the Input Y Range to include the response variable, select the Input X Range to include the predictor variable, and check Labels if appropriate. Just click OK for now—the other items in the dialog box are addressed below. In the rare circumstance that you wish to fit a model without an intercept term (regression through the origin), [?].
  2. To add a regression line or least squares line to a scatterplot, select the plot by clicking on it and select Chart > Add Trendline. This brings up another dialog in which you need to make sureLinear is selected under Trend/Regression type. Click OK to add the least squares line to the plot.
  3. In fitting a simple linear regression model (see computer help #25), Excel automatically finds 95% confidence intervals for the regression parameters. This applies more generally to multiple linear regression also.
  4. To find a fitted value or predicted value of Y (the response variable) at a particular value of X (the predictor variable) in a linear regression model, [?]. This applies more generally to multiple linear regression also.
  5. Excel does not appear to offer an automatic way to find a confidence interval for the mean of Y at a particular value of X in a simple linear regression model. It is possible to calculate such an interval by hand using Excel regression output and an appropriate percentile from a t-distribution. This applies more generally to multiple linear regression also.
  6. Excel does not appear to offer an automatic way to find a prediction interval for an individual Y-value at a particular X-value in a simple linear regression model. It is possible to calculate such an interval by hand using Excel regression output and an appropriate percentile from a t-distribution. This applies more generally to multiple linear regression also.

Multiple linear regression

  1. To fit a multiple linear regression model, select Tools > Data Analysis > Regression. Select the Input Y Range to include the response variable, select the Input X Range to include the predictor variables, and check Labels if appropriate. The predictor variables should be in adjacent columns in the spreadsheet for this to work. In the rare circumstance that you wish to fit a model without an intercept term (regression through the origin), [?].
  2. To add a quadratic regression line to a scatterplot, select the plot by clicking on it, and select Chart > Add Trendline. This brings up another dialog in which you need to make sure that Polynomial with Order 2 is selected under Trend/Regression type. Click OK to add the quadratic regression line to the plot.
  3. Excel does not appear to offer an automatic way to create a scatterplot with separate regression lines for subsets of the sample.
  4. Excel does not appear to offer an automatic way to to find the F-statistic and associated p-value for a nested model F-test in multiple linear regression. It is possible to calculate these quantities by hand using Excel regression output and appropriate percentiles from a F-distribution.
  5. To save residuals in a multiple linear regression model, fit the model using computer help #31 and check Residuals before clicking OK; they can now be used just like any other variable, for example, to construct residual plots. To save crude "standardized residuals" (ordinary residuals divided by their standard deviation), check Standardized Residuals. Excel does not appear to offer an automatic way to save what Pardoe (2012) calls standardized residuals or studentized residuals.
  6. Excel does not appear to offer an automatic way to add a loess fitted line to a scatterplot.
  7. Excel does not appear to offer an automatic way to save leverages in a multiple linear regression model.
  8. Excel does not appear to offer an automatic way to save Cook's distances in a multiple linear regression model.
  9. To create some residual plots with each predictor variable on the horizontal axis automatically in a multiple linear regression model, fit the model using computer help #31 and check Residual Plots before clicking OK. To create residual plots manually, first create "standardized residuals" (see computer help #35), and then construct scatterplots with these "standardized residuals" on the vertical axis. In particular, you should plot them against the Excel-provided fitted (predicted) values for the regression.
  10. To create a correlation matrix of quantitative variables (useful for checking potential multicollinearity problems), select Tools > Data Analysis > Correlation. Select the Input Range to include the variables of interest and check Labels in First Row if appropriate. The variables should be in adjacent columns in the spreadsheet.
  11. Excel does not appear to offer an automatic way to to find variance inflation factors in multiple linear regression.
  12. To draw a predictor effect plot for graphically displaying the effects of transformed quantitative predictors and/or interactions between quantitative and qualitative predictors in multiple linear regression, first create a variable representing the effect, say, "X1effect" (see computer help #6). See Section 5.5 in Pardoe (2012) for an example.

Last updated: June, 2012

© 2012, Iain Pardoe