Excel instructions
These instructions accompany Applied Regression Modeling by Iain Pardoe, 2nd edition
published by Wiley in 2012. The numbered items crossreference with the "computer help" references
in the book. These instructions are based on Microsoft Office Excel 2003 for Windows (with the
Analysis ToolPak activated), but they (or something similar) should also work for other
versions. These instructions contains less material than the instructions for statistical software
packages (available here) because it is not possible to
easily carry out many of the regression techniques using Excel (without a statistics addon module
such as StatTools or Lumenaut).
Getting started and summarizing univariate data
 If desired, change Excel's default options by selecting
Tools > Options.
 To open an Excel data file, select File > Open. You can also use
File > Import to open text data files.
 Excel does not appear to offer a way to edit the last dialog box.
 Output can be copied and pasted from Excel to a word processor like OpenOffice Writer or Microsoft Word. Graphs can also easily be copied and pasted to other
applications.
 You can access help by selecting Help > Microsoft Excel Help. For
example, to find out about "scatterplots," type scatter plot in the search box.
 To transform data or compute a new variable, type, for example,
=LN(X) for the natural logarithm of X and =X^2 for X^{2}.
 To create indicator (dummy) variables from a qualitative variable, type,
for example, =IF(X="level", 1, 0), where X is the qualitative variable and
"level" is the name of one of the categories in X. Repeat for other indicator
variables (if necessary).

 To find a percentile (critical value) for a tdistribution, type
=TINV(p,df), where p is the twotail significance level (twotail area) and df is the degrees of freedom. For example, =TINV(0.05,29) returns the 97.5th percentile of the tdistribution with 29 degrees of freedom (2.045), which is the critical value for a twotail test with a 5% significance level. By contrast, =TINV(0.1,29) returns the 95th percentile of the tdistribution with 29 degrees of freedom (1.699), which is the critical value for an uppertail test with a 5% significance level.
 To find a percentile (critical value) for an Fdistribution, type
=FINV(p, df1, df2), where p is the significance level (uppertail
area), df1 is the numerator degrees of freedom, and df2 is the denominator degrees
of freedom. For example, =FINV(0.05, 2, 3) returns the 95th percentile of the
Fdistribution with 2 numerator degrees of freedom and 3 denominator degrees of freedom
(9.552).
 To find a percentile (critical value) for a chisquared distribution, type
=CHIINV(p, df), where p is the significance level (uppertail area) and
df is the degrees of freedom. For example, =CHIINV(0.05, 2) returns the 95th
percentile of the chisquared distribution with 2 degrees of freedom (5.991).

 To find an uppertail area (uppertail pvalue) for a tdistribution, type
=TDIST(t, df, 1), where t is the absolute value of the tstatistic and
df is the degrees of freedom. For example, =TDIST(2.40, 29, 1) returns the
uppertail area for a tstatistic of 2.40 from the tdistribution with 29 degrees of freedom
(0.012), which is the pvalue for an uppertail test. By contrast,
=TDIST(2.40, 29, 2) returns the twotail area for a tstatistic of 2.40 from the
tdistribution with 29 degrees of freedom (0.023), which is the pvalue for a twotail test.
 To find an uppertail area (pvalue) for an Fdistribution, type
=FDIST(f, df1, df2), where f is the value of the Fstatistic,
df1 is the numerator degrees of freedom, and df2 is the denominator degrees of
freedom. For example, =FDIST(51.4, 2, 3) returns the uppertail area (pvalue)
for an Fstatistic of 51.4 for the Fdistribution with 2 numerator degrees of freedom and 3
denominator degrees of freedom (0.005).
 To find an uppertail area (pvalue) for a chisquared distribution, type
=CHIDIST(chisq, df), where chisq is the value of the chisquared
statistic and df is the degrees of freedom. For example,
=CHIDIST(0.38, 2, lower.tail=F) returns the uppertail area (pvalue)
for a chisquared statistic of 0.38 for the chisquared distribution with 2 degrees of freedom
(0.827).
 Calculate descriptive statistics for quantitative variables by selecting
Tools > Data Analysis > Descriptive Statistics. Select the Input Range to include
the variable(s) of interest, check Labels in first row if appropriate, and check
Summary statistics.
 Create contingency tables or crosstabulations for qualitative
variables by selecting Data > PivotTable and PivotChart Report. Select
Microsoft Office Excel list or database as the data to be analyzed and PivotTable
as the report to be created. Next, select an appropriate data range and put the
PivotTable report in a new worksheet. Drag one qualitative variable to the
Column Fields space, another qualitative variable to the Row Fields space, and
drag some other convenient variable to the Data Items space. The resulting table should
show sums of the variable in theData Items for different combinations of the qualitative
variable categories. To change the sums to frequencies, doubleclick on cell A3 and change
Sum to Count. To calculate row and column percentages, click theOptions button.
 If you have quantitative variables and qualitative variables, you can calculate
descriptive statistics for cases grouped in different categories by creating a
PivotTable (see computer help #11) and doubleclicking on cell A3 to select different
summary functions.
 Excel does not appear to offer an automatic way to make a stemandleaf
plot for a quantitative variable.
 To make a histogram for a quantitative variable, select
Tools > Data Analysis > Histogram. Select the Input Range to include the variable
of interest, check Labels if appropriate, and check Chart Output.
 To make a scatterplot with two quantitative variables, select
Insert > Chart > XY (Scatter). In Step 2 of the Chart Wizard click on
the Series tab, select the appropriate data ranges for the X Values,
Y Values, and Name boxes, and click Finish.
 Excel does not appear to offer an automatic way to create a scatterplot matrix.
 Excel does not appear to offer an automatic way to create a scatterplot with different colors/symbols marking the categories in a qualitative variable.
 You can identify individual cases in a scatterplot by hovering over
them.
 To remove one of more observations from a dataset, click on the row
number, rightclick, and select Delete.
 To make a bar chart for cases in different categories, first create a
PivotTable (see computer help #11) of cell frequencies. Then select
Insert > Chart to create a bar chart.
 This will produce a frequency bar chart of the qualitative variable. You may
need to subsequently click the Chart Wizard tool to change the chart type (e.g., from
stacked bars to clustered bars).
 The bars can also represent various summary functions for a quantitative variable.
For example, doubleclick on the cell that says Count of ... in the PivotChart
worksheet and change it to Average to make the bar chart represent Means.
 Excel does not appear to offer an automatic way to create boxplots.
 Excel can make a QQplot for a quantitative variable automatically, but
only though the Regression tool. For example, select
Tools > Data Analysis > Regression, then select the Input Y Range to include the
response variable, select the Input X Range to include the predictor variable(s), and check
Labels if appropriate. The predictor variables should be in adjacent columns in the
spreadsheet for this to work. Finally, check Normal Probability Plots before clicking
OK to produce a QQplot for the response variable in the regression.
 To compute a confidence interval for a univariate population mean, select
Tools > Data Analysis > Descriptive Statistics. Select the Input Range to include
the variable of interest, check Labels in first row if appropriate, check
Summary statistics, check Confidence Level for Mean, and type the confidence level
into the box. The resulting Confidence Level value in the output represents the
"uncertainty" in the intervals. In other words, the interval goes from the sample mean minus this
uncertainty to the sample mean plus this uncertainty.
 Excel does not appear to offer an automatic way to do a hypothesis test
for a univariate population mean. It is possible to do the test by hand calculation using Excel
descriptive statistics output and appropriate percentiles from a tdistribution.
Simple linear regression
 To fit a simple linear regression model (i.e., find a least squares line),
select Tools > Data Analysis > Regression. Select the Input Y Range to include
the response variable, select the Input X Range to include the predictor variable, and
check Labels if appropriate. Just click OK for now—the other items in the
dialog box are addressed below. In the rare circumstance that you wish to fit a model without an
intercept term (regression through the origin), [?].
 To add a regression line or least squares line to a scatterplot,
select the plot by clicking on it and select Chart > Add Trendline. This brings up another
dialog in which you need to make sureLinear is selected under
Trend/Regression type. Click OK to add the least squares line to the plot.
 In fitting a simple linear regression model (see computer help #25), Excel
automatically finds 95% confidence intervals for the regression parameters. This
applies more generally to multiple linear regression also.
 To find a fitted value or predicted value of Y (the response
variable) at a particular value of X (the predictor variable) in a linear regression model, [?].
This applies more generally to multiple linear regression also.
 Excel does not appear to offer an automatic way to find a
confidence interval for the mean of Y at a particular value of X in a simple linear
regression model. It is possible to calculate such an interval by hand using Excel
regression output and an appropriate percentile from a tdistribution. This applies more
generally to multiple linear regression also.
 Excel does not appear to offer an automatic way to find a prediction
interval for an individual Yvalue at a particular Xvalue in a simple linear regression model.
It is possible to calculate such an interval by hand using Excel regression output and an
appropriate percentile from a tdistribution. This applies more generally to multiple linear
regression also.
Multiple linear regression
 To fit a multiple linear regression model, select
Tools > Data Analysis > Regression. Select the Input Y Range to include the
response variable, select the Input X Range to include the predictor variables, and check
Labels if appropriate. The predictor variables should be in adjacent columns in the
spreadsheet for this to work. In the rare circumstance that you wish to fit a model without an
intercept term (regression through the origin), [?].
 To add a quadratic regression line to a scatterplot, select the plot by
clicking on it, and select Chart > Add Trendline. This brings up another dialog in which
you need to make sure that Polynomial with Order 2 is selected under
Trend/Regression type. Click OK to add the quadratic regression line to the
plot.
 Excel does not appear to offer an automatic way to create a scatterplot with separate regression lines for subsets of the sample.
 Excel does not appear to offer an automatic way to to find the Fstatistic and
associated pvalue for a nested model Ftest in multiple linear regression. It is possible
to calculate these quantities by hand using Excel regression output and appropriate percentiles
from a Fdistribution.
 To save residuals in a multiple linear regression model, fit the model using computer help #31 and check Residuals before clicking OK; they can now
be used just like any other variable, for example, to construct residual plots. To save crude
"standardized residuals" (ordinary residuals divided by their standard deviation), check
Standardized Residuals. Excel does not appear to offer an automatic way to save what
Pardoe (2012) calls standardized residuals or studentized residuals.
 Excel does not appear to offer an automatic way to add a loess fitted line
to a scatterplot.
 Excel does not appear to offer an automatic way to save leverages in a
multiple linear regression model.
 Excel does not appear to offer an automatic way to save Cook's distances in
a multiple linear regression model.
 To create some residual plots with each predictor variable on the horizontal axis automatically in a multiple linear regression model, fit the model using computer
help #31 and check Residual Plots before clicking OK. To create residual plots
manually, first create "standardized residuals" (see computer help #35), and then construct
scatterplots with these "standardized residuals" on the vertical axis. In particular, you should
plot them against the Excelprovided fitted (predicted) values for the regression.
 To create a correlation matrix of quantitative variables (useful for
checking potential multicollinearity problems), select
Tools > Data Analysis > Correlation. Select the Input Range to include the
variables of interest and check Labels in First Row if appropriate. The variables should
be in adjacent columns in the spreadsheet.
 Excel does not appear to offer an automatic way to to find variance
inflation factors in multiple linear regression.
 To draw a predictor effect plot for graphically displaying the effects of
transformed quantitative predictors and/or interactions between quantitative and qualitative
predictors in multiple linear regression, first create a variable representing the effect, say,
"X1effect" (see computer help #6).
 If the "X1effect" variable just involves X1 (e.g., 1 + 3X1 + 4X1^{2}),
sort the X1 variable in ascending order using Data > Sort, and select
Insert > Chart > XY (Scatter)—select the plot with data points connected by smoothed
lines without markers. In Step 2 of the Chart Wizard click on the Series
tab, select the appropriate data ranges for the X Values (sorted X1 variable),
Y Values ("X1effect" variable), and Name boxes, and click Finish.
 Excel does not appear to offer an automatic way to create more complex predictor effect plots with separate lines representing different subsets of the sample (e.g., if the "X1effect" variable also involves a qualitative variable such as, 1 − 2X1 + 3D2X1, where D2 is
an indicator variable).
See Section 5.5 in Pardoe (2012) for an example.
Last updated: June, 2012
© 2012, Iain Pardoe