Problem on simple linear regression model evaluation

Section 2.3 in the book suggests three ways to evaluate a simple linear regression model numerically: the regression standard error, s; the coefficient of determination, R²; and the slope parameter, b1, assessed with a hypothesis test or a confidence interval. For simple linear regression all three methods are mathematically equivalent (for a given response, they rank competing models in the same order), but their generalizations to multiple linear regression are not.
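The equivalence in simple linear regression follows from two identities that hold for any least squares fit with n observations: t^2 = (n - 2)R²/(1 - R²) and s^2 = (1 - R²)SST/(n - 2), where SST is the total sum of squares of Y. Because SST depends only on the response, holding Y fixed means a higher R² always comes with a lower s and a larger |t|. The Python sketch below checks both identities numerically; the function name `slr_summary` and the data are made up for illustration and are not from the book.

```python
import math

def slr_summary(x, y):
    """Least squares fit of y = b0 + b1*x; returns n, s, R^2, slope t-statistic and SST."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    sst = sum((yi - ybar) ** 2 for yi in y)                       # total sum of squares
    s = math.sqrt(sse / (n - 2))                                  # regression standard error
    r2 = 1 - sse / sst                                            # coefficient of determination
    t = b1 / (s / math.sqrt(sxx))                                 # slope estimate / its standard error
    return {"n": n, "s": s, "r2": r2, "t": t, "sst": sst}

# Hypothetical illustrative data (not the COMPARE data set).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

fit = slr_summary(x, y)
n, s, r2, t, sst = fit["n"], fit["s"], fit["r2"], fit["t"], fit["sst"]

# Identity 1: t^2 = (n - 2) * R^2 / (1 - R^2); the two printed numbers should agree.
print(t ** 2, (n - 2) * r2 / (1 - r2))
# Identity 2: s^2 = (1 - R^2) * SST / (n - 2); the two printed numbers should agree.
print(s ** 2, (1 - r2) * sst / (n - 2))
```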

The purpose of this exercise is to demonstrate that a simple linear regression model with superior measures of fit (i.e., a lower s, a higher R², and a higher absolute t-statistic for the slope) is not necessarily more appropriate than a model with a higher s, a lower R², and a lower absolute t-statistic for the slope. These three measures of model fit should always be used in conjunction with a graphical check of the model to make sure that it is appropriate (e.g., see section 2.4 in the book).

Download the simulated data from one of the following files (in SPSS, text, and Excel format, respectively): COMPARE.SAV, COMPARE.TXT, COMPARE.XLS. There is a single response variable, Y, and four possible predictor variables, X1, X2, X3, and X4.

  (a) Fit a simple linear regression using X1 as the predictor. Make a note of the values of the regression standard error, s, the coefficient of determination, R², and the t-statistic for the slope, t. Also construct a scatterplot of Y (vertical axis) versus X1 (horizontal axis) and add the least squares regression line to the plot.
  (b) Repeat part (a), but this time use X2 as the predictor. Do the values of s, R², and t suggest a better or worse fit than the model from part (a)? Does the scatterplot for the X2 model confirm or contradict the numerical findings?
  (c) Repeat part (a), but this time use X3 as the predictor. Do the values of s, R², and t suggest a better or worse fit than the model from part (a)? Does the scatterplot for the X3 model confirm or contradict the numerical findings?
  (d) Repeat part (a), but this time use X4 as the predictor. Do the values of s, R², and t suggest a better or worse fit than the model from part (a)? Does the scatterplot for the X4 model confirm or contradict the numerical findings?
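The steps in the exercise above can be scripted. The Python sketch below assumes (this page does not say so) that COMPARE.TXT is whitespace-delimited with a header line naming the columns; the helper names `read_columns` and `fit_slr` are hypothetical. It fits Y on each predictor, prints s, R², and t, and gives the endpoints of the least squares line for overlaying on a scatterplot. The in-memory `sample` stands in for the file and its numbers are invented, so its output says nothing about the COMPARE results.

```python
import io
import math

def read_columns(lines):
    """Parse whitespace-delimited text whose first line holds column names.
    ASSUMPTION: this is the layout of COMPARE.TXT; check the actual file first."""
    rows = [line.split() for line in lines if line.strip()]
    header, body = rows[0], rows[1:]
    return {name: [float(r[j]) for r in body] for j, name in enumerate(header)}

def fit_slr(x, y):
    """Least squares fit of y on x; returns intercept, slope, s, R^2 and slope t-statistic."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - ybar) ** 2 for yi in y)
    s = math.sqrt(sse / (n - 2))
    return {"b0": b0, "b1": b1, "s": s, "r2": 1 - sse / sst,
            "t": b1 / (s / math.sqrt(sxx))}

# Hypothetical stand-in for the file contents (NOT the real COMPARE data).
sample = io.StringIO("""Y X1 X2
3.1 1 2
4.9 2 1
7.2 3 4
8.8 4 3
11.1 5 5
""")
data = read_columns(sample)

for name in ("X1", "X2"):
    f = fit_slr(data[name], data["Y"])
    print(f"{name}: s={f['s']:.3f} R2={f['r2']:.3f} t={f['t']:.2f}")
    # Endpoints of the least squares line, for drawing on a scatterplot of Y versus this predictor.
    lo, hi = min(data[name]), max(data[name])
    print("  line from", (lo, f["b0"] + f["b1"] * lo), "to", (hi, f["b0"] + f["b1"] * hi))
```

To use the real file, pass `open("COMPARE.TXT")` to `read_columns` in place of `sample` once the layout is confirmed, and loop over `("X1", "X2", "X3", "X4")`. A plotting library such as matplotlib can then draw each scatterplot and its line.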

You should find that the model with X1 has a higher s, a lower R², and a lower absolute t-statistic for the slope than each of the other three models (which all share the same values of s, R², and t). Nevertheless, the model with X1 is more appropriate than each of the other three models.

Concluding message: measures of regression model fit such as the regression standard error, s, the coefficient of determination, R², and the absolute t-statistic for the slope are meaningful only when the assumptions of the model are broadly satisfied.
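The concluding message can be illustrated with invented data built so that the fit statistics mislead. In the Python sketch below, Y is an exact quadratic in `x_curved`, while it is linear in `x_linear` apart from alternating scatter of plus or minus 20. The straight-line fit on `x_curved` wins on all three measures (R² of roughly 0.95 against roughly 0.68, with a correspondingly lower s and larger |t|), yet its residuals, ordered by the predictor, run + + - - - - - - + +, the systematic curvature a residual plot would reveal. The `residual_signs` helper is a hypothetical, crude text stand-in for that plot rather than a replacement for it, and none of this reproduces the COMPARE data.

```python
import math

def fit_slr(x, y):
    """Least squares fit of y on x: returns s, R^2, slope t-statistic and residuals."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in resid)
    sst = sum((yi - ybar) ** 2 for yi in y)
    s = math.sqrt(sse / (n - 2))
    return {"s": s, "r2": 1 - sse / sst, "t": b1 / (s / math.sqrt(sxx)), "resid": resid}

def residual_signs(x, resid):
    """Signs of the residuals ordered by the predictor: a crude text view of a residual plot."""
    return "".join("+" if e > 0 else "-" for _, e in sorted(zip(x, resid)))

# Hypothetical data (not the COMPARE data): one response, two candidate predictors.
x_curved = list(range(1, 11))
y = [xi ** 2 for xi in x_curved]                        # Y is an exact quadratic in x_curved
noise = [20 if i % 2 == 0 else -20 for i in range(10)]
x_linear = [yi + d for yi, d in zip(y, noise)]          # Y is linear in x_linear, plus scatter

for name, x in (("x_curved", x_curved), ("x_linear", x_linear)):
    f = fit_slr(x, y)
    print(f"{name}: s={f['s']:.2f} R2={f['r2']:.3f} t={f['t']:.2f} "
          f"residual signs={residual_signs(x, f['resid'])}")
```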


Last updated: September 2006

The views and opinions expressed on this page are strictly those of the page author. The contents of this page have not been reviewed or approved by the University of Oregon.

© 2006, Iain Pardoe, Lundquist College of Business, University of Oregon