Problem on multiple linear regression model building

The following problem provides another challenging dataset that students can use to try to find their best multiple linear regression model.

You've been asked to develop a regression model for predicting company stock price. You have data on 100 stocks, and would like to build a regression model for predicting logY = natural logarithm of current stock price from 7 potential predictor variables:

The data are available in the following data files (in SPSS, text, and Excel format, respectively): stocks.sav, stocks.txt, stocks.xls. Note that market is a qualitative (categorical) variable with three levels. Do not use this variable directly as a predictor; instead, you will need to use two dummy indicator variables based on this variable to model differing "market effects." For example, your two indicator variables could be D7 (= 1 for market 2, = 0 otherwise) and D8 (= 1 for market 3, = 0 otherwise), so that market 1 is the reference level. See Computer Help #3 for how to create these indicator variables. This problem is focused on model-building, not interpretation, but if you wanted to interpret models that include these indicator variables, you would plug in D7 = 0 and D8 = 0 for market 1, or D7 = 1 and D8 = 0 for market 2, or D7 = 0 and D8 = 1 for market 3 (see pages 153-158 in Chapter 4 of the book for another example).
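The indicator coding described above can be sketched in a few lines of Python. This is only an illustration, not the method in Computer Help #3: the function name market_indicators is hypothetical, the market codes in the example list are made up rather than read from the data files, and it assumes market is stored as the integers 1, 2, and 3.

```python
def market_indicators(market):
    """Return (D7, D8) for a market level coded 1, 2, or 3.

    Market 1 is the reference level, so it maps to (0, 0).
    Assumes integer codes; the coding in the data files may differ.
    """
    if market not in (1, 2, 3):
        raise ValueError(f"unexpected market level: {market!r}")
    d7 = 1 if market == 2 else 0  # D7 = 1 for market 2, 0 otherwise
    d8 = 1 if market == 3 else 0  # D8 = 1 for market 3, 0 otherwise
    return d7, d8


# Hypothetical market codes for five stocks (not taken from the data files).
markets = [1, 2, 3, 2, 1]
indicators = [market_indicators(m) for m in markets]
print(indicators)  # [(0, 0), (1, 0), (0, 1), (1, 0), (0, 0)]
```

Only two indicators are needed for three levels, because the reference level (market 1) is identified by both indicators being 0.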

Build a suitable regression model. You may want to consider the following topics in doing so:

[Hint: try to ensure the largest sample size possible through careful selection of which quantitative predictors to use; a "good" model should have R² around 0.85, a regression standard error, s, around 0.34, and a sample size, n, of 87.]
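The hint suggests that the usable sample size depends on which predictors are included, presumably because some stocks have missing values for some predictors (this is an inference from the hint, not something the problem states). Under that assumption, a minimal Python sketch for comparing the number of complete cases across candidate predictor subsets might look like the following. The function name complete_case_count, the predictor names X1 to X3, and the toy data are all hypothetical.

```python
from itertools import combinations


def complete_case_count(rows, predictors):
    """Count rows with no missing (None) value in any chosen predictor."""
    return sum(
        all(row.get(p) is not None for p in predictors)
        for row in rows
    )


# Hypothetical toy data; None marks a missing value.
rows = [
    {"X1": 1.2, "X2": 0.5, "X3": None},
    {"X1": 0.7, "X2": None, "X3": 3.1},
    {"X1": 2.4, "X2": 1.1, "X3": 2.0},
    {"X1": 1.9, "X2": 0.3, "X3": None},
]

# Print the complete-case sample size for every non-empty predictor subset.
names = ["X1", "X2", "X3"]
for k in range(1, len(names) + 1):
    for subset in combinations(names, k):
        print(subset, complete_case_count(rows, subset))
```

Dropping a predictor with many missing values can raise n, so it is worth checking these counts before comparing candidate models on R² and s.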


Last updated: April, 2012

© 2012, Iain Pardoe