Problem on a multiple linear regression application

The following problem provides an interesting application of multiple linear regression.

The vineyards in the Bordeaux region of France are known for producing excellent red wines. However, the uncertainty of the weather during the growing season, the phenomenon that wine tastes better with age, and the fact that some Bordeaux vineyards produce better wines than others encourage speculation concerning the value of a case of wine produced by a certain vineyard during a certain year (or vintage). As a result, many wine experts attempt to predict the auction price of a case of Bordeaux wine.

The publishers of a newsletter titled Liquid Assets: The International Guide to Fine Wine discussed a multiple regression approach to predicting the London auction price, Y (in dollars), of red Bordeaux wine in Chance (Fall 1995). The natural logarithm of the price, log_e(Y), of a case containing a dozen bottles of red wine was modeled as a function of weather during the growing season and age of vintage using data collected for the vintages of 1952-1980 (excluding the 1954 and 1956 vintages because they are now rarely sold). Three models were fit to the data. The results of the regressions are summarized in the following table:

Regression parameter estimates (standard errors in parentheses)

Independent variable                               Model 1            Model 2                Model 3
X1 = Vintage year                                  0.0354 (0.0137)    0.0238 (0.00717)       0.0240 (0.00747)
X2 = Average growing season temperature (°C)       (not included)     0.616 (0.0952)         0.608 (0.116)
X3 = Sep/Aug rainfall (cm)                         (not included)     −0.00386 (0.00081)     −0.00380 (0.00095)
X4 = Rainfall in months preceding vintage (cm)     (not included)     0.00117 (0.000482)     0.00115 (0.000505)
X5 = Average Sep temperature (°C)                  (not included)     (not included)         0.00765 (0.565)
R^2                                                0.212              0.828                  0.828
s                                                  0.575              0.287                  0.293

Source: Ashenfelter, O., Ashmore, D., and LaLonde, R. "Bordeaux wine vintage quality and weather." Chance, Vol. 8, No. 4, Fall 1995.

  1. The three models have R^2 and s values as shown in the table. Based on this information, which of the three models would you use to predict red Bordeaux wine prices? Explain.
  2. For the model you selected, conduct a hypothesis test for each of the regression parameters in the model. Interpret the results.
  3. When log_e(Y) is used as a response variable, the "antilogarithm" of a regression parameter minus 1, that is exp(b)−1 or e^b − 1, represents the proportional change in Y for every 1-unit increase in the associated X value, holding the other predictors fixed (see explanation below). Use this information to interpret the parameter estimates of the model you selected.
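
As a rough sketch of the arithmetic behind question 2, the following Python snippet computes a t-statistic (estimate divided by standard error) for each slope, taking Model 2 as the example. Two things here are assumptions rather than facts stated in the table: the sample size of 27 vintages (1952-1980 inclusive is 29 years, less the excluded 1954 and 1956 vintages), and the two-sided 5% critical value of 2.074 for 22 degrees of freedom, which is taken from a standard t table. The names used in the code are illustrative only.

```python
# Model 2 estimates and standard errors, copied from the table above.
model2 = {
    "X1 vintage year": (0.0238, 0.00717),
    "X2 growing season temperature": (0.616, 0.0952),
    "X3 Sep/Aug rainfall": (-0.00386, 0.00081),
    "X4 rainfall preceding vintage": (0.00117, 0.000482),
}

# Assumption: 29 vintages in 1952-1980, minus the excluded 1954 and 1956.
N_VINTAGES = 27
N_PREDICTORS = len(model2)

# Error degrees of freedom: n - k - 1 (the extra 1 is for the intercept).
DF = N_VINTAGES - N_PREDICTORS - 1

# Assumption: two-sided 5% critical value for 22 df, from a standard t table.
T_CRIT = 2.074


def t_statistic(estimate, std_error):
    """Test statistic for H0: parameter = 0 against a two-sided alternative."""
    return estimate / std_error


t_stats = {name: t_statistic(b, se) for name, (b, se) in model2.items()}

for name, t in t_stats.items():
    verdict = "reject H0" if abs(t) > T_CRIT else "fail to reject H0"
    print(f"{name}: t = {t:.2f}, df = {DF} ({verdict} at the 5% level)")
```

The same calculation applies to Model 1 or Model 3 after swapping in their estimates and standard errors and adjusting the degrees of freedom and critical value accordingly.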

Explanation for result used in question 3

  1. Take model 2 for example, and write it as log_e(Y) = b0 + b1 X1 + b2 X2 + b3 X3 + b4 X4.
  2. Exponentiate both sides to get Y = exp(b0 + b1 X1 + b2 X2 + b3 X3 + b4 X4).
  3. If we increase X1 by one unit, this becomes exp(b0 + b1 X1 + b1 + b2 X2 + b3 X3 + b4 X4).
  4. Take the difference between the expressions in steps 3 and 2 to get exp(b0 + b1 X1 + b2 X2 + b3 X3 + b4 X4) (exp(b1)−1).
  5. Divide this difference by Y from step 2 to express it as a proportional change relative to Y before we increased X1 by one unit: exp(b1)−1.
  6. Multiply by 100 to convert this to a percentage change.
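
The steps above can be checked numerically. The sketch below uses the Model 2 slope estimates from the table, but the intercept b0 and the X values are hypothetical placeholders chosen only for illustration (the table does not report an intercept). It confirms that the proportional change in Y after a one-unit increase in X1 equals exp(b1)−1 regardless of those placeholders, and then prints the percentage change for each slope.

```python
import math

# Model 2 slope estimates b1..b4, copied from the table above.
slopes = [0.0238, 0.616, -0.00386, 0.00117]

# Hypothetical values used only for this check: the table does not report
# the intercept b0, and these X values are not taken from the data set.
b0 = 0.0
x = [10.0, 17.0, 100.0, 600.0]


def predicted_price(b0, slopes, x):
    """Step 2: Y = exp(b0 + b1*X1 + b2*X2 + b3*X3 + b4*X4)."""
    return math.exp(b0 + sum(b * xi for b, xi in zip(slopes, x)))


def percent_change(b):
    """Steps 5 and 6: 100 * (exp(b) - 1)."""
    return 100.0 * (math.exp(b) - 1.0)


# Steps 3 to 5: increase X1 by one unit and compare with the original Y.
y_before = predicted_price(b0, slopes, x)
x_after = [x[0] + 1.0] + x[1:]
y_after = predicted_price(b0, slopes, x_after)
relative_change = (y_after - y_before) / y_before

# The proportional change depends only on b1, not on b0 or the X values.
assert math.isclose(relative_change, math.exp(slopes[0]) - 1.0, rel_tol=1e-9)

for i, b in enumerate(slopes, start=1):
    print(f"b{i} = {b}: {percent_change(b):.3f}% change in Y per unit of X{i}")
```

Changing b0 or the entries of x leaves the assertion satisfied, which is the point of step 5: the factor exp(b0 + b1 X1 + ...) cancels when the difference is divided by Y.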

Last updated: April, 2012

© 2012, Iain Pardoe