Problem on using simple linear regression for calibration

The following problem enables students to see how simple linear regression can be used for calibration (using a small number of accurate, but expensive measurements to calibrate a larger number of inaccurate, but inexpensive measurements).

This situation concerns an actual civil suit, described in Chance (Summer 1994), in which statistics played a key role in the jury's decision. The suit revolved around a five-building apartment complex located in the Bronx, New York. The buildings were constructed in the late 1970s from custom-designed jumbo (35-pound) bricks. Nearly three-quarters of a million bricks were used in the construction. Over time, the bricks began to suffer spalling damage, i.e., separation of some portion of the face of a brick from its body. Experts agreed that the cause of the spalling was winter month freeze-thaw cycles in which water absorbed in the brick face alternates between freezing and thawing. The owner of the complex alleged that the bricks were defective. The brick manufacturer countered that poor design and management of water runoff caused the water to be trapped and absorbed in the bricks, leading to the damage. Ultimately, the suit required an estimate of the spall rate—the rate of damage per 1,000 bricks.

The owner estimated the spall rate using several scaffold-drop surveys. With this method, an engineer lowers a scaffold to selected places on building walls and counts the number of visible spalls for every 1,000 bricks in the observation area. The estimated spall rate is then multiplied by the total number of bricks (in thousands) in the entire complex to determine the total number of damaged bricks. When properly designed, the scaffold-drop survey, although extremely time-consuming and tedious to perform, is considered the "gold standard" for measuring spall damage. However, the owner did not drop the scaffolds at randomly selected wall areas. Instead, scaffolds were dropped primarily in areas of high spall concentration, leading to an estimate of total spall damage that is biased substantially high.
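The arithmetic behind the scaffold-drop estimate can be sketched as follows. This is only an illustration: the function names and the numbers in the example call are made up, and 750,000 stands in for the "nearly three-quarters of a million" bricks mentioned above.

```python
def spall_rate_per_1000(spalls_counted, bricks_observed):
    """Spalls per 1,000 bricks in the observed area."""
    return 1000 * spalls_counted / bricks_observed


def total_damaged_bricks(rate_per_1000, total_bricks):
    """Scale a rate per 1,000 bricks up to the whole complex."""
    return rate_per_1000 * (total_bricks / 1000)


# Hypothetical example: 30 spalls counted among 2,000 observed bricks,
# scaled up to an assumed 750,000 bricks in the complex.
rate = spall_rate_per_1000(30, 2000)
estimate = total_damaged_bricks(rate, 750_000)
```

The scaling step is only as good as the rate fed into it: if the observed areas are not representative of the complex, the bias in the rate carries straight through to the total.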

In an attempt to obtain an unbiased estimate of spall rate, the brick manufacturer conducted its own survey of the walls of the complex. The walls were divided into 83 wall segments and a photograph of each wall segment was taken. The number of spalled bricks that could be made out from each photo was recorded and the sum over all 83 wall segments was used as an estimate of total spall damage.

When the data from the two methods were compared, major discrepancies were discovered. At the eleven locations that had been painstakingly surveyed by the scaffold drops, the spalls visible from the photos did not include all of the spalls identified on the drops. For these wall segments, the photo method provided a serious underestimate of the spall rate, as shown in the following data file (in SPSS, text, and Excel format, respectively): bricks.sav, bricks.txt, bricks.xls. Consequently, the photo survey also underestimates the total spall damage.

In this court case, the jury was faced with the following dilemma: The scaffold-drop survey provided the most accurate estimate of spall rate in a given wall segment. Unfortunately, the drop areas were not selected at random from the entire complex; rather, drops were made at areas with high spall concentrations, leading to an overestimate of the total damage. On the other hand, the photo survey was complete in that all 83 wall segments in the complex were checked for spall damage. But the spall rate estimated by the photos was biased low, leading to an underestimate of the total damage.

Use the data, as did expert statisticians who testified in the case, to help the jury estimate the true spall rate at a given wall segment. Then explain how this information, coupled with the photo data on all 83 wall segments (not given here), can provide a reasonable estimate of the total spall damage (i.e., total number of damaged bricks). [Hint: the key to approaching this problem is figuring out which variable (droprate or photorate) should be the response variable and which the predictor.]
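One possible way to organize the computation is sketched below in plain Python. It is not the analysis presented in the case, and every name in it is hypothetical. The helper fits a least-squares line of a response y on a predictor x; deciding which of droprate and photorate plays each role is the question posed in the hint, so it is deliberately left open here. The last helper assumes that the number of bricks in each wall segment is known, which is not stated in the problem.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(intercept, slope, x_new):
    """Fitted values of the response at new predictor values."""
    return [intercept + slope * xi for xi in x_new]


def total_from_rates(rates_per_1000, bricks_per_segment):
    """Sum of (rate per 1,000 bricks) * (bricks / 1,000) over segments.

    Assumes a brick count is available for every wall segment.
    """
    return sum(r * b / 1000 for r, b in zip(rates_per_1000, bricks_per_segment))
```

In use, the line would be fitted to the eleven wall segments measured by both methods, the fitted line applied to the measurements available for all 83 segments, and the resulting calibrated rates combined into a total. A fuller analysis would also report the uncertainty in that total, for example through prediction intervals.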


Last updated: April, 2012

© 2012, Iain Pardoe