Example 13. Variable Selection in Multiple Regression: Hald Data

The following data come from an engineering study of how the composition of cement affects the heat evolved during hardening. The data consist of four predictor variables and one response, described below:
Y: Heat evolved per gram of cement (in calories)
X1: Amount of tricalcium aluminate
X2: Amount of tricalcium silicate
X3: Amount of tetracalcium alumino ferrite
X4: Amount of dicalcium silicate

Source: Woods, H., Steinour, H.H., and Starke, H.R. (1932). "Effect of Composition of Portland Cement on Heat Evolved During Hardening", Industrial and Engineering Chemistry, 24, 1207-1214.

Table 13: Hald Data

X1   X2   X3   X4      Y
 7   26    6   60   78.5
 1   29   15   52   74.3
11   56    8   20  104.3
11   31    8   47   87.6
 7   52    6   33   95.9
11   55    9   22  109.2
 3   71   17    6  102.7
 1   31   22   44   72.5
 2   54   18   22   93.1
21   47    4   26  115.9
 1   40   23   34   83.8
11   66    9   12  113.3
10   68    8   12  109.4

Questions:

  1. What is the best possible regression which involves some or all of the predictor variables?

  2. What are the advantages/disadvantages of the available variable selection procedures?
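
As one way to explore Question 2, the following is a minimal, dependency-free sketch of forward selection on the data in Table 13. It is an illustration under stated assumptions, not the procedure prescribed by the text: the F-to-enter cutoff `f_in = 4.0` and the helper names `centered`, `solve`, `sse`, and `forward_selection` are all hypothetical choices made here.

```python
# Table 13 (Hald data); each row is (X1, X2, X3, X4, Y).
HALD = [
    (7, 26, 6, 60, 78.5), (1, 29, 15, 52, 74.3), (11, 56, 8, 20, 104.3),
    (11, 31, 8, 47, 87.6), (7, 52, 6, 33, 95.9), (11, 55, 9, 22, 109.2),
    (3, 71, 17, 6, 102.7), (1, 31, 22, 44, 72.5), (2, 54, 18, 22, 93.1),
    (21, 47, 4, 26, 115.9), (1, 40, 23, 34, 83.8), (11, 66, 9, 12, 113.3),
    (10, 68, 8, 12, 109.4),
]
N = len(HALD)


def centered(col):
    """Return column `col` of HALD with its mean subtracted."""
    vals = [row[col] for row in HALD]
    mean = sum(vals) / N
    return [v - mean for v in vals]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, k):
            f = m[r][c] / m[c][c]
            for j in range(c, k + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * k
    for i in reversed(range(k)):
        x[i] = (m[i][k] - sum(m[i][j] * x[j] for j in range(i + 1, k))) / m[i][i]
    return x


def sse(cols):
    """Residual sum of squares of Y regressed on an intercept plus `cols`."""
    y = centered(4)
    xs = [centered(c) for c in cols]  # centering absorbs the intercept
    xtx = [[sum(u * v for u, v in zip(a, b)) for b in xs] for a in xs]
    xty = [sum(u * v for u, v in zip(a, y)) for a in xs]
    beta = solve(xtx, xty) if cols else []
    fitted = [sum(b * x[i] for b, x in zip(beta, xs)) for i in range(N)]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def forward_selection(f_in=4.0):
    """Greedy forward selection; f_in is an assumed F-to-enter cutoff."""
    selected, steps = [], []
    current = sse(())  # SSE of the intercept-only model
    while len(selected) < 4:
        best = None
        for c in range(4):
            if c in selected:
                continue
            e = sse(tuple(selected + [c]))
            df = N - (len(selected) + 2)  # residual df after adding c
            f = (current - e) / (e / df)  # partial F statistic for c
            if best is None or f > best[1]:
                best = (c, f, e)
        if best[1] < f_in:
            break  # no remaining candidate clears the cutoff
        selected.append(best[0])
        current = best[2]
        steps.append((f"X{best[0] + 1}", best[1]))
    return steps


steps = forward_selection()
for name, f_stat in steps:
    print(f"entered {name}  partial F = {f_stat:.2f}")
```

With the assumed cutoff of 4.0, the variables should enter in the order X4, X1, X2, and X3 is left out. Because the search is greedy and never removes a variable once it has entered, the result depends on the cutoff and need not coincide with the subset that an exhaustive comparison would rank best.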

Keywords: R-square, Mallows' Cp, MSE, forward selection, backward selection, stepwise regression, all possible regressions
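
To make the criteria named in the keywords concrete, here is a minimal sketch of all possible regressions for the data in Table 13, reporting R-square, MSE, and Mallows' Cp for each of the 15 non-empty subsets. It assumes Cp = SSE_p / s^2 - (n - 2p), where p counts the intercept and s^2 is the MSE of the full four-variable model. The helper names (`centered`, `solve`, `sse`, `all_subsets`) are hypothetical, and plain Python is used only to keep the example dependency-free.

```python
from itertools import combinations

# Table 13 (Hald data); each row is (X1, X2, X3, X4, Y).
HALD = [
    (7, 26, 6, 60, 78.5), (1, 29, 15, 52, 74.3), (11, 56, 8, 20, 104.3),
    (11, 31, 8, 47, 87.6), (7, 52, 6, 33, 95.9), (11, 55, 9, 22, 109.2),
    (3, 71, 17, 6, 102.7), (1, 31, 22, 44, 72.5), (2, 54, 18, 22, 93.1),
    (21, 47, 4, 26, 115.9), (1, 40, 23, 34, 83.8), (11, 66, 9, 12, 113.3),
    (10, 68, 8, 12, 109.4),
]
N = len(HALD)


def centered(col):
    """Return column `col` of HALD with its mean subtracted."""
    vals = [row[col] for row in HALD]
    mean = sum(vals) / N
    return [v - mean for v in vals]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, k):
            f = m[r][c] / m[c][c]
            for j in range(c, k + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * k
    for i in reversed(range(k)):
        x[i] = (m[i][k] - sum(m[i][j] * x[j] for j in range(i + 1, k))) / m[i][i]
    return x


def sse(cols):
    """Residual sum of squares of Y regressed on an intercept plus `cols`."""
    y = centered(4)
    xs = [centered(c) for c in cols]  # centering absorbs the intercept
    xtx = [[sum(u * v for u, v in zip(a, b)) for b in xs] for a in xs]
    xty = [sum(u * v for u, v in zip(a, y)) for a in xs]
    beta = solve(xtx, xty) if cols else []
    fitted = [sum(b * x[i] for b, x in zip(beta, xs)) for i in range(N)]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def all_subsets():
    """Fit every non-empty subset; return R-square, MSE and Cp for each."""
    sst = sse(())  # total (corrected) sum of squares
    s2_full = sse((0, 1, 2, 3)) / (N - 5)  # full-model MSE estimates sigma^2
    out = []
    for k in range(1, 5):
        for cols in combinations(range(4), k):
            p = k + 1  # number of parameters, intercept included
            e = sse(cols)
            out.append({
                "vars": tuple(f"X{c + 1}" for c in cols),
                "r2": 1 - e / sst,
                "mse": e / (N - p),
                "cp": e / s2_full - (N - 2 * p),
            })
    return out


results = all_subsets()
for r in sorted(results, key=lambda r: r["cp"]):
    print(f"{'+'.join(r['vars']):12s} R2={r['r2']:.4f} MSE={r['mse']:8.3f} Cp={r['cp']:7.2f}")
```

The listing is sorted by Cp, so subsets with small Cp (ideally close to p) appear first. R-square alone always favors the full model, which is why it is usually read alongside MSE and Cp rather than on its own.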


Numerical Examples for use with
A First Course in Linear Model Theory by Ravishanker and Dey
Website design: Karen L. Houle