Example 7. Influence Diagnostics: Solid Waste Data

The following data pertain to a study of waste production and land use. There are n = 40 observations on a response variable Y and five regressor variables, described below:
Y: Solid waste (in millions of tons)
X1: Industrial land (acres)
X2: Fabricated metals (acres)
X3: Trucking and wholesale trade (acres)
X4: Retail trade (acres)
X5: Restaurants and hotels

Source: Golueke, C.G. and McGauhey, P.H. (1970), Comprehensive Studies of Solid Waste Management, U.S. Department of Health, Education, and Welfare, Public Health Service Publication No. 2039.

Table 7: Solid Waste Data

 Obs     X1     X2     X3    X4    X5       Y
   1    102     69    133   125    36  0.3574
   2   1120    723   2616   953   132  1.9673
   3    139    138     46    35     6  0.1862
   4    221    637    153   115    16  0.3816
   5     12      0      1     9     1  0.1512
   6      1     50      3    25     2  0.1449
   7   1046    127    313   392    56  0.4711
   8   2032     44    409   540    98  0.6512
   9    895     54    168   117    32  0.6624
  10      0      0      2     0     1  0.3457
  11     25      2     24    78    15  0.3355
  12     97     12     91   135    24  0.3982
  13      1      0     15    46    11  0.2044
  14      4      1     18    23     8  0.2969
  15     42      4     78    41    61  1.1515
  16     87    162    599    11     3  0.5609
  17      2      0     26    24     6  0.1104
  18      2      9     29    11     2  0.0863
  19     48     18    101    25     4  0.1952
  20    131    126    387     6     0  0.1688
  21      4      0    103    49     9  0.0786
  22      1      4     46    16     2  0.0955
  23      0      0    468    56     2  0.0486
  24      7      0     52    37     5  0.0867
  25      5      1      6    95    11  0.1403
  26    174    113    285    69    18  0.3786
  27      0      0      6    35     4  0.0761
  28    233    153    682   404    85  0.8927
  29    155     56     94    75    17  0.3621
  30    120     74     55   120     8  0.1758
  31   8983     37    236    77    38  0.2699
  32     59     54    138    55    11  0.2762
  33     72    112    169   228    39  0.3240
  34    571     78     25   162    43  0.3737
  35    853   1002   1017   418    57  0.9114
  36      5      0     17    14    13  0.2594
  37     11     34      3    20     4  0.4284
  38    258      1     33    48    13  0.1905
  39     69     14    126   108    20  0.2341
  40   4790   2046   3719    31     7  0.7759
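To work through the questions numerically, the table can be entered directly. The following Python sketch parses the 40 rows above and fits the full first-order model Y = b0 + b1 X1 + ... + b5 X5 by least squares. It assumes NumPy is available. The names TABLE_7 and load_solid_waste are our own illustrative choices. Fitting an intercept plus all five regressors is also an assumption, since the questions do not state a model.

```python
import io
import numpy as np

# Table 7 (Solid Waste Data), copied from above: Obs, X1, X2, X3, X4, X5, Y
TABLE_7 = """\
 1    102     69    133   125    36  0.3574
 2   1120    723   2616   953   132  1.9673
 3    139    138     46    35     6  0.1862
 4    221    637    153   115    16  0.3816
 5     12      0      1     9     1  0.1512
 6      1     50      3    25     2  0.1449
 7   1046    127    313   392    56  0.4711
 8   2032     44    409   540    98  0.6512
 9    895     54    168   117    32  0.6624
10      0      0      2     0     1  0.3457
11     25      2     24    78    15  0.3355
12     97     12     91   135    24  0.3982
13      1      0     15    46    11  0.2044
14      4      1     18    23     8  0.2969
15     42      4     78    41    61  1.1515
16     87    162    599    11     3  0.5609
17      2      0     26    24     6  0.1104
18      2      9     29    11     2  0.0863
19     48     18    101    25     4  0.1952
20    131    126    387     6     0  0.1688
21      4      0    103    49     9  0.0786
22      1      4     46    16     2  0.0955
23      0      0    468    56     2  0.0486
24      7      0     52    37     5  0.0867
25      5      1      6    95    11  0.1403
26    174    113    285    69    18  0.3786
27      0      0      6    35     4  0.0761
28    233    153    682   404    85  0.8927
29    155     56     94    75    17  0.3621
30    120     74     55   120     8  0.1758
31   8983     37    236    77    38  0.2699
32     59     54    138    55    11  0.2762
33     72    112    169   228    39  0.3240
34    571     78     25   162    43  0.3737
35    853   1002   1017   418    57  0.9114
36      5      0     17    14    13  0.2594
37     11     34      3    20     4  0.4284
38    258      1     33    48    13  0.1905
39     69     14    126   108    20  0.2341
40   4790   2046   3719    31     7  0.7759
"""

def load_solid_waste(text: str = TABLE_7):
    """Parse the table into the regressor matrix (n x 5) and the response y."""
    data = np.loadtxt(io.StringIO(text))
    X_raw = data[:, 1:6]   # columns X1..X5 (Obs column dropped)
    y = data[:, 6]         # Y, solid waste in millions of tons
    return X_raw, y

X_raw, y = load_solid_waste()
# Design matrix with a leading column of ones: p = 6 parameters
X = np.column_stack([np.ones(len(y)), X_raw])
beta_hat, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print("n =", len(y), " p =", X.shape[1], " rank =", rank)
print("beta_hat =", np.round(beta_hat, 6))
```

The arrays X and y built this way are the inputs that the leverage and influence calculations for the questions need.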

Questions:

  1. Identify and discuss possible:
    a. outliers
    b. high-leverage points
    c. influential observations
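Parts 1 and 2 of the question hinge on two per-observation quantities. The first is the leverage h_ii, the i-th diagonal element of the hat matrix H = X(X'X)^{-1}X'. The second is the studentized residual. The minimal NumPy sketch below computes both and flags points. The function names are hypothetical. The cutoffs h_ii > 2p/n and |t_i| > 2 are common rules of thumb that we assume here, not thresholds given by the source.

```python
import numpy as np

def leverage_and_residuals(X, y):
    """Return leverages h_ii, internally studentized residuals r_i and
    externally studentized (deleted, R-student) residuals t_i for OLS.

    X is the n x p design matrix, including a column of ones for the intercept.
    Assumes full column rank and h_ii < 1 for every observation.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    # Thin QR: H = Q Q', so h_ii is the squared norm of row i of Q
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q**2, axis=1)
    e = y - Q @ (Q.T @ y)                  # ordinary residuals (I - H) y
    s2 = e @ e / (n - p)                   # residual mean square
    r = e / np.sqrt(s2 * (1 - h))          # internally studentized
    # Leave-one-out variance: (n-p-1) s_(i)^2 = (n-p) s^2 - e_i^2 / (1 - h_ii)
    s2_del = ((n - p) * s2 - e**2 / (1 - h)) / (n - p - 1)
    t = e / np.sqrt(s2_del * (1 - h))      # externally studentized
    return h, r, t

def flag_points(X, y, lev_mult=2.0, t_cut=2.0):
    """Flag high-leverage points (h_ii > lev_mult * p / n) and possible
    outliers (|t_i| > t_cut). Returns 1-based observation numbers."""
    n, p = np.shape(X)
    h, _, t = leverage_and_residuals(X, y)
    high_leverage = np.flatnonzero(h > lev_mult * p / n) + 1
    outliers = np.flatnonzero(np.abs(t) > t_cut) + 1
    return high_leverage, outliers
```

For Table 7, X would be a column of ones plus X1 to X5, so p = 6 and the leverage cutoff is 2(6)/40 = 0.3. The returned numbers are 1-based so they match the Obs column. Note that a point can have high leverage without a large residual, which is why parts 1 and 2 are asked separately.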

Keywords: Studentized residuals, leverage, Cook's D, DFFITS, DFBETAS, VARRATIO
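The remaining keywords are case-deletion measures, each describing how much the fit changes when observation i is dropped. The sketch below computes them from the full-data fit using the standard leave-one-out identities, so no model is refitted n times. It assumes that VARRATIO in the keyword list refers to the covariance ratio (COVRATIO) of Belsley, Kuh and Welsch, det(s_(i)^2 (X_(i)'X_(i))^{-1}) / det(s^2 (X'X)^{-1}). The function names are hypothetical. The cutoffs in flag_influential are commonly quoted conventions that we assume here: 4/n for Cook's D, 2 sqrt(p/n) for DFFITS, 2/sqrt(n) for DFBETAS, and |COVRATIO - 1| > 3p/n.

```python
import numpy as np

def influence_measures(X, y):
    """Case-deletion influence diagnostics for an OLS fit of y on X
    (X includes the intercept column). Assumes full rank and h_ii < 1."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)        # leverages x_i'(X'X)^{-1}x_i
    s2 = e @ e / (n - p)
    s2_del = ((n - p) * s2 - e**2 / (1 - h)) / (n - p - 1)
    r = e / np.sqrt(s2 * (1 - h))                       # internally studentized
    t = e / np.sqrt(s2_del * (1 - h))                   # externally studentized
    cooks_d = r**2 * h / (p * (1 - h))
    dffits = t * np.sqrt(h / (1 - h))
    # beta_hat - beta_hat_(i) = (X'X)^{-1} x_i e_i / (1 - h_ii); row i = case i
    dbeta = (X @ XtX_inv) * (e / (1 - h))[:, None]
    dfbetas = dbeta / (np.sqrt(s2_del)[:, None] * np.sqrt(np.diag(XtX_inv))[None, :])
    covratio = (s2_del / s2) ** p / (1 - h)
    return {"h": h, "t": t, "cooks_d": cooks_d, "dffits": dffits,
            "dfbetas": dfbetas, "covratio": covratio}

def flag_influential(X, y):
    """Return 1-based observation numbers exceeding each assumed cutoff."""
    n, p = np.shape(X)
    m = influence_measures(X, y)
    obs = np.arange(1, n + 1)
    return {
        "cooks_d": obs[m["cooks_d"] > 4 / n],
        "dffits": obs[np.abs(m["dffits"]) > 2 * np.sqrt(p / n)],
        "dfbetas": obs[(np.abs(m["dfbetas"]) > 2 / np.sqrt(n)).any(axis=1)],
        "covratio": obs[np.abs(m["covratio"] - 1) > 3 * p / n],
    }
```

Inverting X'X explicitly is acceptable for a problem of this size. With regressors as differently scaled as X1 (up to 8983 acres) and the intercept, however, a QR-based computation is the more robust choice. An observation flagged by several measures at once is a stronger candidate for part 3 than one that crosses a single threshold.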


Numerical Examples for use with
A First Course in Linear Model Theory by Ravishanker and Dey
Website design: Karen L. Houle