How to Perform a Lack of Fit Test in R: Quick Guide

by finnstats

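A lack of fit test is used to determine whether a full regression model fits a dataset significantly better than a reduced (nested) version of the same model. When the comparison is made with an F statistic, as it is here, it is also commonly called a partial F-test.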

Consider the following regression model, which has four predictor variables.

Y = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + β₄x₄ + ε

The following model is nested within it, because it contains only two of the original four predictor variables.

Y = β₀ + β₁x₁ + β₂x₂ + ε

We can use a Lack of Fit Test with the following null and alternative hypotheses to see if these two models differ significantly.

Hypotheses

H0: The full model and the nested model fit the data equally well, so the simpler nested model should be used.

H1: The full model fits the data significantly better than the nested model, so the full model should be used.
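For readers who want to see what the test computes, the usual F statistic for comparing nested linear models can be written as below. The symbols RSS_R, RSS_F, df_R and df_F are notation introduced here for illustration (residual sum of squares and residual degrees of freedom of the reduced and full models); they do not appear elsewhere in this guide.

```latex
% H0 for the example above: \beta_3 = \beta_4 = 0
F = \frac{(\mathrm{RSS}_R - \mathrm{RSS}_F) / (\mathrm{df}_R - \mathrm{df}_F)}
         {\mathrm{RSS}_F / \mathrm{df}_F}
\;\sim\; F_{\,\mathrm{df}_R - \mathrm{df}_F,\; \mathrm{df}_F}
\quad \text{under } H_0
```

A large value of F means that dropping the extra predictors increases the residual sum of squares by more than chance alone would explain.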

Step 1: Create a Dataset

We will use the built-in mtcars dataset. Let’s load it and look at the first six rows.

data(mtcars)
head(mtcars)

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Step 2: Fit Two Different Models to the Dataset

Next, we fit two different regression models to the dataset.

First, fit the full model:

fullmodel <- lm(mpg ~ cyl + disp + hp + wt, data = mtcars)

Then fit the reduced model:

reducedmodel <- lm(mpg ~ cyl + disp, data = mtcars)
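The test is only meaningful when the reduced model is truly nested in the full model and both are fitted to the same observations. The following is a minimal sketch of one way to check this, assuming the same model names as above; the helper variables full_terms, reduced_terms, is_nested and same_n are illustrative names, not part of the original guide.

```r
# Refit the full model so this block runs on its own (built-in mtcars data).
fullmodel <- lm(mpg ~ cyl + disp + hp + wt, data = mtcars)

# Derive the reduced model by dropping hp and wt from the full model;
# building it with update() guarantees the same data and response.
reducedmodel <- update(fullmodel, . ~ . - hp - wt)

# Predictor terms used by each model
full_terms    <- attr(terms(fullmodel), "term.labels")
reduced_terms <- attr(terms(reducedmodel), "term.labels")

# TRUE when every term of the reduced model also appears in the full model
is_nested <- all(reduced_terms %in% full_terms)

# TRUE when both models were fitted to the same number of observations
same_n <- nobs(fullmodel) == nobs(reducedmodel)

is_nested
same_n
```

If either check returns FALSE, the F test from anova() should not be interpreted as a comparison of nested models.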

Step 3: Perform a Lack of Fit Test

Next, we use the anova() function to perform a lack of fit test between the two models.


anova(fullmodel, reducedmodel)

Analysis of Variance Table
Model 1: mpg ~ cyl + disp + hp + wt
Model 2: mpg ~ cyl + disp
  Res.Df    RSS Df Sum of Sq      F   Pr(>F)  
1     27 170.44                               
2     29 270.74 -2    -100.3 7.9439 0.001936 **

---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The F test statistic is 7.9439, with a corresponding p-value of 0.001936.
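To see where these numbers come from, the F statistic and p-value can be recomputed by hand from the residual sums of squares and residual degrees of freedom of the two models. This is a sketch for illustration; the variable names (rss_full, rss_reduced, df_full, df_reduced, f_stat, p_value) are chosen here and are not part of the anova() output.

```r
# Refit both models so this block runs on its own (built-in mtcars data).
fullmodel    <- lm(mpg ~ cyl + disp + hp + wt, data = mtcars)
reducedmodel <- lm(mpg ~ cyl + disp, data = mtcars)

# For lm objects, deviance() returns the residual sum of squares (RSS).
rss_full    <- deviance(fullmodel)
rss_reduced <- deviance(reducedmodel)

# Residual degrees of freedom of each model
df_full    <- df.residual(fullmodel)
df_reduced <- df.residual(reducedmodel)

# F = (extra RSS explained per extra parameter) / (residual variance of full model)
f_stat <- ((rss_reduced - rss_full) / (df_reduced - df_full)) / (rss_full / df_full)

# Upper-tail probability of the F distribution gives the p-value.
p_value <- pf(f_stat, df_reduced - df_full, df_full, lower.tail = FALSE)

f_stat
p_value
```

The two values should agree with the F and Pr(>F) columns of the table above.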

Because this p-value is less than 0.05, we can reject the null hypothesis and conclude that the full model provides a statistically significantly better fit than the reduced model. In other words, hp and wt jointly improve the fit beyond what cyl and disp already explain.
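The Df and Sum of Sq values in the table are negative only because the full model was passed to anova() first. As a sketch of an alternative, passing the reduced model first gives positive values with the same F statistic, and the p-value can be pulled out of the table for a programmatic decision. The names comparison, f_stat, p_value and reject_null are illustrative.

```r
# Refit both models so this block runs on its own (built-in mtcars data).
fullmodel    <- lm(mpg ~ cyl + disp + hp + wt, data = mtcars)
reducedmodel <- lm(mpg ~ cyl + disp, data = mtcars)

# Reduced model first: row 2 then describes what the extra terms add.
comparison <- anova(reducedmodel, fullmodel)

# Row 1 holds NA for the test columns, so take the second row.
f_stat  <- comparison[["F"]][2]
p_value <- comparison[["Pr(>F)"]][2]

# Decision at the 0.05 significance level used in this guide
reject_null <- p_value < 0.05

comparison
reject_null
```

Only the signs of Df and Sum of Sq change with the argument order; the conclusion of the test is the same.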
