How to Perform a Lack of Fit Test in R: Quick Guide
A lack of fit test is used to determine whether a full regression model fits a dataset significantly better than a reduced (nested) version of that model.
Consider the following regression model, which has four predictor variables.
Y = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + β₄x₄ + ε
The following model, which contains only two of the original predictor variables, is nested within it.
Y = β₀ + β₁x₁ + β₂x₂ + ε
We can use a Lack of Fit Test with the following null and alternative hypotheses to see if these two models differ significantly.
Hypotheses
H0: The full model and the nested model both fit the data equally well. As a result, the nested model should be used.
H1: The full model fits the data significantly better than the nested model. As a result, the full model should be used.
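For the four-predictor and two-predictor models written out earlier, these hypotheses can be restated in terms of the coefficients, together with the F statistic that is commonly used to compare nested linear models. This is a sketch rather than a derivation: the symbols RSS (residual sum of squares) and df (residual degrees of freedom) are notation introduced here and do not appear in the original text.

```latex
\[
H_0:\ \beta_3 = \beta_4 = 0
\qquad
H_1:\ \beta_3 \neq 0 \ \text{or}\ \beta_4 \neq 0
\]
\[
F = \frac{\left(\mathrm{RSS}_{\text{reduced}} - \mathrm{RSS}_{\text{full}}\right) / \left(df_{\text{reduced}} - df_{\text{full}}\right)}
         {\mathrm{RSS}_{\text{full}} / df_{\text{full}}}
\]
```

Under H0 and the usual linear-model assumptions, this statistic follows an F distribution with (df_reduced − df_full) and df_full degrees of freedom.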
Step 1: Create a Dataset
We can use the built-in mtcars dataset. Let’s load it and view the first six rows.
data(mtcars)
head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
Step 2: Fit Two Different Models to the Dataset
Next, we fit two different regression models to the dataset.
First, fit the full model:
fullmodel <- lm(mpg ~ cyl + disp + hp + wt, data = mtcars)
Then fit the reduced model:
reducedmodel <- lm(mpg ~ cyl + disp, data = mtcars)
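Before comparing the models, it can help to confirm that the reduced model really is nested in the full one and that both were fitted to the same number of observations. The following is a minimal sketch of such a check; the helper names full_terms, reduced_terms, is_nested and same_n are illustrative and not part of the original guide.

```r
# Refit the two models so this snippet runs on its own
fullmodel    <- lm(mpg ~ cyl + disp + hp + wt, data = mtcars)
reducedmodel <- lm(mpg ~ cyl + disp, data = mtcars)

# Extract the predictor terms of each model
full_terms    <- attr(terms(fullmodel), "term.labels")
reduced_terms <- attr(terms(reducedmodel), "term.labels")

# The reduced model is nested if all of its terms also appear in the full model
is_nested <- all(reduced_terms %in% full_terms)

# Both models should be fitted to the same number of observations
same_n <- nobs(fullmodel) == nobs(reducedmodel)

# Predictors that the test will evaluate (present only in the full model)
setdiff(full_terms, reduced_terms)
```

If is_nested or same_n were FALSE, the F test from anova() would not be an appropriate comparison of the two models.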
Step 3: Perform a Lack of Fit Test
We then use the anova() function to perform a lack of fit test between the two models.
anova(fullmodel, reducedmodel)
Analysis of Variance Table

Model 1: mpg ~ cyl + disp + hp + wt
Model 2: mpg ~ cyl + disp
  Res.Df    RSS Df Sum of Sq      F   Pr(>F)
1     27 170.44
2     29 270.74 -2    -100.3 7.9439 0.001936 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The F test statistic is 7.9439, with a corresponding p-value of 0.001936.
Because this p-value is less than 0.05, we reject the null hypothesis and conclude that the full model provides a statistically significantly better fit than the reduced model.
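To see where the F statistic and p-value come from, they can be recomputed by hand from the residual sums of squares and residual degrees of freedom of the two models. The sketch below refits both models so that it runs on its own; the variable names (rss_full, rss_reduced, f_stat, and so on) are illustrative choices, not part of the original guide.

```r
# Refit the two models so this snippet runs on its own
fullmodel    <- lm(mpg ~ cyl + disp + hp + wt, data = mtcars)
reducedmodel <- lm(mpg ~ cyl + disp, data = mtcars)

# Residual sum of squares (RSS) and residual degrees of freedom for each model
rss_full    <- deviance(fullmodel)
rss_reduced <- deviance(reducedmodel)
df_full     <- df.residual(fullmodel)
df_reduced  <- df.residual(reducedmodel)

# F = [(RSS_reduced - RSS_full) / (df_reduced - df_full)] / (RSS_full / df_full)
f_stat <- ((rss_reduced - rss_full) / (df_reduced - df_full)) /
  (rss_full / df_full)

# Upper-tail probability of the F distribution gives the p-value
p_value <- pf(f_stat, df_reduced - df_full, df_full, lower.tail = FALSE)

# Compare with the values reported by anova()
tab <- anova(fullmodel, reducedmodel)
all.equal(f_stat, tab$F[2])
all.equal(p_value, tab[["Pr(>F)"]][2])
```

Both comparisons should agree with the anova() table, which is a useful sanity check when interpreting the output.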