How to find the best regression models in R: Mallows’ Cp
Mallows’ Cp is a statistic used in regression analysis to help choose the best model from among several candidate regression models.
The “best” regression model is the one with the lowest Cp value that is also close to p + 1, where p is the number of predictor variables in the model.
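For readers who want the formula behind this rule, here is the standard textbook definition. It is assumed here because the example does not spell it out. A candidate model with p predictors is compared against the full model, where n is the number of observations, SSE is a residual sum of squares, and k_full is the number of coefficients (including the intercept) in the full model:

```latex
C_p = \frac{\mathrm{SSE}_p}{\hat{\sigma}^2_{\text{full}}} - n + 2(p + 1),
\qquad
\hat{\sigma}^2_{\text{full}} = \frac{\mathrm{SSE}_{\text{full}}}{n - k_{\text{full}}}

% If the candidate model is unbiased, its expected SSE is about (n - p - 1) sigma^2, so
\mathbb{E}[\mathrm{SSE}_p] \approx (n - p - 1)\,\sigma^2
\;\Longrightarrow\;
C_p \approx (n - p - 1) - n + 2(p + 1) = p + 1
```

A Cp well above p + 1 therefore suggests that the model is biased, typically because it leaves out useful predictors.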
The ols_mallows_cp() function from the olsrr package is the simplest way to calculate Mallows’ Cp in R. It takes two arguments: the candidate model and the full model.
The following example shows how to use this function to choose the best regression model from a set of candidates.
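To see what ols_mallows_cp() computes, here is a minimal base-R sketch. It assumes the standard textbook definition Cp = SSE_p / MSE_full - n + 2(p + 1), where SSE_p is the residual sum of squares of the candidate model and MSE_full is the mean squared error of the full model. The helper name mallows_cp_manual() is hypothetical and is not part of olsrr. The cross-check against olsrr runs only if that package is installed.

```r
# Hypothetical helper (not part of olsrr): Mallows' Cp from the textbook formula
#   Cp = SSE_p / MSE_full - n + 2 * (number of coefficients in the candidate model)
mallows_cp_manual <- function(model, fullmodel) {
  sse_p <- sum(residuals(model)^2)                                   # candidate model SSE
  mse_full <- sum(residuals(fullmodel)^2) / df.residual(fullmodel)   # full-model MSE
  n <- nobs(model)                                                   # number of observations
  k <- length(coef(model))                                           # p predictors + intercept
  sse_p / mse_full - n + 2 * k
}

full <- lm(mpg ~ ., data = mtcars)
model1 <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)

cp_manual <- mallows_cp_manual(model1, full)
print(cp_manual)

# Compare with olsrr only when the package is available
if (requireNamespace("olsrr", quietly = TRUE)) {
  print(all.equal(cp_manual, olsrr::ols_mallows_cp(model1, full)))
}
```

As a sanity check, applying the formula to the full model itself always returns the number of coefficients in the full model. For mtcars that is 11: 10 predictors plus the intercept.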
Example: Using Mallows’ Cp to choose the best regression model
Suppose we want to compare three multiple linear regression models fit to the built-in mtcars dataset. We also fit a full model that contains every predictor, which serves as the reference for Mallows’ Cp. The models are:
Predictor variables in the full model: all 10 variables
Predictor variables in Model 1: disp, hp, wt, qsec
Predictor variables in Model 2: disp, qsec
Predictor variables in Model 3: disp, wt
The following code demonstrates how to fit each of these regression models and determine the Mallows’ Cp of each model using the ols_mallows_cp() function.
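First, load olsrr and fit the full model:
library(olsrr)
full <- lm(mpg ~ ., data = mtcars)
Now fit the three smaller models:
model1 <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
model2 <- lm(mpg ~ disp + qsec, data = mtcars)
model3 <- lm(mpg ~ disp + wt, data = mtcars)
Finally, calculate Mallows’ Cp for each smaller model relative to the full model:
ols_mallows_cp(model1, full)
ols_mallows_cp(model2, full)
ols_mallows_cp(model3, full)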
Here’s how to interpret the output:
Model 1: p + 1 = 5, Mallows’ Cp = 4.43
Model 2: p + 1 = 3, Mallows’ Cp = 18.64
Model 3: p + 1 = 3, Mallows’ Cp = 9.12
Model 1 has the Mallows’ Cp value closest to p + 1 (4.43 versus 5). This indicates that it has the least bias of the three candidate models, which makes it the best choice among them.
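The comparison above can also be tabulated programmatically. This sketch assumes the three models from the example. It computes Cp in base R with the standard formula Cp = SSE_p / MSE_full - n + 2(p + 1) rather than calling olsrr. It then lists p + 1, Cp, and the gap |Cp - (p + 1)| for each model and sorts the table by that gap.

```r
full <- lm(mpg ~ ., data = mtcars)
models <- list(
  model1 = lm(mpg ~ disp + hp + wt + qsec, data = mtcars),
  model2 = lm(mpg ~ disp + qsec, data = mtcars),
  model3 = lm(mpg ~ disp + wt, data = mtcars)
)

# Full-model MSE: the denominator shared by every Cp value
mse_full <- sum(residuals(full)^2) / df.residual(full)
n <- nrow(mtcars)

cp_table <- data.frame(
  model = names(models),
  p_plus_1 = sapply(models, function(m) length(coef(m))),  # predictors + intercept
  cp = sapply(models, function(m)
    sum(residuals(m)^2) / mse_full - n + 2 * length(coef(m)))
)
cp_table$gap <- abs(cp_table$cp - cp_table$p_plus_1)  # distance from the ideal p + 1

# Smallest gap first
cp_table <- cp_table[order(cp_table$gap), ]
print(cp_table, row.names = FALSE, digits = 4)
```

Sorting by the gap puts model1 first, matching the interpretation above.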
Mallows’ Cp: Some Thoughts
Here are a few factors to consider when it comes to Mallows’ Cp:
If Mallows’ Cp is high for every candidate model, some important predictor variables are likely missing from each model.
If several candidate models have low Mallows’ Cp values, choose the one with the lowest value.
Keep in mind that Mallows’ Cp is just one method for determining the “best” regression model among a number of options.
Adjusted R-squared is another commonly used statistic. It measures the proportion of the variance in the response variable that the predictor variables explain, adjusted for the number of predictors in the model.
When choosing the best regression model from several options, it is worth considering both Mallows’ Cp and adjusted R-squared.
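Both statistics can be pulled from the same fitted models. This sketch assumes the three candidate models from the example. Adjusted R-squared comes from base R's summary() of an lm fit. Cp is computed with the standard formula, which should match olsrr::ols_mallows_cp() for these models.

```r
full <- lm(mpg ~ ., data = mtcars)
models <- list(
  model1 = lm(mpg ~ disp + hp + wt + qsec, data = mtcars),
  model2 = lm(mpg ~ disp + qsec, data = mtcars),
  model3 = lm(mpg ~ disp + wt, data = mtcars)
)
mse_full <- sum(residuals(full)^2) / df.residual(full)

comparison <- data.frame(
  # Mallows' Cp: lower is better, ideally close to p + 1
  cp = sapply(models, function(m)
    sum(residuals(m)^2) / mse_full - nobs(m) + 2 * length(coef(m))),
  # Adjusted R-squared: higher is better
  adj_r2 = sapply(models, function(m) summary(m)$adj.r.squared)
)
print(round(comparison, 3))
```

Here both criteria favor model1. The two statistics do not always agree, however, which is why it is useful to look at them side by side.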