Model Selection in R (AIC Vs BIC)
Let’s explore model selection in R by fitting a linear regression model to the mtcars dataset.
First, let’s refresh our memory of what the mtcars dataset looks like.
head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
mtcars records a variety of car attributes. Suppose we want to develop a model to better understand the relationship between mpg and the other variables in the dataset.
We’ll start with a basic model in which we hypothesize that a car’s engine displacement (in cubic inches), disp, helps explain its mpg.
mtcars.lm <- lm(mpg ~ disp, data=mtcars)
summary(mtcars.lm)

Call:
lm(formula = mpg ~ disp, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max
-4.8922 -2.2022 -0.9631  1.6272  7.2305

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 29.599855   1.229720  24.070  < 2e-16 ***
disp        -0.041215   0.004712  -8.747 9.38e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.251 on 30 degrees of freedom
Multiple R-squared:  0.7183, Adjusted R-squared:  0.709
F-statistic: 76.51 on 1 and 30 DF,  p-value: 9.38e-10
We can see certain metrics of model performance in our model summary, but if we want to know our model’s AIC and BIC, we can make use of the glance() function from the broom package.
broom::glance(mtcars.lm)
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC deviance df.residual  nobs
      <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
1     0.718         0.709  3.25      76.5 9.38e-10     1  -82.1  170.  175.     317.          30    32
So our basic model has an AIC of about 170 and a BIC of about 175.
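If you are curious where these two numbers come from, here is a minimal sketch that rebuilds them by hand. It assumes the usual definitions, AIC = -2·logLik + 2·k and BIC = -2·logLik + log(n)·k, where k is the number of estimated parameters and n the number of observations. The names `fit`, `aic_manual` and `bic_manual` are just illustrative and do not appear elsewhere in this post.

```r
# Fit the single-predictor model from the text (mtcars ships with base R).
fit <- lm(mpg ~ disp, data = mtcars)

ll <- logLik(fit)        # maximized log-likelihood of the fitted model
k  <- attr(ll, "df")     # estimated parameters: intercept, slope, residual variance
n  <- nobs(fit)          # number of observations used in the fit

# Assumed definitions of the two criteria (see the lead-in above).
aic_manual <- -2 * as.numeric(ll) + 2 * k
bic_manual <- -2 * as.numeric(ll) + log(n) * k

# Compare with the built-in helpers from the stats package.
c(manual = aic_manual, builtin = AIC(fit))
c(manual = bic_manual, builtin = BIC(fit))
```

Note that k counts the residual variance as a parameter, which is why it is one larger than the number of regression coefficients.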
What if we add a second predictor to the model?
Let’s try a model in which the weight (in thousands of pounds) of an automobile, wt, is also used to explain its mpg.
mtcars.lm <- lm(mpg ~ disp + wt, data=mtcars)
broom::glance(mtcars.lm)

  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC deviance df.residual  nobs
      <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
1     0.781         0.766  2.92      51.7 2.74e-10     2  -78.1  164.  170.     247.          29    32
Our new model has an AIC of about 164 and a BIC of about 170. It’s worth noting that both are lower than in our previous model.
This suggests that the benefits of enhanced explanatory power outweigh the cost of increasing model complexity, according to both information criteria.
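To make that trade-off concrete, the sketch below splits the change into its two parts: the improvement in fit (the drop in -2·logLik) and the extra penalty each criterion charges for the additional parameter. It assumes the standard penalties of 2 per parameter for AIC and log(n) per parameter for BIC. The variable names (`fit1`, `fit2`, `fit_gain` and so on) are made up for this illustration.

```r
# The two models being compared in the text.
fit1 <- lm(mpg ~ disp, data = mtcars)
fit2 <- lm(mpg ~ disp + wt, data = mtcars)

# Improvement in fit: how much -2 * log-likelihood drops when wt is added.
fit_gain <- -2 * as.numeric(logLik(fit1)) - (-2 * as.numeric(logLik(fit2)))

# Number of extra estimated parameters in the larger model.
extra_k <- attr(logLik(fit2), "df") - attr(logLik(fit1), "df")

# Extra penalty charged by each criterion for those parameters.
aic_penalty <- 2 * extra_k
bic_penalty <- log(nobs(fit2)) * extra_k

c(fit_gain = fit_gain, aic_penalty = aic_penalty, bic_penalty = bic_penalty)
```

A criterion goes down exactly when `fit_gain` exceeds its penalty. Because log(32) is larger than 2, BIC asks for a bigger improvement than AIC before it rewards an extra predictor on this dataset.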
Let’s add some more parameters to the model and check how its AIC and BIC change as a result.
mtcars.lm <- lm(mpg ~ disp + wt + hp, data=mtcars)
broom::glance(mtcars.lm)

  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC deviance df.residual  nobs
      <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
1     0.827         0.808  2.64      44.6 8.65e-11     3  -74.3  159.  166.     195.          28    32
Yes, the AIC is now about 159 and the BIC about 166, so both have fallen again. Let’s add the number of cylinders, cyl, as well.
mtcars.lm <- lm(mpg ~ disp + wt + hp + cyl, data=mtcars)
broom::glance(mtcars.lm)

  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC deviance df.residual  nobs
      <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
1     0.849         0.826  2.51      37.8 1.06e-10     4  -72.2  156.  165.     170.          27    32
Wow, the AIC is now about 156 and the BIC about 165. Cool, let’s try another model.
mtcars.lm <- lm(mpg ~ disp + wt + hp + cyl + gear, data=mtcars)
broom::glance(mtcars.lm)

  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC deviance df.residual  nobs
      <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
1     0.851         0.822  2.54      29.7 5.72e-10     5  -71.9  158.  168.     168.          26    32
Oh no, the AIC is now about 158 and the BIC about 168. Both have increased, which indicates we’ve gone too far!
The final model’s AIC and BIC went up when gear was added. Given that our model already included disp, wt, hp, and cyl, the boost in explanatory power gained by introducing gear was not worth the increase in model complexity.
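The five fits above can also be collected into a single comparison table. The following is only a sketch of one way to do that with base R’s `AIC()` and `BIC()` instead of `broom::glance()`; the object names `formulas`, `fits` and `comparison` are illustrative, and the order of the models simply mirrors the order used in this post.

```r
# The nested sequence of models tried in this post, in the same order.
formulas <- list(
  mpg ~ disp,
  mpg ~ disp + wt,
  mpg ~ disp + wt + hp,
  mpg ~ disp + wt + hp + cyl,
  mpg ~ disp + wt + hp + cyl + gear
)

# Fit each model on mtcars.
fits <- lapply(formulas, function(f) lm(f, data = mtcars))

# One row per model, with both information criteria side by side.
comparison <- data.frame(
  model = vapply(formulas, function(f) paste(deparse(f), collapse = ""), character(1)),
  AIC   = vapply(fits, AIC, numeric(1)),
  BIC   = vapply(fits, BIC, numeric(1))
)

comparison

# Which model in the sequence has the lowest value of each criterion?
comparison$model[which.min(comparison$AIC)]
comparison$model[which.min(comparison$BIC)]
```

Keep in mind that this only compares the handful of models listed, in the order we happened to add the variables. It is not a search over every possible combination of predictors.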
Conclusion
So we started with a simple linear regression model and gradually increased the number of parameters until the AIC and BIC stopped falling.
It’s cool, but it isn’t the end of the story. We haven’t given any thought to how we should decide which variables to include in our model in the first place.
We’ll take up that fascinating question in an upcoming post!