Model Selection in R (AIC Vs BIC)

In this post, we look at model selection in R by fitting linear regression models to the mtcars dataset and comparing them with AIC and BIC.

First, we need to brush up on our knowledge by looking at the mtcars dataset.

head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

mtcars collects data on a variety of car attributes, but let’s pretend we’re trying to develop a model to better understand the link between mpg and the other factors in the mtcars dataset.

We start with a basic model in which we theorize that a car’s engine displacement (in cubic inches), disp, contributes to its mpg.

Model Selection in R

mtcars.lm <- lm(mpg ~ disp, data=mtcars)
summary(mtcars.lm)
Call:
lm(formula = mpg ~ disp, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.8922 -2.2022 -0.9631  1.6272  7.2305 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 29.599855   1.229720  24.070  < 2e-16 ***
disp        -0.041215   0.004712  -8.747 9.38e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.251 on 30 degrees of freedom
Multiple R-squared:  0.7183,	Adjusted R-squared:  0.709 
F-statistic: 76.51 on 1 and 30 DF,  p-value: 9.38e-10

We can see certain metrics of model performance in our model summary, but if we want to know our model’s AIC and BIC, we can make use of the glance() function from the broom package.

broom::glance(mtcars.lm)
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC deviance df.residual  nobs
      <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
1     0.718         0.709  3.25      76.5 9.38e-10     1  -82.1  170.  175.     317.          30    32

Our basic model has an AIC of about 170 and a BIC of about 175.
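To see where these two numbers come from, here is a minimal sketch that recomputes them by hand from the model's log-likelihood, assuming the standard definitions AIC = -2 logLik + 2k and BIC = -2 logLik + log(n) k. The names fit, ll, k, n, aic_manual and bic_manual are introduced here for illustration only. For an lm fit, k counts the regression coefficients plus the residual variance, so it is 3 for this model.

```r
# Refit the basic model under a separate name (illustrative only)
fit <- lm(mpg ~ disp, data = mtcars)

ll <- logLik(fit)     # maximized log-likelihood of the fitted model
k  <- attr(ll, "df")  # estimated parameters: intercept, disp slope, residual variance
n  <- nobs(fit)       # number of observations used in the fit

# Standard definitions of the two information criteria
aic_manual <- -2 * as.numeric(ll) + 2 * k
bic_manual <- -2 * as.numeric(ll) + log(n) * k

# Compare against base R's built-in helpers
c(manual = aic_manual, builtin = AIC(fit))
c(manual = bic_manual, builtin = BIC(fit))
```

The base R functions AIC() and BIC() give the same values without the broom package, which can be handy when only these two numbers are needed.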

What if we add a second predictor to the model?

Let’s try a model in which the weight (in thousands of pounds) of an automobile, wt, is also used to explain its mpg.

mtcars.lm <- lm(mpg ~ disp + wt, data=mtcars)
broom::glance(mtcars.lm)
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC deviance df.residual  nobs
      <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
1     0.781         0.766  2.92      51.7 2.74e-10     2  -78.1  164.  170.     247.          29    32

Our new model has an AIC of about 164 and a BIC of about 170. Both are lower than the values for our previous model.

This suggests that the benefits of enhanced explanatory power outweigh the cost of increasing model complexity, according to both information criteria.
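The trade-off can be made explicit with a small sketch. It assumes the usual definitions of AIC and BIC, under which adding one parameter costs 2 under AIC and log(n) under BIC, while the gain in fit is the drop in -2 logLik. The names fit_small, fit_large, fit_gain, aic_penalty and bic_penalty are hypothetical and used only for this illustration.

```r
# Fit the one-predictor and two-predictor models side by side
fit_small <- lm(mpg ~ disp, data = mtcars)
fit_large <- lm(mpg ~ disp + wt, data = mtcars)

# Gain in fit: how much -2 * logLik drops when wt is added
fit_gain <- 2 * (as.numeric(logLik(fit_large)) - as.numeric(logLik(fit_small)))

# Extra penalty charged for the one additional parameter
aic_penalty <- 2
bic_penalty <- log(nobs(fit_large))

# A criterion falls only if the gain in fit exceeds its penalty
c(fit_gain = fit_gain, aic_penalty = aic_penalty, bic_penalty = bic_penalty)
```

Because log(32) is larger than 2, BIC charges more per added parameter than AIC does on this dataset, so it is the stricter of the two criteria here.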

Let’s add some more parameters to the model and check how its AIC and BIC change as a result.

mtcars.lm <- lm(mpg ~ disp + wt + hp, data=mtcars)
broom::glance(mtcars.lm)
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC deviance df.residual  nobs
      <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
1     0.827         0.808  2.64      44.6 8.65e-11     3  -74.3  159.  166.     195.          28    32

The AIC is now 159 and the BIC is 166, so both have dropped again. Next, we add cyl, the number of cylinders.

mtcars.lm <- lm(mpg ~ disp + wt + hp + cyl, data=mtcars) 
broom::glance(mtcars.lm)
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC deviance df.residual  nobs
      <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
1     0.849         0.826  2.51      37.8 1.06e-10     4  -72.2  156.  165.     170.          27    32

The AIC is now 156 and the BIC is 165. Let’s try one more model, this time adding gear.

mtcars.lm <- lm(mpg ~ disp + wt + hp + cyl + gear, data=mtcars)
broom::glance(mtcars.lm)
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC deviance df.residual  nobs
      <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
1     0.851         0.822  2.54      29.7 5.72e-10     5  -71.9  158.  168.     168.          26    32

This time the AIC is 158 and the BIC is 168. Both increased, which indicates that we’ve gone too far.

The final model’s AIC and BIC increased after adding gear to the model. Given that our model already included disp, wt, hp, and cyl, the boost in explanatory power gained by introducing gear was not worth the increase in model complexity.

Conclusion

So we started with a simple linear regression model and gradually increased the number of parameters until the AIC and BIC stopped falling.
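The whole sequence can also be run in one pass instead of refitting by hand. The following is a rough sketch using only base R. The names formulas, fits and comparison are made up for this example, and the order of the formulas simply mirrors the order in which predictors were added in this post.

```r
# The five nested models from this post, in the order they were tried
formulas <- list(
  mpg ~ disp,
  mpg ~ disp + wt,
  mpg ~ disp + wt + hp,
  mpg ~ disp + wt + hp + cyl,
  mpg ~ disp + wt + hp + cyl + gear
)

# Fit each model on mtcars
fits <- lapply(formulas, function(f) lm(f, data = mtcars))

# Collect both criteria into one table for easy comparison
comparison <- data.frame(
  model = vapply(formulas, function(f) paste(deparse(f), collapse = " "), character(1)),
  AIC   = vapply(fits, AIC, numeric(1)),
  BIC   = vapply(fits, BIC, numeric(1))
)

print(comparison)

# Row with the lowest value under each criterion
comparison$model[which.min(comparison$AIC)]
comparison$model[which.min(comparison$BIC)]
```

Laying the values out in a table makes it easy to spot the point at which each criterion stops falling.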

It’s cool, but it isn’t the end of the narrative. We haven’t given any thought to how we would determine which variables to include in our model.

We will take up that question in an upcoming post!
