Gradient Boosting in R

In this tutorial we discuss eXtreme Gradient Boosting (XGBoost) in R.

Why use eXtreme Gradient Boosting in R?

Popular in machine learning challenges.

Fast and accurate.

Can handle missing values.

Does it require numeric inputs?

Yes, eXtreme Gradient Boosting requires a numeric matrix for its input.

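Load Library

library(xgboost)  # xgb.DMatrix, xgb.train, xgb.importance
library(Matrix)   # sparse.model.matrix
library(dplyr)    # mutate and the %>% pipe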

Getting Data

data <- read.csv("D:/RStudio/NaiveClassifiaction/binary.csv", header = TRUE)
str(data)
'data.frame': 400 obs. of  4 variables:
 $ admit: int  0 1 1 1 0 1 1 0 1 0 ...
 $ gre  : int  380 660 800 640 520 760 560 400 540 700 ...
 $ gpa  : num  3.61 3.67 4 3.19 2.93 3 2.98 3.08 3.39 3.92 ...
 $ rank : int  3 3 1 4 4 2 1 2 3 2 ...

The data set contains 400 observations and 4 variables. eXtreme Gradient Boosting requires numeric input, and rank is a category rather than a quantity, so first convert rank into a factor; it will be one-hot encoded into numeric dummy columns later.

data$rank <- as.factor(data$rank)
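
As a quick illustration of what the conversion changes, here is a minimal sketch on made-up rank values (rank_int and rank_fac are hypothetical names, not part of the tutorial's data):

```r
rank_int <- c(3, 3, 1, 4, 4, 2)    # made-up values in the same 1-4 range
rank_fac <- as.factor(rank_int)

levels(rank_fac)      # the distinct categories, stored as labels
is.numeric(rank_fac)  # FALSE: R no longer treats rank as a quantity
```

Once rank is a factor, modelling functions treat each level as a separate category instead of assuming that rank 4 is "twice" rank 2.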

Partition data

Let’s partition the data set into training and test sets.

ind <- sample(2, nrow(data), replace = TRUE, prob = c(0.8, 0.2))
train <- data[ind==1,]
test <- data[ind==2,]
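
Because sample() draws at random, the split changes on every run. Below is a minimal sketch of a reproducible split. The data frame toy and the seed 123 are assumptions made for illustration and do not come from the tutorial's data:

```r
# Toy data frame standing in for the admissions data (hypothetical values).
toy <- data.frame(id = 1:100, y = rep(c(0, 1), 50))

set.seed(123)  # arbitrary seed; fixes the random draw so the split is repeatable
ind <- sample(2, nrow(toy), replace = TRUE, prob = c(0.8, 0.2))
toy_train <- toy[ind == 1, ]
toy_test  <- toy[ind == 2, ]

# Every row lands in exactly one of the two sets.
nrow(toy_train) + nrow(toy_test) == nrow(toy)
```

The 0.8 / 0.2 probabilities apply to each row separately, so the realized split is only approximately 80/20.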

Matrix Creation

Because rank is a factor variable, it needs one-hot encoding: each level of the factor is converted into its own 0/1 dummy variable.
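
As a small illustration of one-hot encoding, the sketch below uses base R's model.matrix() on a made-up data frame (toy is hypothetical). sparse.model.matrix() from the Matrix package accepts the same formula but returns a sparse matrix.

```r
# Hypothetical three-row data frame with one numeric and one factor column.
toy <- data.frame(y    = c(0, 1, 1),
                  gpa  = c(3.6, 3.7, 4.0),
                  rank = factor(c(3, 1, 4), levels = 1:4))

# "- 1" drops the intercept, so the factor keeps a dummy column for every level.
m <- model.matrix(y ~ . - 1, data = toy)
colnames(m)
```

The result has one gpa column plus rank1 to rank4, and each row has exactly one 1 among the rank columns.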

trainm <- sparse.model.matrix(admit ~ .-1, data = train)
head(trainm)
6 x 6 sparse Matrix of class "dgCMatrix"
  gre  gpa rank1 rank2 rank3 rank4
1 380 3.61     .     .     1     .
2 660 3.67     .     .     1     .
3 800 4.00     1     .     .     .
4 640 3.19     .     .     .     1
6 760 3.00     .     1     .     .
7 560 2.98     1     .     .     .
train_label <- train[,"admit"]
train_matrix <- xgb.DMatrix(data = as.matrix(trainm), label = train_label)

The training data is now in the required format. The test data needs the same conversion.

testm <- sparse.model.matrix(admit~.-1, data = test)
test_label <- test[,"admit"]
test_matrix <- xgb.DMatrix(data = as.matrix(testm), label = test_label)


nc <- length(unique(train_label))

There are 2 distinct classes in the training labels, so nc is 2.

xgb_params <- list("objective" = "multi:softprob",
                   "eval_metric" = "mlogloss",
                   "num_class" = nc)
watchlist <- list(train = train_matrix, test = test_matrix)

The watchlist lets us track the training and test error at each iteration.
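
The error tracked here is the multiclass log loss (mlogloss): the mean negative log of the probability the model assigns to the true class. A minimal base R sketch follows; the helper mlogloss() and the probabilities are made up for illustration and are not part of the xgboost package:

```r
# Hypothetical helper: mean negative log-probability of the true class.
# probs:  one row per observation, one column per class (class 0 first).
# labels: integer class labels starting at 0, as in the tutorial.
mlogloss <- function(probs, labels) {
  true_class_prob <- probs[cbind(seq_along(labels), labels + 1)]
  -mean(log(true_class_prob))
}

# A model that is undecided (0.5 / 0.5) on every row scores log(2).
undecided <- matrix(0.5, nrow = 4, ncol = 2)
mlogloss(undecided, c(0, 1, 1, 0))
```

log(2) is about 0.693, which is roughly where a two-class log loss starts when the learning rate is very small and the first trees barely move the predictions.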

bst_model <- xgb.train(params = xgb_params,
                       data = train_matrix,
                       nrounds = 1000,
                       watchlist = watchlist,
                       eta = 0.001,
                       max.depth = 3,
                       gamma = 0,
                       subsample = 1,
                       colsample_bytree = 1,
                       missing = NA,
                       seed = 333)
##### xgb.Booster
raw: 2.4 Mb
  xgb.train(params = xgb_params, data = train_matrix, nrounds = 1000,
    watchlist = watchlist, eta = 0.001, max.depth = 3, gamma = 0,
    subsample = 1, colsample_bytree = 1, missing = NA, seed = 333)
params (as set within xgb.train):
  objective = "multi:softprob", eval_metric = "mlogloss", num_class = "2", eta = "0.001",
  max_depth = "3", gamma = "0", subsample = "1", colsample_bytree = "1", missing = "NA",
  seed = "333", validate_parameters = "TRUE"
  cb.print.evaluation(period = print_every_n)
# of features: 6
niter: 1000
nfeatures : 6
    iter train_mlogloss test_mlogloss
       1       0.692889      0.692974
       2       0.692631      0.692793
     999       0.556583      0.625728
    1000       0.556514      0.625710

Training & test error plot

e <- data.frame(bst_model$evaluation_log)
plot(e$iter, e$train_mlogloss, col = 'blue')
lines(e$iter, e$test_mlogloss, col = 'red')

To avoid overfitting and build the best model, we need to identify the iteration with the lowest test error and choose a suitable eta value.

e[which.min(e$test_mlogloss), ]  # row with the lowest test log loss (0.613294 in the original run)
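
To illustrate picking the best iteration, here is a sketch on a made-up evaluation log (the numbers in toy_log are invented, not taken from the run above). Training loss keeps falling while test loss turns upward, and which.min() on the test column finds the turning point:

```r
# Invented evaluation log with the same column names as bst_model$evaluation_log.
toy_log <- data.frame(
  iter           = 1:6,
  train_mlogloss = c(0.69, 0.65, 0.61, 0.58, 0.55, 0.52),
  test_mlogloss  = c(0.69, 0.66, 0.64, 0.63, 0.64, 0.66)
)

best <- toy_log[which.min(toy_log$test_mlogloss), ]
best$iter  # iteration with the lowest test log loss
```

Using which.min() also avoids comparing floating-point numbers with ==, which can return no rows when the value is typed in from rounded console output.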

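We can rerun the model based on the above values. Note that the call below passes 0.613294, the test log loss value looked up above, as eta; eta is the learning rate, so a value taken from the loss column is not a tuned learning rate.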

bst_model <- xgb.train(params = xgb_params,
                       data = train_matrix,
                       nrounds = 2,
                       watchlist = watchlist,
                       eta = 0.613294,
                       max.depth = 3,
                       gamma = 0,
                       subsample = 1,
                       colsample_bytree = 1,
                       missing = NA,
                       seed = 333)

Feature importance

imp <- xgb.importance(colnames(train_matrix), model = bst_model)
print(imp)
   Feature       Gain      Cover  Frequency
1:     gpa 0.48632797 0.47722628 0.35714286
2:     gre 0.23495920 0.32849509 0.42857143
3:   rank1 0.23432569 0.17282919 0.14285714
4:   rank2 0.04438714 0.02144944 0.07142857

Prediction & confusion matrix

p <- predict(bst_model, newdata = test_matrix)

pred <- matrix(p, nrow = nc, ncol = length(p)/nc) %>%
         t() %>%
         data.frame() %>%
         mutate(label = test_label, max_prob = max.col(., "last")-1)
          X1        X2 label max_prob
1  0.7780407 0.2219593     0        0
2  0.6580867 0.3419133     0        0
3  0.6111851 0.3888150     0        0
4  0.4440117 0.5559883     1        1
5  0.6111851 0.3888150     1        0
6  0.4345139 0.5654861     1        1
7  0.7780407 0.2219593     1        0
8  0.7780407 0.2219593     1        0
9  0.6580867 0.3419133     0        0
10 0.6580867 0.3419133     1        0
11 0.6159363 0.3840637     0        0
12 0.6580867 0.3419133     0        0
13 0.7780407 0.2219593     0        0
14 0.5099485 0.4900515     1        0
15 0.4440117 0.5559883     1        1
16 0.7780407 0.2219593     0        0
17 0.5099485 0.4900515     0        0
18 0.6111851 0.3888150     1        0
19 0.7780407 0.2219593     1        0
20 0.7780407 0.2219593     0        0
21 0.7780407 0.2219593     0        0
22 0.6580867 0.3419133     1        0
23 0.7780407 0.2219593     0        0
24 0.8379621 0.1620379     0        0
25 0.1789861 0.8210139     1        1
26 0.6111851 0.3888150     1        0
27 0.7780407 0.2219593     1        0
28 0.6111851 0.3888150     0        0
29 0.6580867 0.3419133     1        0
30 0.1789861 0.8210139     1        1
31 0.5099485 0.4900515     0        0
32 0.7780407 0.2219593     0        0
33 0.6111851 0.3888150     0        0
34 0.6111851 0.3888150     0        0
35 0.7581326 0.2418674     0        0
36 0.6111851 0.3888150     1        0
37 0.6111851 0.3888150     0        0
38 0.6580867 0.3419133     0        0
39 0.6111851 0.3888150     0        0
40 0.6111851 0.3888150     0        0
41 0.6580867 0.3419133     1        0
42 0.6111851 0.3888150     1        0
43 0.6111851 0.3888150     0        0
44 0.7780407 0.2219593     0        0
45 0.6111851 0.3888150     0        0
46 0.7780407 0.2219593     0        0
47 0.6111851 0.3888150     0        0
48 0.6580867 0.3419133     0        0
49 0.6111851 0.3888150     0        0
50 0.6111851 0.3888150     0        0
51 0.7780407 0.2219593     0        0
52 0.6111851 0.3888150     0        0
53 0.6111851 0.3888150     1        0
54 0.6159363 0.3840637     0        0
55 0.6111851 0.3888150     0        0
75 0.5099485 0.4900515     0        0
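
The reshaping step can be checked on a toy vector. The sketch assumes, as the code above does, that predict() with multi:softprob returns one flat vector holding the class probabilities of observation 1, then observation 2, and so on. Whether that holds may depend on the xgboost version, and the numbers below are made up:

```r
nc <- 2
# Made-up flat output for three observations: (class 0, class 1) pairs.
p <- c(0.8, 0.2,  0.4, 0.6,  0.3, 0.7)

# Fill column by column so each column is one observation, then transpose
# to get one row per observation and one column per class.
probs <- t(matrix(p, nrow = nc, ncol = length(p) / nc))

# max.col() returns the 1-based column of each row's maximum; subtract 1
# to get labels 0 and 1. ties.method = "last" matches the tutorial's call.
pred_class <- max.col(probs, ties.method = "last") - 1
pred_class
```

Each row of probs sums to 1, and pred_class holds the label of the more probable class for each observation.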

Here 0 indicates a student who was not admitted and 1 indicates a student who was admitted to the program.

table(Prediction = pred$max_prob, Actual = pred$label)
          Actual
Prediction  0  1
         0 49 20
         1  1  5
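
Overall accuracy is the share of predictions that fall on the diagonal of the confusion matrix. A short sketch that re-enters the four counts from the table above by hand (tab, accuracy and recall_admitted are illustrative names):

```r
# Counts copied from the confusion matrix above (rows: prediction, columns: actual).
tab <- matrix(c(49, 1, 20, 5), nrow = 2,
              dimnames = list(Prediction = c("0", "1"), Actual = c("0", "1")))

accuracy <- sum(diag(tab)) / sum(tab)               # (49 + 5) / 75
recall_admitted <- tab["1", "1"] / sum(tab[, "1"])  # 5 of the 25 admitted students
c(accuracy = accuracy, recall_admitted = recall_admitted)
```

Accuracy comes out at 0.72, but only 5 of the 25 admitted students are identified, so accuracy alone overstates how well the model separates the two classes.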


With this tutorial you can apply eXtreme Gradient Boosting to your own classification problems. In this case the model accuracy is around 72% (54 of the 75 test observations are classified correctly).

