Random Forest Classification Model in R
Why use a random forest classification model in R? With a random forest regression model, getting good model accuracy is very difficult in some cases. One alternative is to increase the number of data points, but that is not always possible because of the high cost or the unavailability of samples.
Random forest classification comes to the rescue in such situations. So how do you build a random forest classification model in R? In the previous article we discussed the random forest regression model, which achieved about 75% variance explained.
Let us examine the same data points from the previous post, this time using the random forest classification approach.
In this example, we analyze the impact of the explanatory variables Attribute1, Attribute2, …, Attribute6 on the dependent variable Likeability.
Use the read.xlsx function to read the data into R.
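As a minimal sketch of this step, the snippet below assumes the openxlsx package (the xlsx package also has a read.xlsx function, with different arguments). The article does not give the workbook name, so the sketch writes a small made-up data frame to a temporary file and reads it back; the column values are hypothetical.

```r
library(openxlsx)

# Hypothetical stand-in data, written to a temporary workbook so the sketch runs end to end.
set.seed(1)
example <- data.frame(Attribute1 = runif(5), Likeability = runif(5, min = 1, max = 9))
path <- tempfile(fileext = ".xlsx")
write.xlsx(example, path)

# In practice, replace `path` with the location of your own workbook.
data <- read.xlsx(path, sheet = 1)
str(data)
```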
We then split the data set into a training data set and a test data set. We use the training data to build the model and the test data to evaluate it.
The data frame has 64 observations in total. Rows are assigned at random: 60 observations are used for training and 4 for testing.
Before splitting the data into training and test sets, the response (dependent) variable needs to be converted into two classes, LOW and HIGH. How do we do that?
Take the average of the Likeability column. An observation whose Likeability value is below the average is coded as LOW, otherwise as HIGH.
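The recoding rule can be sketched in base R as follows. The Likeability values here are randomly generated placeholders, not the article's data, and the column name Class is an assumption made for illustration.

```r
# Hypothetical stand-in for the article's data: 64 made-up Likeability scores.
set.seed(42)
data <- data.frame(Likeability = round(runif(64, min = 1, max = 9), 1))

# Threshold at the column mean: below the mean is LOW, otherwise HIGH.
cutoff <- mean(data$Likeability)
data$Class <- factor(ifelse(data$Likeability < cutoff, "LOW", "HIGH"),
                     levels = c("HIGH", "LOW"))

# Count how many observations fall in each class.
table(data$Class)
```

Storing the result as a factor matters: randomForest fits a classification forest only when the response is a factor, and a regression forest when it is numeric.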
#Create training and test data
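A minimal sketch of the split and the model fit, assuming the randomForest package. The data are simulated placeholders (the variable names score, train_idx, train, test and model are illustrative), while ntree = 100 and mtry = 6 are taken from the model summary printed in this article.

```r
library(randomForest)

# Hypothetical stand-in data: six numeric attributes and a two-level response.
set.seed(123)
n <- 64
data <- as.data.frame(matrix(runif(n * 6), ncol = 6))
names(data) <- paste0("Attribute", 1:6)
score <- rowSums(data[, 1:3]) + rnorm(n, sd = 0.2)  # made-up Likeability score
data$Class <- factor(ifelse(score < mean(score), "LOW", "HIGH"),
                     levels = c("HIGH", "LOW"))

# Create training and test data: 60 random rows for training, the remaining 4 for testing.
train_idx <- sample(seq_len(n), size = 60)
train <- data[train_idx, ]
test  <- data[-train_idx, ]

# Fit the classification forest; a factor response selects classification.
model <- randomForest(Class ~ ., data = train, ntree = 100, mtry = 6)
print(model)
```

With simulated data the error rate and confusion matrix will differ from the figures reported here.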
Type of random forest: classification
Number of trees: 100
No. of variables tried at each split: 6
OOB estimate of error rate: 6.67%
Confusion matrix:
     HIGH LOW class.error
HIGH   10   2  0.16666667
LOW     2  46  0.04166667
The OOB error rate of 6.67% corresponds to a model accuracy of about 93%, which is much higher than what the random forest regression model achieved.
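The figures in the model summary can be reproduced by hand from the confusion matrix. The sketch below re-enters the four counts reported above (rows are actual classes, columns are predicted classes) and derives the per-class error, the OOB error rate and the accuracy.

```r
# Out-of-bag confusion matrix as reported above (rows = actual, columns = predicted).
conf <- matrix(c(10, 2,
                  2, 46),
               nrow = 2, byrow = TRUE,
               dimnames = list(actual = c("HIGH", "LOW"),
                               predicted = c("HIGH", "LOW")))

class_error <- 1 - diag(conf) / rowSums(conf)   # misclassified share within each actual class
oob_error   <- 1 - sum(diag(conf)) / sum(conf)  # 4 misclassified out of 60
accuracy    <- 1 - oob_error

round(class_error, 4)
round(100 * c(oob_error = oob_error, accuracy = accuracy), 2)
```

Note that the HIGH class has a noticeably larger error (2 of 12) than the LOW class (2 of 48), which the single accuracy figure hides.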
Boruta performed 29 iterations in 1.248526 secs.
7 attributes confirmed important: Attribute1, Attribute2, Attribute3, Attribute4, Attribute5 and 2 more;
No attributes deemed unimportant.
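Output of this kind comes from the Boruta package, which tests each attribute's importance against shuffled copies of the attributes. A minimal sketch on simulated placeholder data follows; the data and variable names are assumptions, so the number of iterations and the set of confirmed attributes will not match the output above.

```r
library(Boruta)

# Hypothetical stand-in data: six numeric attributes and a two-level response.
set.seed(123)
n <- 64
data <- as.data.frame(matrix(runif(n * 6), ncol = 6))
names(data) <- paste0("Attribute", 1:6)
score <- rowSums(data[, 1:3]) + rnorm(n, sd = 0.2)  # made-up Likeability score
data$Class <- factor(ifelse(score < mean(score), "LOW", "HIGH"),
                     levels = c("HIGH", "LOW"))
train <- data[sample(seq_len(n), size = 60), ]

# Run the feature-importance search; doTrace = 0 suppresses progress messages.
boruta_result <- Boruta(Class ~ ., data = train, doTrace = 0)
print(boruta_result)

# Names of the attributes confirmed as important.
getSelectedAttributes(boruta_result)
```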
Predict test data based on the training model
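A sketch of the prediction step, again on simulated placeholder data with the randomForest package. The variable names are illustrative, and the predicted classes will differ from those reported in this article.

```r
library(randomForest)

# Hypothetical stand-in data: six numeric attributes and a two-level response.
set.seed(123)
n <- 64
data <- as.data.frame(matrix(runif(n * 6), ncol = 6))
names(data) <- paste0("Attribute", 1:6)
score <- rowSums(data[, 1:3]) + rnorm(n, sd = 0.2)  # made-up Likeability score
data$Class <- factor(ifelse(score < mean(score), "LOW", "HIGH"),
                     levels = c("HIGH", "LOW"))

# 60 rows for training, the remaining 4 held out for testing.
train_idx <- sample(seq_len(n), size = 60)
train <- data[train_idx, ]
test  <- data[-train_idx, ]
model <- randomForest(Class ~ ., data = train, ntree = 100, mtry = 6)

# Predict the held-out rows and place the predictions next to the actual classes.
predicted <- predict(model, newdata = test)
comparison <- data.frame(actual = test$Class, predicted = predicted)
print(comparison)

# Share of test rows predicted correctly.
mean(predicted == test$Class)
```

With only four held-out rows the test accuracy is a coarse estimate, so the OOB error rate remains the more stable measure of model quality.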
The predicted values are HIGH, HIGH, HIGH, HIGH, HIGH and the original values are HIGH, HIGH, HIGH, HIGH, HIGH. Every prediction matches the original value, so with 93% accuracy the classification model predicts better than the random forest regression model.