Random Forest Classification Model in R

Why use a random forest classification model in R? With a random forest regression model, getting good model accuracy is very difficult in some cases. One alternative is to increase the number of data points, but that is not always possible because of the high cost or the unavailability of samples.

Random forest classification comes to the rescue in such situations. So how do we build a random forest classification model in R? In the previous article we discussed the random forest regression model, which explained 75% of the variation.

Let us examine the same data points from the previous post, this time using the random forest classification approach.

In this example, we analyze the impact of the explanatory variables Attribute1, Attribute2, …, Attribute6 on the dependent variable Likeability.

Data Loading

Use the read.xlsx function from the xlsx package to read the data into R.

library(xlsx)
data <- read.xlsx("D:/rawdata.xlsx", sheetName = "Sheet1")

We then split the data set into a training data set and a test data set. The training data is used to build the model, and the test data is used to test it.

We have randomly created a data frame with a total of 64 observations: 60 observations are used for training and 4 observations are used for testing.

Before splitting the data into training and test sets, the response variable (dependent variable) needs to be converted into two classes, LOW and HIGH. How do we do that?

Take the average value of the Likeability column. If an observation's Likeability value is less than the average, it is coded as LOW; otherwise it is coded as HIGH.
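The recoding rule described above can be sketched in base R as follows. This is only an illustration: the data frame toy and all of its values are invented, and only the column name Likeability is taken from this article.

```r
# Hypothetical toy data; the values are invented for illustration only.
toy <- data.frame(
  Attribute1  = c(3, 5, 2, 4, 1, 5),
  Likeability = c(2.0, 4.5, 1.5, 3.5, 1.0, 5.0)
)

# Threshold: the mean of the response column.
cutoff <- mean(toy$Likeability)

# Below the mean -> LOW, otherwise HIGH. Storing the result as a factor
# is what makes randomForest() fit a classification model.
toy$Likeability <- factor(
  ifelse(toy$Likeability < cutoff, "LOW", "HIGH"),
  levels = c("HIGH", "LOW")
)

# Class counts after recoding.
table(toy$Likeability)
```

Note that the cutoff is computed before the column is overwritten, so the original numeric values are still available when the comparison is made.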

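#Create training and test data

inputData <- data[1:60, ] # training data
testData <- data[61:64, ] # test data

Next, fit the random forest classifier. Here data2 is taken to be the training data with Likeabilty recoded as a LOW/HIGH factor.

library(randomForest)
AttribImp.rf <- randomForest(formula = Likeabilty ~ ., data = data2, importance = TRUE,
proximity = TRUE, ntree = 100, mtry = 6, plot = FALSE)
print(AttribImp.rf)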

Type of random forest: classification
Number of trees: 100
No. of variables tried at each split: 6

OOB estimate of error rate: 6.67%
Confusion matrix:
     HIGH LOW class.error
HIGH   10   2  0.16666667
LOW     2  46  0.04166667

The OOB error rate of 6.67% corresponds to a model accuracy of about 93% (56 of the 60 observations are classified correctly), which is much higher than what we achieved with the random forest regression model.
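The accuracy figure follows directly from the confusion matrix above. A minimal sketch of the arithmetic in base R, with the four counts typed in by hand from the model output (the variable names are only illustrative):

```r
# Confusion matrix copied from the model output above
# (rows = actual class, columns = predicted class).
conf <- matrix(c(10,  2,
                  2, 46),
               nrow = 2, byrow = TRUE,
               dimnames = list(actual    = c("HIGH", "LOW"),
                               predicted = c("HIGH", "LOW")))

accuracy   <- sum(diag(conf)) / sum(conf)     # correctly classified / total
error_rate <- 1 - accuracy                    # overall (OOB) error rate
class_err  <- 1 - diag(conf) / rowSums(conf)  # per-class error, as in class.error

round(100 * accuracy, 2)
round(100 * error_rate, 2)
round(class_err, 4)
```

The per-class errors show that the model misclassifies HIGH observations more often than LOW ones, which is worth keeping in mind because HIGH is the smaller class (12 of 60 observations).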

Important Attributes

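Boruta is a feature selection method built around random forests. It compares each attribute against randomized "shadow" copies of the attributes to decide whether it is important.

library(Boruta)
Important <- Boruta(Likeabilty ~ ., data = data2)
print(Important)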

Boruta performed 29 iterations in 1.248526 secs.
7 attributes confirmed important: Attribute1, Attribute2, Attribute3, Attribute4, Attribute5 and 2 more;
No attributes deemed unimportant.

Prediction

Predict the test data using the trained model.

# Drop the last column (assumed to hold the response) before predicting
testData1 <- testData[, -ncol(testData)]
prediction <- predict(AttribImp.rf, testData1)
print(prediction)
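Once the predictions are available, they can be compared with the actual classes. The sketch below does this in base R using the predicted and original values reported in the conclusion of this article; the vector names are only illustrative, and in practice the actual classes would come from the response column of the test data.

```r
# Values as reported in the conclusion of this article.
predicted <- factor(rep("HIGH", 5), levels = c("HIGH", "LOW"))
actual    <- factor(rep("HIGH", 5), levels = c("HIGH", "LOW"))

# Cross-tabulate actual against predicted classes.
test_conf <- table(actual = actual, predicted = predicted)
print(test_conf)

# Share of test observations that were predicted correctly.
test_accuracy <- mean(predicted == actual)
print(test_accuracy)
```

With only a handful of test observations, and all of them in the HIGH class, this check says little about how well LOW observations are predicted, so the OOB estimate remains the more informative figure.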

Conclusion

The predicted values are HIGH, HIGH, HIGH, HIGH, HIGH and the original values are HIGH, HIGH, HIGH, HIGH, HIGH, so every test observation is predicted correctly. With 93% accuracy, the classification model gives a better prediction than the random forest regression model.
