The random forest model in R is a useful tool for predicting outcomes in classification and regression problems.
The main idea is to understand how the explanatory variables affect the dependent variable.
In this example, we analyze the impact of the explanatory variables Attribute1, Attribute2, … Attribute6 on the dependent variable Likeability.
Use the read.xlsx function to read the data into R.
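A minimal sketch of the read step follows. The post does not say which package supplies read.xlsx, so this assumes openxlsx (the xlsx package has a function of the same name that expects a sheetIndex argument instead). The temporary file and its two columns are hypothetical stand-ins for the real workbook.

```r
# Assumption: read.xlsx comes from the openxlsx package.
library(openxlsx)

# Hypothetical stand-in for the real workbook: write a tiny sheet to a temp file.
path <- tempfile(fileext = ".xlsx")
write.xlsx(data.frame(Attribute1 = c(1, 2), Likeability = c(3, 4)), path)

# Read the first sheet into a data frame and inspect its structure.
likeability_data <- read.xlsx(path, sheet = 1)
str(likeability_data)
```

With a real file, only the path and the sheet argument would need to change.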
We then split the data set into a training data set and a test data set. We use the training data to build the model and the test data to evaluate it.
We randomly generated a data frame with 64 observations in total: 60 are used for training and 4 for testing.
#Create training and test data
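One way to produce the 60/4 split is sketched below. The data frame is a hypothetical stand-in (six attributes and Likeability scored 1 to 5), and both the seed and the choice of a random rather than sequential split are assumptions.

```r
set.seed(42)  # hypothetical seed, used only to make the sketch reproducible

# Hypothetical stand-in for the 64-row data set: every column scored 1 to 5.
n <- 64
likeability_data <- as.data.frame(
  matrix(sample(1:5, n * 7, replace = TRUE), nrow = n)
)
names(likeability_data) <- c(paste0("Attribute", 1:6), "Likeability")

# Randomly pick 60 row indices for training; the remaining 4 rows form the test set.
train_idx <- sample(seq_len(n), size = 60)
train <- likeability_data[train_idx, ]
test  <- likeability_data[-train_idx, ]

dim(train)
dim(test)
```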
Using the tuneRF function, we can find the best value of mtry, the number of variables tried at each split.
mtry = 8 provides best OOB error = 0.01384072
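The tuning step might look like the sketch below. The training data is synthetic and hypothetical, and the values passed for ntreeTry, stepFactor, and improve are illustrative choices, not the settings behind the result above.

```r
library(randomForest)
set.seed(42)  # hypothetical seed

# Hypothetical training data: six attributes scored 1 to 5 and a numeric response.
train <- as.data.frame(matrix(sample(1:5, 60 * 6, replace = TRUE), nrow = 60))
names(train) <- paste0("Attribute", 1:6)
train$Likeability <- 0.5 * train$Attribute2 + 0.5 * train$Attribute3 +
  rnorm(60, sd = 0.5)

predictors <- train[, paste0("Attribute", 1:6)]

# Search over mtry, comparing the out-of-bag (OOB) error of each candidate.
tuned <- tuneRF(x = predictors, y = train$Likeability,
                ntreeTry = 100, stepFactor = 1.5, improve = 0.01,
                trace = TRUE, plot = FALSE)

# tuneRF returns a matrix with columns "mtry" and "OOBError"; take the lowest error.
best_mtry <- tuned[which.min(tuned[, "OOBError"]), "mtry"]
best_mtry
```

The selected value can then be passed as the mtry argument when fitting the final model.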
A random forest allows us to determine the most important predictors across the explanatory variables by generating many decision trees and then ranking the variables by importance.
Random Forest Model in R
Type of random forest: regression
Number of trees: 100
No. of variables tried at each split: 8
Mean of squared residuals: 2.00039
% Var explained: 78.58
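Output of this kind comes from printing a fitted randomForest object. The sketch below fits a regression forest with 100 trees on synthetic, hypothetical data. The mtry value here is a placeholder; in practice it would come from the tuning step.

```r
library(randomForest)
set.seed(42)  # hypothetical seed

# Hypothetical training data: six attributes scored 1 to 5 and a numeric response.
train <- as.data.frame(matrix(sample(1:5, 60 * 6, replace = TRUE), nrow = 60))
names(train) <- paste0("Attribute", 1:6)
train$Likeability <- 0.5 * train$Attribute2 + 0.5 * train$Attribute3 +
  rnorm(60, sd = 0.5)

# A numeric response makes randomForest fit a regression forest.
# importance = TRUE stores the permutation-based importance measures.
model <- randomForest(Likeability ~ ., data = train,
                      ntree = 100, mtry = 2, importance = TRUE)

print(model)       # type, number of trees, mtry, MSE, % variance explained
importance(model)  # one row per predictor, used to rank the variables
```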
With only 60 data points, the model explains about 79% of the variance. We recommend a minimum of 100 data points per model for a more accurate result.
Using the Boruta algorithm, we can identify which attributes are important in the model.
Boruta performed 87 iterations in 1.140375 secs.
5 attributes confirmed important: Attribute2, Attribute3, Attribute4, Attribute6, Panel;
2 attributes confirmed unimportant: Attribute1, Attribute5;
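A Boruta run that produces a summary of this kind could be sketched as follows. The data is synthetic and hypothetical, so the attributes it confirms will differ from the ones listed above. maxRuns = 100 is the package default, written out here for clarity.

```r
library(Boruta)
set.seed(42)  # hypothetical seed

# Hypothetical training data: six attributes scored 1 to 5 and a numeric response.
train <- as.data.frame(matrix(sample(1:5, 60 * 6, replace = TRUE), nrow = 60))
names(train) <- paste0("Attribute", 1:6)
train$Likeability <- 0.5 * train$Attribute2 + 0.5 * train$Attribute3 +
  rnorm(60, sd = 0.5)

# Boruta compares each attribute's importance against shuffled "shadow" copies
# and labels it Confirmed, Tentative, or Rejected.
boruta_fit <- Boruta(Likeability ~ ., data = train, maxRuns = 100)

print(boruta_fit)

# Names of the attributes confirmed as important (tentative ones excluded).
getSelectedAttributes(boruta_fit, withTentative = FALSE)
```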
Predict test data based on the training model
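The prediction step can be sketched as below, again on synthetic, hypothetical data with a 60/4 split. The RMSE at the end is an added convenience for comparing predictions with actual values, not something the original analysis reports.

```r
library(randomForest)
set.seed(42)  # hypothetical seed

# Hypothetical 64-row data set: six attributes scored 1 to 5 and a numeric response.
n <- 64
likeability_data <- as.data.frame(
  matrix(sample(1:5, n * 6, replace = TRUE), nrow = n)
)
names(likeability_data) <- paste0("Attribute", 1:6)
likeability_data$Likeability <- 0.5 * likeability_data$Attribute2 +
  0.5 * likeability_data$Attribute3 + rnorm(n, sd = 0.5)

# 60 rows for training, the remaining 4 for testing.
train_idx <- sample(seq_len(n), size = 60)
train <- likeability_data[train_idx, ]
test  <- likeability_data[-train_idx, ]

model <- randomForest(Likeability ~ ., data = train, ntree = 100)

# Predict the held-out rows and place predictions next to the actual values.
pred <- predict(model, newdata = test)
comparison <- data.frame(actual = test$Likeability, predicted = pred)
print(comparison)

# Root mean squared error on the test rows.
rmse <- sqrt(mean((test$Likeability - pred)^2))
rmse
```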
The predicted values are 4, 4, 5, 5, 4 and the original values are 2, 2, 2, 3, 4. The predictions are only roughly close to the actual values, so the fit is not good.
We recommend increasing the number of data points so that the model's explained variance rises from 79% to at least 85%.