LSTM Network in R
In this tutorial we discuss LSTM networks in R, a special and very popular type of Recurrent Neural Network (RNN). RNNs are extremely useful for solving problems that involve sequences of numbers.
The major applications involving sequences are text classification, time series prediction, frames in videos, DNA sequences, speech recognition, and more.
What is LSTM?
LSTM stands for long short-term memory. An LSTM network helps to overcome the vanishing gradient problem and makes it possible to capture long-term dependencies in a sequence of words or integers.
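For reference, the gating mechanism that lets an LSTM preserve information over long sequences can be written as follows (this is the standard textbook formulation, not something specific to the Keras implementation used below):

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)}\\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

The additive update of the cell state c_t is what mitigates the vanishing gradient: when the forget gate stays close to 1, gradients can flow through c_t largely unchanged across many time steps.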
In this tutorial, we use the Internet Movie Database (IMDB) reviews dataset. It contains 50,000 movie reviews labelled by sentiment: 25,000 positive and 25,000 negative.
Load library
library(keras)
library(tensorflow)
use_condaenv("keras-tf", required = TRUE)
Getting Data
imdb <- dataset_imdb(num_words = 500)
These datasets are already pre-processed so no need to clean the datasets.
c(c(train_x, train_y), c(test_x, test_y)) %<-% imdb
length(train_x); length(test_x)
train_x and test_x contain the reviews as sequences of integers, while train_y and test_y contain the labels (0 and 1): 0 represents a negative review and 1 a positive review.
table(train_y)
train_y
    0     1
12500 12500
table(test_y)
test_y
    0     1
12500 12500
This indicates that our dataset is balanced.
Each word in a movie review is represented by a unique integer, assigned according to the word's overall frequency in the dataset. An individual review can be extracted with the command below.
train_x[[10]]
[1] 1 14 20 47 111 439 2 19 12 15 166 12 216 125 40 6 364 352 2 2 39 294 11 22 396 13 28 8 202 12 2 23 94 [34] 2 151 111 211 469 4 20 13 258 2 2 2 12 16 38 78 33 211 15 12 16 2 63 93 12 6 253 106 10 10 48 335 267 [67] 18 6 364 2 2 20 19 6 2 7 2 189 5 6 2 7 2 2 95 2 6 2 7 2 2 49 369 120 5 28 49 253 10 [100] 10 13 2 19 85 2 15 4 481 9 55 78 2 9 375 8 2 8 2 76 7 4 58 5 4 2 9 243 7 43 50
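To see the actual words behind these integers, you can invert the word index returned by dataset_imdb_word_index(). The sketch below assumes the default offset of 3 that dataset_imdb() adds to the raw indices (0 = padding, 1 = start marker, 2 = out-of-vocabulary):

```r
word_index <- dataset_imdb_word_index()          # named list: word -> integer
reverse_word_index <- names(word_index)
names(reverse_word_index) <- unlist(word_index)  # integer -> word

# Decode review 10; indices below 3 are special markers, shown as "?"
decoded <- sapply(train_x[[10]], function(i) {
  if (i < 3) return("?")
  word <- reverse_word_index[as.character(i - 3)]
  ifelse(is.na(word), "?", word)
})
cat(paste(decoded, collapse = " "))
```

Since num_words = 500, any word outside the 500 most frequent appears as index 2 and decodes to "?".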
Before doing any further analysis we need to ensure that all movie reviews have the same length. The reviews currently have different lengths; this is fixed by padding.
Padding sequences
train_x <- pad_sequences(train_x, maxlen = 90)
# num [1:25000, 1:90] 14 2 360 2 13 0 26 11 6 13 ...
test_x <- pad_sequences(test_x, maxlen = 90)
# num [1:25000, 1:90] 0 2 30 8 10 20 2 2 2 50 ...
Now every review in train_x and test_x is exactly 90 integers long: padding truncated the longer reviews and filled up the shorter ones.
Now you can examine the padded review train_x[10,] again:
[1] 13 258 2 2 2 12 16 38 78 33 211 15 12 16 2 63 93 12 6 253 106 10 10 48 335 267 18 6 364 2 2 20 19 6 [35] 2 7 2 189 5 6 2 7 2 2 95 2 6 2 7 2 2 49 369 120 5 28 49 253 10 10 13 2 19 85 2 15 4 481 [69] 9 55 78 2 9 375 8 2 8 2 76 7 4 58 5 4 2 9 243 7 43 50
If a review contains fewer than 90 integers, say 60, the remaining 30 positions are automatically filled with 0.
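A tiny example (with two made-up sequences) makes the behaviour clear: by default, pad_sequences() both pads and truncates at the front of the sequence.

```r
x <- list(c(1, 2, 3), c(4, 5, 6, 7, 8))
pad_sequences(x, maxlen = 4)
#      [,1] [,2] [,3] [,4]
# [1,]    0    1    2    3   # shorter sequence pre-padded with 0
# [2,]    5    6    7    8   # longer sequence truncated from the front
```

If you prefer to pad or truncate at the end, pad_sequences() accepts padding = "post" and truncating = "post".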
Model
Initiate the model with the Keras function keras_model_sequential() and add the recurrent neural network layers.
model <- keras_model_sequential()
model %>%
  layer_embedding(input_dim = 500, output_dim = 32) %>%
  layer_simple_rnn(units = 32) %>%
  layer_dense(units = 1, activation = "sigmoid")
For the activation we used the sigmoid function, which outputs a value between 0 and 1 and is therefore easy to interpret as a probability.
Compile Model
model %>% compile(
  optimizer = "rmsprop",
  loss = "binary_crossentropy",
  metrics = c("acc")
)
Fit model
history <- model %>% fit(
  train_x, train_y,
  epochs = 25,
  batch_size = 128,
  validation_split = 0.2
)
plot(history)
validation_split = 0.2 indicates that 20% of the training data is held out for validation.
The top panel shows the loss and the bottom one the accuracy. From a certain epoch onwards the validation loss starts increasing while the validation accuracy decreases; this is a sign of overfitting.
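One common way to limit this overfitting (not done in this tutorial, but worth trying) is early stopping, which halts training once the validation loss stops improving. Keras provides callback_early_stopping() for this; a sketch:

```r
history <- model %>% fit(
  train_x, train_y,
  epochs = 25,
  batch_size = 128,
  validation_split = 0.2,
  callbacks = list(
    # Stop after 3 epochs without improvement in validation loss
    # and roll back to the best weights seen so far
    callback_early_stopping(monitor = "val_loss", patience = 3,
                            restore_best_weights = TRUE)
  )
)
```

With patience = 3, training stops three epochs after the validation loss last improved, so the model never trains deep into the overfitting region.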
Model Prediction
model %>% evaluate(train_x, train_y)
#      loss       acc
# 0.3644736 0.8765600

pred <- model %>% predict_classes(train_x)
table(Predicted = pred, Actual = imdb$train$y)
#          Actual
# Predicted     0     1
#         0 11503  2089
#         1   997 10411

model %>% evaluate(test_x, test_y)
#     loss      acc
# 1.032544 0.687720

pred1 <- model %>% predict_classes(test_x)
table(Predicted = pred1, Actual = imdb$test$y)
#          Actual
# Predicted    0    1
#         0 9203 4510
#         1 3297 7990
We got 87 percent accuracy on the training dataset; however, it drops to 68 percent on the test dataset.
As a result, the model needs to be improved in order to make better predictions.
We can alter the model in some ways.
model <- keras_model_sequential()
model %>%
  layer_embedding(input_dim = 500, output_dim = 32) %>%
  layer_simple_rnn(units = 32, return_sequences = TRUE, activation = "relu") %>%
  layer_simple_rnn(units = 32, return_sequences = TRUE, activation = "relu") %>%
  layer_simple_rnn(units = 32) %>%
  layer_dense(units = 1, activation = "sigmoid")
Instead of one recurrent layer, we employed three layers in the model above, with return_sequences = TRUE and the relu activation function. Padding can also be tweaked: instead of using maxlen = 90, we can look at the distribution of review lengths and choose a more representative value.
# Review lengths must be computed on the original, unpadded train_x
z <- NULL
for (i in 1:25000) { z[i] <- length(train_x[[i]]) }
summary(z)
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#   11.0   130.0   178.0   238.7   291.0  2494.0
The median is 178 and the mean is 238.7, so we choose a middle figure of 200 for padding.
Padding sequences
train_x <- pad_sequences(train_x, maxlen = 200)
test_x <- pad_sequences(test_x, maxlen = 200)
Rerun the model and check the accuracy again.
model %>% evaluate(train_x, train_y)
#      loss       acc
# 0.3733827 0.8421200
The training dataset accuracy is now 84%; earlier it was 87%.
model %>% evaluate(test_x, test_y)
#      loss       acc
# 0.4351899 0.8114400
The test dataset accuracy, however, improved significantly, from 68% to 81%.
Now you can try a simple LSTM model for even better predictions.
model <- keras_model_sequential()
model %>%
  layer_embedding(input_dim = 500, output_dim = 32) %>%
  layer_lstm(units = 32, return_sequences = TRUE) %>%
  layer_lstm(units = 32, return_sequences = TRUE) %>%
  layer_lstm(units = 32) %>%
  layer_dense(units = 1, activation = "sigmoid")
When using an LSTM model, try the optimizer "adam" for better results.
Compile
model %>% compile(
  optimizer = "adam",
  loss = "binary_crossentropy",
  metrics = c("acc")
)
Bidirectional LSTM Model
model <- keras_model_sequential()
model %>%
layer_embedding(input_dim = 500, output_dim = 32) %>%
layer_lstm(units = 32,return_sequences = TRUE) %>%
layer_lstm(units = 32,return_sequences = TRUE) %>%
bidirectional(layer_lstm(units = 32)) %>%
layer_dense(units = 1, activation = "sigmoid")
Conclusion
Instead of using a standard LSTM model, a bidirectional model can be used to improve model accuracy.
Note: if use_condaenv("keras-tf", required = TRUE) fails with a message that it is unable to locate the conda environment 'keras-tf', the conda environment has not been set up. Create a conda environment named keras-tf with keras and tensorflow installed (or point use_condaenv() at an existing environment) before running the script; the command is required so that R finds the correct Python installation.