LSTM Network in R

In this tutorial we will discuss LSTM networks in R and, more broadly, Recurrent Neural Networks (RNNs). Recurrent Neural Networks are extremely useful for solving problems that involve sequences.

Typical sequence applications include text classification, time series prediction, video frame analysis, DNA sequence analysis, and speech recognition.

LSTM networks are a special type of Recurrent Neural Network, and they are both popular and practical.

What is meant by LSTM?

LSTM stands for Long Short-Term Memory. LSTM networks help overcome the vanishing gradient problem and make it possible to capture long-term dependencies in a sequence of words or integers.

In this tutorial, we use the Internet Movie Database (IMDB) sentiment dataset, which contains 25,000 positive and 25,000 negative movie reviews.

Load libraries

library(keras)
library(tensorflow)
use_condaenv("keras-tf", required = T)  # point R at the conda environment that has Keras/TensorFlow
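If use_condaenv() reports that the environment cannot be found, you first need a conda environment with Keras and TensorFlow installed in it. A one-time setup sketch is shown below; the environment name "keras-tf" is simply the one this tutorial uses, and argument names can vary slightly between package versions.

install.packages("keras")
# create and populate a conda environment for Keras + TensorFlow (one-time setup)
keras::install_keras(method = "conda", envname = "keras-tf")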

Getting Data

imdb <- dataset_imdb(num_words = 500)  # keep only the 500 most frequent words

The dataset is already pre-processed, so there is no need to clean it.


c(c(train_x, train_y), c(test_x, test_y)) %<-% imdb  # multi-assignment via zeallot's %<-% operator
length(train_x); length(test_x)

train_x and test_x contain the integer-coded reviews.

train_y and test_y contain the labels (0 and 1), where 0 represents negative sentiment and 1 represents positive sentiment in the movie review.

table(train_y)
train_y
    0     1
12500 12500
table(test_y)
test_y
    0     1
12500 12500

This indicates that our dataset is balanced.

Words in each review are represented by unique integers, where each integer corresponds to a word's overall frequency rank in the dataset. An individual review can be extracted with the command below.

train_x[[10]]
[1]   1  14  20  47 111 439   2  19  12  15 166  12 216 125  40   6 364 352   2   2  39 294  11  22 396  13  28   8 202  12   2  23  94
[34]   2 151 111 211 469   4  20  13 258   2   2   2  12  16  38  78  33 211  15  12  16   2  63  93  12   6 253 106  10  10  48 335 267
[67]  18   6 364   2   2  20  19   6   2   7   2 189   5   6   2   7   2   2  95   2   6   2   7   2   2  49 369 120   5  28  49 253  10
[100]  10  13   2  19  85   2  15   4 481   9  55  78   2   9 375   8   2   8   2  76   7   4  58   5   4   2   9 243   7  43  50
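Because each integer is a frequency rank, a review can be mapped back to (approximate) words via the built-in word index. A minimal sketch; the offset of 3 accounts for the reserved padding/start/unknown indices in the Keras IMDB encoding, and with num_words = 500 most rare words will decode as "?":

word_index <- dataset_imdb_word_index()          # named list: word -> frequency rank
reverse_word_index <- names(word_index)
names(reverse_word_index) <- unlist(word_index)
decoded <- sapply(train_x[[10]], function(i) {
  # indices 0, 1, 2 are reserved (padding, start, out-of-vocabulary)
  word <- reverse_word_index[as.character(i - 3)]
  if (is.na(word)) "?" else word
})
cat(paste(decoded, collapse = " "))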

Before doing any further analysis we need to ensure that all movie reviews have the same length. The reviews currently vary in length, which we can fix with padding.


Padding sequences

train_x <- pad_sequences(train_x, maxlen = 90)
str(train_x)
 num [1:25000, 1:90] 14 2 360 2 13 0 26 11 6 13 ...
test_x <- pad_sequences(test_x, maxlen = 90)
str(test_x)
 num [1:25000, 1:90] 0 2 30 8 10 20 2 2 2 50 ...

Now every review in train_x and test_x is exactly 90 integers long: padding keeps the last 90 integers of longer reviews (front truncation) and fills shorter reviews with zeros.

Now you can examine the padded review train_x[10,] again:

[1]  13 258   2   2   2  12  16  38  78  33 211  15  12  16   2  63  93  12   6 253 106  10  10  48 335 267  18   6 364   2   2  20  19   6
[35]   2   7   2 189   5   6   2   7   2   2  95   2   6   2   7   2   2  49 369 120   5  28  49 253  10  10  13   2  19  85   2  15   4 481
[69]   9  55  78   2   9 375   8   2   8   2  76   7   4  58   5   4   2   9 243   7  43  50

If a review contains fewer integers, say 60, the remaining 30 positions are filled with zeros automatically (at the front of the sequence, by default).
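Both behaviors, zero pre-padding and front truncation (the pad_sequences defaults), are easy to see on a toy example:

toy <- list(c(1, 2, 3), c(4, 5, 6, 7, 8))
pad_sequences(toy, maxlen = 4)
#      [,1] [,2] [,3] [,4]
# [1,]    0    1    2    3    <- shorter sequence padded with a leading zero
# [2,]    5    6    7    8    <- longer sequence truncated from the front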

Model

Initialize the model with the Keras function keras_model_sequential() and add the recurrent neural network layers.

model <- keras_model_sequential()
model %>%
  layer_embedding(input_dim = 500, output_dim = 32) %>%  # 500-word vocabulary, 32-dimensional embeddings
  layer_simple_rnn(units = 32) %>%
  layer_dense(units = 1, activation = "sigmoid")         # single probability output

For the output activation we use the sigmoid function, which maps the output to a probability between 0 and 1 and is therefore easy to interpret.
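You can confirm the architecture and parameter counts with summary(); the counts follow directly from the layer sizes (output shown as indicative comments):

summary(model)
# embedding  (None, None, 32)  16000  <- 500 words x 32 dimensions
# simple_rnn (None, 32)         2080  <- (32 input + 32 recurrent + 1 bias) x 32 units
# dense      (None, 1)            33  <- 32 weights + 1 bias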


Compile Model

model %>% compile(optimizer = "rmsprop",
                  loss = "binary_crossentropy",
                  metrics = c("acc"))

Fit model

history <- model %>% fit(train_x, train_y,
                         epochs = 25,
                         batch_size = 128,
                         validation_split = 0.2)
plot(history)

validation_split = 0.2 indicates that 20% of the training data is set aside for validation.

In the resulting plot, the top panel shows the loss and the bottom panel the accuracy. From a certain epoch onwards the validation loss increases while the validation accuracy decreases, which is a classic sign of overfitting.
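One common remedy for this kind of overfitting (not used in this tutorial; shown only as a sketch) is an early-stopping callback that halts training once the validation loss stops improving:

history <- model %>% fit(train_x, train_y,
                         epochs = 25,
                         batch_size = 128,
                         validation_split = 0.2,
                         callbacks = list(
                           callback_early_stopping(monitor = "val_loss", patience = 3)))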

Model Prediction

model %>% evaluate(train_x, train_y)
     loss       acc
0.3644736 0.8765600

# note: predict_classes() comes from older keras package versions;
# in newer versions use e.g. round(predict(model, train_x)) instead
pred <- model %>% predict_classes(train_x)
table(Predicted = pred, Actual = imdb$train$y)
         Actual
Predicted     0     1
        0 11503  2089
        1   997 10411

model %>% evaluate(test_x, test_y)
    loss      acc
1.032544 0.687720

pred1 <- model %>% predict_classes(test_x)
table(Predicted = pred1, Actual = imdb$test$y)
         Actual
Predicted    0    1
        0 9203 4510
        1 3297 7990

We achieved 87 percent accuracy on the training dataset; however, it drops to 68 percent on the test dataset.
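These figures can be verified directly from the confusion matrices above:

(11503 + 10411) / 25000   # training accuracy: 0.87656
(9203 + 7990) / 25000     # test accuracy:     0.68772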

As a result, the model needs to be improved in order to make better predictions.


We can alter the model in several ways.

model <- keras_model_sequential()   # re-initialize before adding new layers
model %>%
  layer_embedding(input_dim = 500, output_dim = 32) %>%
  layer_simple_rnn(units = 32, return_sequences = TRUE, activation = "relu") %>%
  layer_simple_rnn(units = 32, return_sequences = TRUE, activation = "relu") %>%
  layer_simple_rnn(units = 32) %>%
  layer_dense(units = 1, activation = "sigmoid")

Instead of one recurrent layer, the model above employs three, with return_sequences = TRUE on the intermediate layers (so each passes its full output sequence to the next layer) and the relu activation function. The padding length can also be tweaked: instead of the 90 used so far, we can look at the typical review length and use that for padding.

# review lengths must be computed on the original (unpadded) reviews,
# since train_x is now a padded matrix
z <- NULL
for (i in 1:25000) { z[i] <- length(imdb$train$x[[i]]) }
summary(z)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   11.0   130.0   178.0   238.7   291.0  2494.0

The median review length is 178 and the mean is 238.7, so we choose 200, a figure in between, as the padding length.

Padding sequences

train_x <- pad_sequences(imdb$train$x, maxlen = 200)  # re-pad from the original reviews
test_x <- pad_sequences(imdb$test$x, maxlen = 200)

Rerun the model and check the accuracy again.

model %>% evaluate(train_x, train_y)
    loss       acc
0.3733827 0.8421200

The training accuracy is now 84%, slightly down from the earlier 87%.

model %>% evaluate(test_x, test_y)
    loss       acc
0.4351899 0.8114400

The test accuracy, however, improved significantly, from 68% to 81%.

Now let's try a simple LSTM model for even better predictions.


LSTM Network in R

model <- keras_model_sequential()   # re-initialize before adding new layers
model %>%
  layer_embedding(input_dim = 500, output_dim = 32) %>%
  layer_lstm(units = 32, return_sequences = TRUE) %>%
  layer_lstm(units = 32, return_sequences = TRUE) %>%
  layer_lstm(units = 32) %>%
  layer_dense(units = 1, activation = "sigmoid")

When using the LSTM model, try the “adam” optimizer for better results.

Compile

model %>% compile(optimizer = "adam",
                  loss = "binary_crossentropy",
                  metrics = c("acc"))

Bidirectional LSTM Model

model <- keras_model_sequential()   # re-initialize before adding new layers
model %>%
  layer_embedding(input_dim = 500, output_dim = 32) %>%
  layer_lstm(units = 32, return_sequences = TRUE) %>%
  layer_lstm(units = 32, return_sequences = TRUE) %>%
  bidirectional(layer_lstm(units = 32)) %>%
  layer_dense(units = 1, activation = "sigmoid")
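Compile and fit the bidirectional model exactly as above; note that wrapping a layer in bidirectional() roughly doubles its parameters because the sequence is processed in both directions:

model %>% compile(optimizer = "adam",
                  loss = "binary_crossentropy",
                  metrics = c("acc"))
history <- model %>% fit(train_x, train_y,
                         epochs = 25,
                         batch_size = 128,
                         validation_split = 0.2)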

Conclusion

Instead of using a standard LSTM model, a bidirectional model can be used to improve model accuracy.

