.. _sec_sentiment_rnn:

Sentiment Analysis: Using Recurrent Neural Networks
===================================================

Like word similarity and analogy tasks, we can also apply pretrained
word vectors to sentiment analysis. Since the IMDb review dataset in
:numref:`sec_sentiment` is not very big, using text representations
that were pretrained on large-scale corpora may reduce overfitting of
the model. As a specific example illustrated in
:numref:`fig_nlp-map-sa-rnn`, we will represent each token using the
pretrained GloVe model, and feed these token representations into a
multilayer bidirectional RNN to obtain the text sequence representation,
which will be transformed into sentiment analysis outputs
:cite:`Maas.Daly.Pham.ea.2011`. For the same downstream application,
we will consider a different architectural choice later.

.. _fig_nlp-map-sa-rnn:

.. figure:: ../img/nlp-map-sa-rnn.svg

   This section feeds pretrained GloVe to an RNN-based architecture for
   sentiment analysis.

.. code:: python

    import torch
    from torch import nn
    from d2l import torch as d2l

    batch_size = 64
    train_iter, test_iter, vocab = d2l.load_data_imdb(batch_size)

.. code:: python

    from mxnet import gluon, init, np, npx
    from mxnet.gluon import nn, rnn
    from d2l import mxnet as d2l

    npx.set_np()

    batch_size = 64
    train_iter, test_iter, vocab = d2l.load_data_imdb(batch_size)

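Before building the model, it helps to confirm what the data iterator
yields. The following peek is a minimal sketch; it assumes
``d2l.load_data_imdb`` pads or truncates every review to 500 tokens (its
default behavior), so each minibatch ``X`` should have shape (64, 500)
and the labels ``y`` shape (64,).

.. code:: python

    # Peek at a single minibatch of token indices and labels (a sketch;
    # the 500-step padding length is an assumption about the d2l helper)
    for X, y in train_iter:
        print('X:', X.shape, 'y:', y.shape)
        break
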
Representing Single Text with RNNs
----------------------------------

In text classification tasks, such as sentiment analysis, a
varying-length text sequence is transformed into a fixed set of
categories. In the following ``BiRNN`` class, while each token of a text
sequence gets its individual pretrained GloVe representation via the
embedding layer (``self.embedding``), the entire sequence is encoded by
a bidirectional RNN (``self.encoder``). More concretely, the hidden
states (at the last layer) of the bidirectional LSTM at both the initial
and final time steps are concatenated as the representation of the text
sequence. This single text representation is then transformed into
output categories by a fully connected layer (``self.decoder``) with two
outputs (“positive” and “negative”).

.. code:: python

    class BiRNN(nn.Module):
        def __init__(self, vocab_size, embed_size, num_hiddens,
                     num_layers, **kwargs):
            super(BiRNN, self).__init__(**kwargs)
            self.embedding = nn.Embedding(vocab_size, embed_size)
            # Set `bidirectional` to True to get a bidirectional RNN
            self.encoder = nn.LSTM(embed_size, num_hiddens, num_layers=num_layers,
                                   bidirectional=True)
            self.decoder = nn.Linear(4 * num_hiddens, 2)

        def forward(self, inputs):
            # The shape of `inputs` is (batch size, no. of time steps). Because
            # LSTM requires its input's first dimension to be the temporal
            # dimension, the input is transposed before obtaining token
            # representations. The output shape is (no. of time steps, batch
            # size, word vector dimension)
            embeddings = self.embedding(inputs.T)
            self.encoder.flatten_parameters()
            # Returns hidden states of the last hidden layer at different time
            # steps. The shape of `outputs` is (no. of time steps, batch size,
            # 2 * no. of hidden units)
            outputs, _ = self.encoder(embeddings)
            # Concatenate the hidden states at the initial and final time steps
            # as the input of the fully connected layer. Its shape is
            # (batch size, 4 * no. of hidden units)
            encoding = torch.cat((outputs[0], outputs[-1]), dim=1)
            outs = self.decoder(encoding)
            return outs

.. code:: python

    class BiRNN(nn.Block):
        def __init__(self, vocab_size, embed_size, num_hiddens,
                     num_layers, **kwargs):
            super(BiRNN, self).__init__(**kwargs)
            self.embedding = nn.Embedding(vocab_size, embed_size)
            # Set `bidirectional` to True to get a bidirectional RNN
            self.encoder = rnn.LSTM(num_hiddens, num_layers=num_layers,
                                    bidirectional=True, input_size=embed_size)
            self.decoder = nn.Dense(2)

        def forward(self, inputs):
            # The shape of `inputs` is (batch size, no. of time steps). Because
            # LSTM requires its input's first dimension to be the temporal
            # dimension, the input is transposed before obtaining token
            # representations. The output shape is (no. of time steps, batch
            # size, word vector dimension)
            embeddings = self.embedding(inputs.T)
            # Returns hidden states of the last hidden layer at different time
            # steps. The shape of `outputs` is (no. of time steps, batch size,
            # 2 * no. of hidden units)
            outputs = self.encoder(embeddings)
            # Concatenate the hidden states at the initial and final time steps
            # as the input of the fully connected layer. Its shape is
            # (batch size, 4 * no. of hidden units)
            encoding = np.concatenate((outputs[0], outputs[-1]), axis=1)
            outs = self.decoder(encoding)
            return outs

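To make the shape bookkeeping in the comments concrete, the following
quick check is a minimal sketch using the PyTorch API (the sizes match
those used later in this section). A bidirectional LSTM doubles the
hidden dimension of its outputs, so concatenating the initial and final
time steps yields ``4 * num_hiddens`` features per example, which is why
``self.decoder`` takes ``4 * num_hiddens`` inputs.

.. code:: python

    # Outputs of a bidirectional LSTM have shape (T, B, 2 * num_hiddens)
    T, B, embed_size, num_hiddens = 500, 64, 100, 100
    lstm = nn.LSTM(embed_size, num_hiddens, num_layers=2, bidirectional=True)
    outputs, _ = lstm(torch.zeros(T, B, embed_size))
    print(outputs.shape)  # torch.Size([500, 64, 200])
    # Concatenating the initial and final time steps: (B, 4 * num_hiddens)
    print(torch.cat((outputs[0], outputs[-1]), dim=1).shape)  # torch.Size([64, 400])
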
Let’s construct a bidirectional RNN with two hidden layers to represent
a single text for sentiment analysis.

.. code:: python

    embed_size, num_hiddens, num_layers, devices = 100, 100, 2, d2l.try_all_gpus()
    net = BiRNN(len(vocab), embed_size, num_hiddens, num_layers)

    def init_weights(module):
        if type(module) == nn.Linear:
            nn.init.xavier_uniform_(module.weight)
        if type(module) == nn.LSTM:
            for param in module._flat_weights_names:
                if "weight" in param:
                    nn.init.xavier_uniform_(module._parameters[param])

    net.apply(init_weights);

.. code:: python

    embed_size, num_hiddens, num_layers, devices = 100, 100, 2, d2l.try_all_gpus()
    net = BiRNN(len(vocab), embed_size, num_hiddens, num_layers)
    net.initialize(init.Xavier(), ctx=devices)

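As a quick sanity check before loading pretrained vectors, we can push
a dummy minibatch through the freshly constructed PyTorch network (a
minimal sketch; the all-ones input is arbitrary). Each example should
receive one score per class.

.. code:: python

    # A fake minibatch of 64 sequences of 500 token indices; the output
    # should have shape (64, 2), i.e., one score per sentiment class
    X = torch.ones((64, 500), dtype=torch.long)
    net(X).shape
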
Loading Pretrained Word Vectors
-------------------------------

Below we load pretrained 100-dimensional GloVe embeddings for the
tokens in the vocabulary (the dimension must match ``embed_size``).

.. code:: python

    glove_embedding = d2l.TokenEmbedding('glove.6b.100d')

.. code:: python

    glove_embedding = d2l.TokenEmbedding('glove.6b.100d')

Print the shape of the vectors for all the tokens in the vocabulary.

.. code:: python

    embeds = glove_embedding[vocab.idx_to_token]
    embeds.shape

.. parsed-literal::
    :class: output

    torch.Size([49346, 100])

.. code:: python

    embeds = glove_embedding[vocab.idx_to_token]
    embeds.shape

.. parsed-literal::
    :class: output

    (49346, 100)

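How much of the vocabulary does GloVe actually cover? The
``d2l.TokenEmbedding`` helper maps tokens it does not know to the
all-zero vector, so counting zero rows gives a rough estimate of the
uncovered tokens (a minimal sketch in PyTorch; the zero-vector
convention is an assumption about the helper's behavior).

.. code:: python

    # Rows that are entirely zero correspond to vocabulary tokens
    # without a pretrained GloVe vector (assumed helper behavior)
    num_missing = int((embeds.abs().sum(dim=1) == 0).sum())
    print(f'{num_missing} of {len(vocab)} tokens lack a pretrained vector')
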
We use these pretrained word vectors to represent tokens in the reviews
and will not update these vectors during training.

.. code:: python

    net.embedding.weight.data.copy_(embeds)
    net.embedding.weight.requires_grad = False

.. code:: python

    net.embedding.weight.set_data(embeds)
    net.embedding.collect_params().setattr('grad_req', 'null')

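We can verify that the embedding matrix is indeed excluded from training
by counting trainable parameters; only the LSTM and the output layer
should contribute. This is a minimal sketch against the PyTorch model.

.. code:: python

    # The frozen embedding (len(vocab) * embed_size entries) counts
    # toward the total but not toward the trainable parameters
    num_trainable = sum(p.numel() for p in net.parameters() if p.requires_grad)
    num_total = sum(p.numel() for p in net.parameters())
    print(f'trainable parameters: {num_trainable} of {num_total}')
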
Training and Evaluating the Model
---------------------------------

Now we can train the bidirectional RNN for sentiment analysis.

.. code:: python

    lr, num_epochs = 0.01, 5
    trainer = torch.optim.Adam(net.parameters(), lr=lr)
    loss = nn.CrossEntropyLoss(reduction="none")
    d2l.train_ch13(net, train_iter, test_iter, loss, trainer, num_epochs, devices)

.. parsed-literal::
    :class: output

    loss 0.277, train acc 0.884, test acc 0.861
    2608.4 examples/sec on [device(type='cuda', index=0), device(type='cuda', index=1)]

.. figure:: output_sentiment-analysis-rnn_6199ad_57_1.svg

.. code:: python

    lr, num_epochs = 0.01, 5
    trainer = gluon.Trainer(net.collect_params(), 'adam', {'learning_rate': lr})
    loss = gluon.loss.SoftmaxCrossEntropyLoss()
    d2l.train_ch13(net, train_iter, test_iter, loss, trainer, num_epochs, devices)

.. parsed-literal::
    :class: output

    loss 0.305, train acc 0.867, test acc 0.852
    822.5 examples/sec on [gpu(0), gpu(1)]

.. figure:: output_sentiment-analysis-rnn_6199ad_60_1.svg

We define the following function to predict the sentiment of a text
sequence using the trained model ``net``.

.. code:: python

    #@save
    def predict_sentiment(net, vocab, sequence):
        """Predict the sentiment of a text sequence."""
        sequence = torch.tensor(vocab[sequence.split()], device=d2l.try_gpu())
        label = torch.argmax(net(sequence.reshape(1, -1)), dim=1)
        return 'positive' if label == 1 else 'negative'

.. code:: python

    #@save
    def predict_sentiment(net, vocab, sequence):
        """Predict the sentiment of a text sequence."""
        sequence = np.array(vocab[sequence.split()], ctx=d2l.try_gpu())
        label = np.argmax(net(sequence.reshape(1, -1)), axis=1)
        return 'positive' if label == 1 else 'negative'

Finally, let’s use the trained model to predict the sentiment for two
simple sentences.

.. code:: python

    predict_sentiment(net, vocab, 'this movie is so great')

.. parsed-literal::
    :class: output

    'positive'

.. code:: python

    predict_sentiment(net, vocab, 'this movie is so bad')

.. parsed-literal::
    :class: output

    'negative'

.. code:: python

    predict_sentiment(net, vocab, 'this movie is so great')

.. parsed-literal::
    :class: output

    'positive'

.. code:: python

    predict_sentiment(net, vocab, 'this movie is so bad')

.. parsed-literal::
    :class: output

    'negative'

Summary
-------

- Pretrained word vectors can represent individual tokens in a text
  sequence.
- Bidirectional RNNs can represent a text sequence, such as via the
  concatenation of its hidden states at the initial and final time
  steps. This single text representation can be transformed into
  categories using a fully connected layer.

Exercises
---------

1. Increase the number of epochs. Can you improve the training and
   testing accuracies? How about tuning other hyperparameters?
2. Use larger pretrained word vectors, such as 300-dimensional GloVe
   embeddings. Does it improve classification accuracy?
3. Can we improve the classification accuracy by using the spaCy
   tokenization? You need to install spaCy (``pip install spacy``) and
   install the English package (``python -m spacy download en``). In the
   code, first, import spaCy (``import spacy``). Then, load the spaCy
   English package (``spacy_en = spacy.load('en')``). Finally, define
   the function
   ``def tokenizer(text): return [tok.text for tok in spacy_en.tokenizer(text)]``
   and replace the original ``tokenizer`` function (see the sketch after
   this list). Note the different forms of phrase tokens in GloVe and
   spaCy. For example, the phrase token “new york” takes the form of
   “new-york” in GloVe and the form of “new york” after the spaCy
   tokenization.
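
As a starting point for the last exercise, here is a hypothetical
replacement tokenizer. It assumes spaCy 3.x, where the English pipeline
is installed as ``en_core_web_sm`` (via ``python -m spacy download
en_core_web_sm``); the older ``en`` shortcut mentioned above is
deprecated in recent releases.

.. code:: python

    # Hypothetical spaCy-based tokenizer for the exercise; assumes the
    # `en_core_web_sm` pipeline has been downloaded
    import spacy

    spacy_en = spacy.load('en_core_web_sm')

    def tokenizer(text):
        return [tok.text for tok in spacy_en.tokenizer(text)]

    tokenizer('this movie is so great')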