
8.3 Introduction to Recurrent Neural Networks


What we covered in this video lecture

In the previous lecture, we implemented a simple bag-of-words classifier as a baseline model. However, one big limitation of the bag-of-words approach is that it cannot encode sequence (or word) order: two sentences containing the same words in a different order can have different meanings yet receive identical representations. Researchers proposed 1D convolutional neural networks and recurrent neural networks (RNNs) to address this sequence-order issue. In fact, RNNs were the most popular deep learning approach for text until transformers came along.
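To make the word-order limitation concrete, here is a minimal sketch (the example sentences are made up for illustration) showing two sentences with opposite meanings that produce identical bag-of-words counts:

```python
from collections import Counter

# Two sentences with opposite meanings but exactly the same words
s1 = "the movie was good not bad"
s2 = "the movie was bad not good"

# A bag-of-words representation keeps only word counts, discarding order
bow1, bow2 = Counter(s1.split()), Counter(s2.split())
print(bow1 == bow2)  # True: bag-of-words cannot tell the sentences apart
```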

In addition to learning about RNNs, this lecture also introduced the different types of text modeling approaches: many-to-one, one-to-many, and many-to-many (sequence-to-sequence) modeling. Finally, we discussed how we encode words into vectors using one-hot encoding and index lookups.
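The two encoding steps mentioned above can be sketched as follows (the toy vocabulary is made up for illustration; assumes PyTorch):

```python
import torch

vocab = ["the", "movie", "was", "good"]            # toy vocabulary (illustrative)
word_to_idx = {w: i for i, w in enumerate(vocab)}

# Step 1: an index lookup maps a word to an integer
idx = torch.tensor([word_to_idx["movie"]])          # tensor([1])

# Step 2: the index can be expanded into a one-hot vector
one_hot = torch.nn.functional.one_hot(idx, num_classes=len(vocab))
print(one_hot)  # tensor([[0, 1, 0, 0]])
```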

Additional resources if you want to learn more

If you want to learn more about RNNs for generating texts, I highly recommend the article The Unreasonable Effectiveness of Recurrent Neural Networks.

Quiz: 8.3. Introduction to Recurrent Neural Networks (Part 1)

When using a 1-dimensional convolutional neural network for text, we first …

Incorrect. While this is conceptually possible, this is not a common approach.

Correct. There are methods like Word2Vec that do this, or we can use our own embedding layer (more on that in a later video in Unit 8.3).


Quiz: 8.3. Introduction to Recurrent Neural Networks (Part 2)

Suppose we are working on sentiment analysis: Given a text sequence, the goal is to determine the sentiment of the text (positive, negative, or neutral). This is what type of task?

Incorrect. Since we feed the network with text (multiple inputs), this is not a “one-to-…” task.

Correct. We have many inputs (words) that are associated with a single label.

Incorrect. Since we only obtain one label per text, this is not a “many” output.


Quiz: 8.3. Introduction to Recurrent Neural Networks (Part 3)

Suppose we have a character-level model where each letter is first represented as a one-hot encoding. The letter “a” is represented as the one-hot encoded vector [1, 0, 0, 0, 0, …]. What is the dimensionality of the continuous vector embedding of each letter?

Incorrect. The embedding size (similar to a hidden layer size) is set by the user. There is not enough information above to say what the embedding size is.

Incorrect. The embedding size (similar to a hidden layer size) is set by the user. There is not enough information above to say what the embedding size is.

Correct. The embedding size (similar to a hidden layer size) is set by the user. There is not enough information above to say what the embedding size is.
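As a sketch of why the embedding size cannot be inferred from the one-hot encoding alone (the sizes below are arbitrary choices; assumes PyTorch's `torch.nn.Embedding`):

```python
import torch

vocab_size = 26  # e.g., one entry per lowercase letter in a character-level model

# The embedding dimensionality is a hyperparameter the user picks freely:
for embed_dim in (8, 16, 100):
    emb = torch.nn.Embedding(vocab_size, embed_dim)
    vec = emb(torch.tensor([0]))  # continuous vector for the letter "a"
    print(vec.shape)              # (1, 8), then (1, 16), then (1, 100)
```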


Quiz: 8.3. Introduction to Recurrent Neural Networks (Part 4)

Why might one choose to use an embedding layer instead of a linear layer in a natural language processing (NLP) context?

Correct. Embedding layers are less computationally expensive compared to linear layers when dealing with large-scale NLP tasks.

Incorrect. We use linear layers for continuous data all the time. Also, embedding layers are specifically designed for handling discrete data, like words or tokens in NLP tasks, rather than continuous data.

Incorrect. Linear layers are just performing a matrix multiplication.

Incorrect. Linear layers are indeed compatible with PyTorch (using torch.nn.Linear).
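A small sketch of the computational argument behind the correct answer (sizes are illustrative; assumes PyTorch). A bias-free linear layer with matching weights computes the same output as an embedding layer, but it needs a dense one-hot input and a full matrix multiplication, whereas the embedding layer just indexes one row of its weight matrix:

```python
import torch

vocab_size, embed_dim = 10_000, 128  # illustrative sizes for a large vocabulary
emb = torch.nn.Embedding(vocab_size, embed_dim)

# A bias-free linear layer with the same (transposed) weights as the embedding
linear = torch.nn.Linear(vocab_size, embed_dim, bias=False)
with torch.no_grad():
    linear.weight.copy_(emb.weight.T)

idx = torch.tensor([42])
one_hot = torch.nn.functional.one_hot(idx, vocab_size).float()

# Same result, very different cost: the linear layer multiplies a dense
# 10,000-dim vector by its weight matrix; the embedding looks up one row.
print(torch.allclose(linear(one_hot), emb(idx)))  # True
```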
