
8.3 Introduction to Recurrent Neural Networks


What we covered in this video lecture

In the previous lecture, we implemented a simple bag-of-words classifier as a baseline model. However, one big limitation of the bag-of-words approach is that it cannot encode sequence (or word) order: two sentences containing the same words in a different order can have different meanings yet receive identical representations. Researchers proposed 1D convolutional neural networks and recurrent neural networks (RNNs) to address this sequence-order issue. In fact, RNNs were the most popular deep learning approach for text until transformers came along.
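To make the word-order limitation concrete, here is a minimal sketch (the example sentences are made up for illustration) showing two sentences with opposite meanings that produce identical bag-of-words counts:

```python
from collections import Counter

# Two sentences with opposite meanings but exactly the same words
s1 = "the movie was good not bad"
s2 = "the movie was bad not good"

# A bag-of-words representation keeps only word counts, discarding order
bow1, bow2 = Counter(s1.split()), Counter(s2.split())
print(bow1 == bow2)  # True: bag-of-words cannot tell the sentences apart
```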

In addition to learning about RNNs, this lecture also introduced the different types of text modeling approaches: many-to-one, one-to-many, and many-to-many (sequence-to-sequence) modeling. Finally, we discussed how we encode words into vectors using one-hot encoding and index lookups.
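The two encoding steps mentioned above can be sketched as follows (the toy vocabulary is made up for illustration; assumes PyTorch):

```python
import torch

vocab = ["the", "movie", "was", "good"]            # toy vocabulary (illustrative)
word_to_idx = {w: i for i, w in enumerate(vocab)}

# Step 1: an index lookup maps a word to an integer
idx = torch.tensor([word_to_idx["movie"]])          # tensor([1])

# Step 2: the index can be expanded into a one-hot vector
one_hot = torch.nn.functional.one_hot(idx, num_classes=len(vocab))
print(one_hot)  # tensor([[0, 1, 0, 0]])
```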

Additional resources if you want to learn more

If you want to learn more about RNNs for generating texts, I highly recommend the article The Unreasonable Effectiveness of Recurrent Neural Networks.

Quiz: 8.3. Introduction to Recurrent Neural Networks (Part 1)

When using a 1-dimensional convolutional neural network for text, we first …

Incorrect. While this is conceptually possible, this is not a common approach.

Correct. There are methods like Word2Vec that do this, or we can use our own embedding layer (more on that in a later video in Unit 8.3).


Quiz: 8.3. Introduction to Recurrent Neural Networks (Part 2)

Suppose we are working on sentiment analysis: Given a text sequence, the goal is to determine the sentiment of the text (positive, negative, or neutral). This is what type of task?

Incorrect. Since we feed the network with text (multiple inputs), this is not a “one-to-…” task.

Correct. We have many inputs (words) that are associated with a single label.

Incorrect. Since we only obtain one label per text, this is not a “many” output.


Quiz: 8.3. Introduction to Recurrent Neural Networks (Part 3)

Suppose we have a character-level model where each letter is first represented as a one-hot encoding. The letter “a” is represented as the one-hot encoded vector [1, 0, 0, 0, 0, …]. What is the dimensionality of the continuous vector embedding of each letter?

Incorrect. The embedding size (similar to a hidden layer size) is set by the user. There is not enough information above to say what the embedding size is.

Incorrect. The embedding size (similar to a hidden layer size) is set by the user. There is not enough information above to say what the embedding size is.

Correct. The embedding size (similar to a hidden layer size) is set by the user. There is not enough information above to say what the embedding size is.
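As a sketch of why the embedding size cannot be inferred from the one-hot encoding alone (the sizes below are arbitrary choices; assumes PyTorch's `torch.nn.Embedding`):

```python
import torch

vocab_size = 26  # e.g., one entry per lowercase letter in a character-level model

# The embedding dimensionality is a hyperparameter the user picks freely:
for embed_dim in (8, 16, 100):
    emb = torch.nn.Embedding(vocab_size, embed_dim)
    vec = emb(torch.tensor([0]))  # continuous vector for the letter "a"
    print(vec.shape)              # (1, 8), then (1, 16), then (1, 100)
```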


Quiz: 8.3. Introduction to Recurrent Neural Networks (Part 4)

Why might one choose to use an embedding layer instead of a linear layer in a natural language processing (NLP) context?

Correct. Embedding layers are less computationally expensive compared to linear layers when dealing with large-scale NLP tasks.

Incorrect. We use linear layers for continuous data all the time. Also, embedding layers are specifically designed for handling discrete data, like words or tokens in NLP tasks, rather than continuous data.

Incorrect. Linear layers are just performing a matrix multiplication.

Incorrect. Linear layers are indeed compatible with PyTorch (using torch.nn.Linear).
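A small sketch of the computational argument behind the correct answer (sizes are illustrative; assumes PyTorch). A bias-free linear layer with matching weights computes the same output as an embedding layer, but it needs a dense one-hot input and a full matrix multiplication, whereas the embedding layer just indexes one row of its weight matrix:

```python
import torch

vocab_size, embed_dim = 10_000, 128  # illustrative sizes for a large vocabulary
emb = torch.nn.Embedding(vocab_size, embed_dim)

# A bias-free linear layer with the same (transposed) weights as the embedding
linear = torch.nn.Linear(vocab_size, embed_dim, bias=False)
with torch.no_grad():
    linear.weight.copy_(emb.weight.T)

idx = torch.tensor([42])
one_hot = torch.nn.functional.one_hot(idx, vocab_size).float()

# Same result, very different cost: the linear layer multiplies a dense
# 10,000-dim vector by its weight matrix; the embedding looks up one row.
print(torch.allclose(linear(one_hot), emb(idx)))  # True
```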
