8.1 Working with Text Data

What we covered in this video lecture

This lecture covered different ways to work with text data, including how to go from raw text to preprocessed text and how to convert the preprocessed text into feature vectors that machine learning models can take as input.
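The path from raw text to feature vectors can be sketched in a few lines of plain Python. This is only a minimal illustration, not the lecture's code: the function names (`preprocess`, `build_vocabulary`, `bag_of_words`), the simple regex tokenizer, and the example sentences are all assumptions made for this sketch, and the encoding shown is a basic bag-of-words count vector.

```python
import re
from collections import Counter

def preprocess(text):
    """Lowercase the raw text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def build_vocabulary(documents):
    """Map each unique token to a column index (alphabetical order)."""
    tokens = sorted({tok for doc in documents for tok in preprocess(doc)})
    return {tok: idx for idx, tok in enumerate(tokens)}

def bag_of_words(text, vocabulary):
    """Count how often each vocabulary token occurs in the text."""
    counts = Counter(preprocess(text))
    return [counts.get(tok, 0) for tok in vocabulary]

# Hypothetical toy corpus, chosen only for illustration.
documents = [
    "The sun is shining",
    "The weather is sweet",
    "The sun is shining, the weather is sweet",
]

vocabulary = build_vocabulary(documents)
vectors = [bag_of_words(doc, vocabulary) for doc in documents]

print(vocabulary)
for vec in vectors:
    print(vec)
```

Each document becomes a vector with one entry per vocabulary word, so documents of different lengths all map to the same fixed-length representation.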

What types of machine learning models can we use? First, there are classic models for tabular data, such as logistic regression and multilayer perceptrons. Second, there are sequence models, such as 1D convolutional networks and recurrent neural networks. Finally, and most importantly, there are large language transformers, which are now the state of the art for working with text.

Additional resources if you want to learn more

If you want to learn more about the tf-idf approach mentioned in this lecture, I made a walkthrough here: https://nbviewer.org/github/rasbt/pattern_classification/blob/master/machine_learning/scikit-learn/tfidf_scikit-learn.ipynb
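As a rough idea of what tf-idf does, here is a minimal sketch using the plain textbook weighting, term frequency times log(N / document frequency). The function name `tfidf` and the toy documents are assumptions for this example. Libraries such as scikit-learn use smoothed and normalized variants by default, so their numbers will differ from the ones computed here.

```python
import math
from collections import Counter

def tfidf(tokenized_docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    n_docs = len(tokenized_docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in tokenized_docs for term in set(doc))
    weights = []
    for doc in tokenized_docs:
        tf = Counter(doc)  # raw term counts within this document
        weights.append(
            {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        )
    return weights

# Hypothetical toy corpus, already lowercased and split on whitespace.
docs = [
    "the sun is shining".split(),
    "the weather is sweet".split(),
    "the sun is shining and the weather is sweet".split(),
]

weights = tfidf(docs)
print(weights[0]["the"])  # 0.0, because "the" occurs in every document
print(weights[2]["and"])  # log(3), because "and" occurs in only one document
```

Words that appear in every document get a weight of zero, while words that are specific to a few documents are weighted up.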

Quiz: 8.1 Working with Text Data (Part 1)

True or false: Since text data is a sequence of words, a multilayer perceptron trained on text data is considered a sequence model.

False. A multilayer perceptron takes a fixed-length feature vector as input and treats each feature as a separate input with no position relative to the others, so it has no inherent notion of sequence relationships.
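A small sketch can make this concrete. It assumes the text is encoded as a bag-of-words count vector before it reaches the multilayer perceptron (the quiz itself does not say which encoding is used), and the helper name and the fixed three-word vocabulary are made up for the example.

```python
from collections import Counter

def bag_of_words(text, vocabulary):
    """Count vocabulary words in the text, ignoring where they occur."""
    counts = Counter(text.lower().split())
    return [counts[token] for token in vocabulary]

# Hypothetical fixed vocabulary for this illustration.
vocabulary = ["bites", "dog", "man"]

a = bag_of_words("dog bites man", vocabulary)
b = bag_of_words("man bites dog", vocabulary)

print(a)       # [1, 1, 1]
print(b)       # [1, 1, 1]
print(a == b)  # True: both sentences yield the same feature vector
```

Because both sentences produce identical inputs, a model that only sees these feature vectors cannot tell them apart, whereas a sequence model that processes the tokens in order can.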

Quiz: 8.1 Working with Text Data (Part 2)

True or false: Bag-of-words is a supervised approach that converts text and the associated class labels into feature vectors.

False. Bag-of-words is a method for encoding text into feature vectors. It does not require class labels, so it is not supervised. (However, a classifier trained on bag-of-words feature vectors is a supervised model.)
