Logistic Regression for Multiple Classes (Part 1-5)

What we covered in this video lecture

In this video, we extended the binary logistic regression model to a multinomial logistic regression model that works with an arbitrary number of classes. In machine learning and deep learning contexts, this multinomial logistic regression model is commonly called softmax regression.

We saw that only minimal code changes are required to turn a logistic regression model into a softmax regression model: we replaced the logistic sigmoid function with a softmax activation function, and we replaced the binary cross-entropy loss with the categorical cross-entropy loss.
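
To make this concrete, here is a minimal PyTorch sketch. The class name, feature count, class count, and the single-Linear-layer structure are illustrative assumptions rather than the exact code from the lecture; the key point is that only the number of output units and the activation function change.

import torch

# Softmax regression: the only differences from binary logistic regression are
# (1) the layer has num_classes output units instead of 1, and
# (2) the sigmoid activation is replaced by a softmax activation.
class SoftmaxRegression(torch.nn.Module):
    def __init__(self, num_features, num_classes):
        super().__init__()
        self.linear = torch.nn.Linear(num_features, num_classes)  # was out_features=1

    def forward(self, x):
        logits = self.linear(x)
        return torch.softmax(logits, dim=1)  # was torch.sigmoid(logits)

model = SoftmaxRegression(num_features=4, num_classes=3)
probas = model(torch.randn(8, 4))  # 8 toy examples with 4 features each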

Additional resources if you want to learn more

It can sometimes be tricky to remember the correct inputs for the different cross-entropy loss functions in PyTorch, so I created a small cheat sheet here.
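
Independently of the cheat sheet, here is a small sketch of the two commonly confused cases (the tensor values are made up): F.binary_cross_entropy_with_logits expects raw logits and float targets of the same shape, whereas F.cross_entropy expects one logit column per class and integer class labels, applying the softmax internally.

import torch
import torch.nn.functional as F

# Binary cross-entropy: logits and float targets share the same shape
logits_bin = torch.tensor([[1.2], [-0.7], [0.3]])
targets_bin = torch.tensor([[1.], [0.], [1.]])
loss_bin = F.binary_cross_entropy_with_logits(logits_bin, targets_bin)

# Categorical cross-entropy: one logit per class, integer labels (not one-hot)
logits_mc = torch.tensor([[1.2, 0.1, -0.5],
                          [0.0, 2.3, 0.4]])
targets_mc = torch.tensor([0, 1])
loss_mc = F.cross_entropy(logits_mc, targets_mc)  # softmax applied internally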

Quiz: 4.1 Dealing with More than Two Classes: Softmax Regression (PART 1)

Which of the following are changes we make to the Logistic Regression model to convert it into a Softmax Regression Classifier? We…

Incorrect. Softmax Regression does not have hidden layers.

Correct. We use the softmax function to compute the class-membership probabilities.

Incorrect. Softmax regression and logistic regression operate on the same features.

Incorrect. Softmax Regression does not have hidden layers.

Quiz: 4.1 Dealing with More than Two Classes: Softmax Regression (PART 2)

For each training example, the softmax function returns one class-membership probability score for each class. Which of the following statements about the sum of these scores (for one training example) is correct?

Incorrect. The sum of these scores is always the same, independent of the training example.

Correct. Due to the normalization term (denominator), the sum is always 1.

Incorrect. Due to the normalization term, the sum cannot exceed 1.
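
As a quick sanity check of the normalization property mentioned in the correct answer, the following sketch uses arbitrary logit values:

import torch

logits = torch.tensor([[2.0, -1.0, 0.5],
                       [0.1, 0.2, 0.3]])   # two toy examples, three classes
probas = torch.softmax(logits, dim=1)
print(probas.sum(dim=1))                   # tensor([1., 1.]), up to rounding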

Quiz: 4.1 Dealing with More than Two Classes: Softmax Regression (PART 3)

To obtain the class label from the softmax probabilities, we…

Incorrect. We would threshold at 0.5 if we use a logistic sigmoid activation function.

Correct. We assign the label of the class with the highest probability score, that is, we take the argmax over the softmax outputs.
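
A minimal sketch of this step (the probability values are made up):

import torch

probas = torch.tensor([[0.10, 0.70, 0.20],
                       [0.55, 0.15, 0.30]])
labels = torch.argmax(probas, dim=1)   # index of the largest probability per example
print(labels)                          # tensor([1, 0])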

Quiz: 4.1 Dealing with More than Two Classes: Softmax Regression (PART 4)

Suppose we have a dataset with 5 classes, labeled 0 to 4. What is the correct one-hot representation for a training example with label 3?

Incorrect. This would correspond to label 0.

Incorrect. This would correspond to label 1.

Incorrect. Remember that the class labels start at index 0.

Correct. Since we are starting at index position 0, the 4th entry corresponds to label 3.

Incorrect. This would correspond to label 4.
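
For reference, the mapping from the correct answer can be reproduced in PyTorch as follows (assuming integer class labels 0 to 4):

import torch
import torch.nn.functional as F

label = torch.tensor([3])
print(F.one_hot(label, num_classes=5))   # tensor([[0, 0, 0, 1, 0]])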

Quiz: 4.1 Dealing with More than Two Classes: Softmax Regression (PART 5)

Suppose you have 2 training examples in a 3-class classification setting. What is the cross-entropy loss for a perfectly random prediction?

Incorrect. A perfectly random prediction assigns a probability of 1/3 ≈ 0.33 to each class in the 3-class case, but the output of the loss is not 0.33.

Correct. A perfectly random prediction yields a probability score of 1/3 for each class in a 3-class setting, and -log(1/3) ≈ 1.10. Note that this is independent of the number of training examples, since we average: -(log(1/3) + log(1/3)) / 2 ≈ 1.10.

Incorrect. The class-membership probabilities are 0.5 for a perfectly random prediction in the binary case. But we have a 3-class case here, and it’s also not the output of the loss.

Incorrect. A loss of 0 would correspond to the best possible (and not random) scenario.

Incorrect. Infinity would indicate a very wrong, not random prediction.

Incorrect. The cross-entropy loss can’t be negative.
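
The 1.10 value from the correct answer can be reproduced with a short computation; here, all-zero logits stand in for a perfectly random prediction:

import torch
import torch.nn.functional as F

logits = torch.zeros(2, 3)              # 2 training examples, 3 classes, uniform prediction
targets = torch.tensor([0, 2])          # the true labels do not matter here
loss = F.cross_entropy(logits, targets) # averages over the 2 examples by default
print(loss)                             # tensor(1.0986), i.e. -log(1/3) ≈ 1.10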
