3.1 Using Logistic Regression for Classification (Parts 1-3)

Slides

Part 1: Single Layer Neural Networks
Part 2: Logistic Sigmoid Function
Part 3: The Logistic Regression Loss

What we covered in this video lecture

We drew a general version of a single-layer neural network in this lecture. Then, we applied it to different models: linear regression, the perceptron from Unit 2, and logistic regression. Like the perceptron, logistic regression is a model for binary classification. Many of its concepts, such as the sigmoid activation and the logistic loss, are also used in deep neural networks, so it is an important model that we will look at closely in Unit 3.
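To make the "one general network, three models" idea concrete, here is a minimal sketch in plain Python. The helper names (`net_input`, `identity`, `threshold`, `logistic_sigmoid`, `single_layer_forward`) and the example weights are made up for illustration, and the perceptron threshold is assumed to sit at a net input of 0. The only thing that changes between the three models is the activation applied to the net input.

```python
import math

def net_input(x, w, b):
    """Weighted sum of the inputs plus the bias unit: z = w . x + b."""
    return sum(w_j * x_j for w_j, x_j in zip(w, x)) + b

def identity(z):
    """Linear regression: the net input is returned unchanged."""
    return z

def threshold(z):
    """Perceptron: class label 1 if z > 0, else 0 (assumed convention)."""
    return 1 if z > 0.0 else 0

def logistic_sigmoid(z):
    """Logistic regression: squashes z into a value between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow for very negative z
    return e / (1.0 + e)

def single_layer_forward(x, w, b, activation):
    """Generic single-layer network: activation(net input)."""
    return activation(net_input(x, w, b))

# Hypothetical example values, not taken from the lecture.
x = [1.0, 2.0]
w = [0.5, -0.25]
b = 0.1

linear_out = single_layer_forward(x, w, b, identity)
perceptron_out = single_layer_forward(x, w, b, threshold)
logistic_out = single_layer_forward(x, w, b, logistic_sigmoid)

print(linear_out, perceptron_out, logistic_out)
```

All three calls share the same net input; linear regression returns it directly, the perceptron turns it into a class label, and logistic regression turns it into a class-membership probability.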

Additional resources if you want to learn more

In machine learning, we typically refer to the logistic function as the sigmoid function because of its sigmoidal (S-shaped) curve. However, the logistic function is only one of several sigmoid functions. If you are interested in learning about the others, check out this Wikipedia page.
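As a small illustration, the sketch below compares the logistic sigmoid with the hyperbolic tangent, another S-shaped function. The choice of tanh as the comparison and the sample inputs are assumptions made for this example. The logistic sigmoid maps inputs to values between 0 and 1, whereas tanh maps them to values between -1 and 1.

```python
import math

def logistic_sigmoid(z):
    """Logistic sigmoid: 1 / (1 + exp(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Arbitrary sample inputs chosen for illustration.
for z in (-6.0, -2.0, 0.0, 2.0, 6.0):
    print(f"z={z:+.1f}  logistic={logistic_sigmoid(z):.4f}  tanh={math.tanh(z):.4f}")

# The two functions are closely related: tanh(z) = 2 * sigmoid(2z) - 1.
z = 1.3
rescaled = 2.0 * logistic_sigmoid(2.0 * z) - 1.0
print(math.tanh(z), rescaled)
```

Because tanh is a shifted and rescaled logistic sigmoid, both curves have the same S shape and differ only in their output range and steepness.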

We briefly introduced the logistic loss function, which is also referred to as negative log-likelihood loss or binary cross-entropy. If you are interested, I have written about it in more detail here. (Certain parts of this article involve topics we still need to cover, such as multi-layer neural networks. So, feel free to bookmark this article and revisit it after completing Unit 4.)
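For reference, the loss for a single training example with true label y (0 or 1) and predicted probability a is -[y log(a) + (1 - y) log(1 - a)]. The following is a minimal sketch of that formula averaged over a dataset. The function name `binary_cross_entropy` and the small clipping constant `eps` (used only to avoid taking log(0)) are assumptions for this example, not something prescribed by the lecture.

```python
import math

def binary_cross_entropy(y_true, a_pred, eps=1e-12):
    """Mean logistic loss over a list of labels and predicted probabilities."""
    total = 0.0
    for y, a in zip(y_true, a_pred):
        # Clip the probability so that log() never receives exactly 0.
        a = min(max(a, eps), 1.0 - eps)
        total += -(y * math.log(a) + (1 - y) * math.log(1.0 - a))
    return total / len(y_true)

# Hypothetical labels and predicted probabilities.
y_true = [1, 0, 1]
a_pred = [0.9, 0.2, 0.6]
loss = binary_cross_entropy(y_true, a_pred)
print(f"{loss:.4f}")
```

For a label of 1 only the first term is active, and for a label of 0 only the second term is, so each example is penalized by the negative log of the probability assigned to its true class.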

Quiz: 3.1 Using Logistic Regression for Classification - PART 1

Which of the following single-layer neural networks uses a threshold function to generate the predicted class label?

Linear regression

Incorrect. Linear regression produces continuous outputs, not class labels.

Logistic regression

Correct. Even though logistic regression uses a sigmoid activation function, we still need to apply a threshold to obtain the class label.

Perceptron

Correct. As we have already seen in Unit 2, the perceptron uses a threshold function.

Quiz: 3.1 Using Logistic Regression for Classification - PART 2

Suppose a refers to the activation function value (class-membership probability scores) returned via the logistic sigmoid function. We can obtain a binary class label (0 or 1) as follows:

Return class label 1 if a ≥ 0.5, else return label 0.

Correct. The threshold is typically set at 0.5. (As an advanced concept, it is possible to change the threshold to a different value between 0 and 1 to influence precision and recall.)

Return class label 1 if a > 0.0, else return label 0.

Incorrect. The logistic sigmoid only returns values between 0 and 1, so a is always greater than 0.0, and the classifier would always predict class label 1. (However, it is valid to use 0.0 as the threshold for the net inputs.)

Return class label 1 if a = 6.0, and return label 0 if a = -6.0.

Incorrect. The threshold should divide the number line into value ranges.
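The thresholding rule from this question can be sketched in a few lines of Python. The function names and the sample net inputs are hypothetical. The sketch also checks that thresholding the activation a at 0.5 gives the same labels as thresholding the net input z at 0.0, since the logistic sigmoid returns exactly 0.5 for z = 0.

```python
import math

def logistic_sigmoid(z):
    """Logistic sigmoid, written to avoid overflow for negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def label_from_activation(a, threshold=0.5):
    """Class label 1 if the probability a reaches the threshold, else 0."""
    return 1 if a >= threshold else 0

def label_from_net_input(z):
    """Equivalent rule applied directly to the net input z."""
    return 1 if z >= 0.0 else 0

# Arbitrary net inputs chosen for illustration.
net_inputs = [-6.0, -0.1, 0.0, 0.1, 6.0]
activations = [logistic_sigmoid(z) for z in net_inputs]

labels_a = [label_from_activation(a) for a in activations]
labels_z = [label_from_net_input(z) for z in net_inputs]
print(labels_a, labels_z)

# Every activation is above 0.0, so a threshold of 0.0 on a
# would assign label 1 to all of these examples.
print(all(a > 0.0 for a in activations))
```

Passing a different `threshold` value to `label_from_activation` is one way to trade precision against recall, as mentioned in the feedback above.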

Quiz: 3.1 Using Logistic Regression for Classification - PART 3

While the perceptron algorithm learns from the predicted class labels, logistic regression does not use the predicted class labels during the learning (training) phase. Instead, it uses the predicted probabilities to optimize a …

surrogate loss

Correct. A surrogate loss is a general term that refers to a “proxy” loss that is optimized instead of the target evaluation metric (like classification accuracy). We use a surrogate loss if we cannot optimize the target metric directly.

negative log-likelihood loss

Correct. Negative log-likelihood loss is a different term for “binary cross-entropy”.

binary cross-entropy loss

Correct. Binary cross-entropy is a different term for “negative log-likelihood loss”.
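One way to see why a proxy loss is useful is to compare it with classification accuracy on two sets of predictions. In the sketch below, the function names and the probability values are invented for illustration. Both sets of predictions yield the same accuracy, but the logistic loss still distinguishes the barely correct predictions from the confident ones, which gives training a signal to improve on.

```python
import math

def accuracy(y_true, a_pred, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    labels = [1 if a >= threshold else 0 for a in a_pred]
    return sum(int(p == y) for p, y in zip(labels, y_true)) / len(y_true)

def mean_log_loss(y_true, a_pred):
    """Mean logistic loss; assumes every probability is strictly in (0, 1)."""
    losses = [
        -(y * math.log(a) + (1 - y) * math.log(1.0 - a))
        for y, a in zip(y_true, a_pred)
    ]
    return sum(losses) / len(losses)

# Hypothetical labels and two hypothetical sets of predicted probabilities.
y_true = [1, 0, 1, 0]
barely_correct = [0.55, 0.45, 0.60, 0.40]
confident = [0.95, 0.05, 0.90, 0.10]

acc_barely = accuracy(y_true, barely_correct)
acc_confident = accuracy(y_true, confident)
loss_barely = mean_log_loss(y_true, barely_correct)
loss_confident = mean_log_loss(y_true, confident)

print(acc_barely, acc_confident)
print(f"{loss_barely:.4f} {loss_confident:.4f}")
```

Accuracy only changes when a prediction crosses the threshold, while the loss changes smoothly with every predicted probability.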

The logistic regression loss function increases … the farther the predicted probability is from the true target label.

linearly

Incorrect. There is a steeper increase in the loss for wrong predictions.

exponentially

Correct. The loss increases steeply for wrong predictions; it approaches infinity as the predicted probability of the true class label approaches 0.

logarithmically

Incorrect. There is a steeper increase in the loss for wrong predictions.
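To get a feel for how quickly the loss grows, the following sketch evaluates the loss of a single example with true label 1 for a few predicted probabilities. The function name and the probability values are hypothetical choices for this illustration.

```python
import math

def log_loss_for_positive_label(a):
    """Logistic loss of one example whose true label is 1: -log(a)."""
    return -math.log(a)

# Predicted probabilities for class 1, from nearly right to badly wrong.
probabilities = [0.9, 0.5, 0.1, 0.01, 0.001]
losses = [log_loss_for_positive_label(a) for a in probabilities]

for a, loss in zip(probabilities, losses):
    print(f"a={a:<6} loss={loss:.4f}")
```

The loss stays small while the prediction is close to the true label and keeps growing without bound as the predicted probability for the true class gets closer to 0.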
