
3.7 Feature Normalization (Parts 1-2)



What we covered in this video lecture

When we work with real-world datasets, features often come in different scales. For example, the alcohol content of wine, measured in percent, is typically a value between 10 and 15. The proline content of wine, on the other hand, is measured in mg/L (milligrams per liter) and can be 100 times larger, typically ranging between 300 and 1300.

Working with features that have vastly different numeric scales can often result in suboptimal training. Feature normalization can make it easier to find a good learning rate and achieve good convergence (that is, to successfully minimize the loss).

This lecture covered two of the most widely used feature normalization methods: min-max scaling and standardization (also known as z-score standardization). In certain scenarios, one normalization scheme might work slightly better than another. Still, the most important lesson is that we use a normalization scheme to ensure that the features are all roughly on the same scale.
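As a concrete sketch of the two schemes, here is a minimal NumPy version (the variable names and toy values are illustrative, not from the lecture):

```python
import numpy as np

def minmax_scale(x):
    # Rescale each feature column to the [0, 1] range
    x_min, x_max = x.min(axis=0), x.max(axis=0)
    return (x - x_min) / (x_max - x_min)

def standardize(x):
    # Center each feature column at mean 0 and scale it to standard deviation 1
    mean, std = x.mean(axis=0), x.std(axis=0)
    return (x - mean) / std

# Toy "wine" features: alcohol (%) and proline (mg/L)
X = np.array([[12.0, 300.0],
              [13.5, 800.0],
              [14.5, 1300.0]])

print(minmax_scale(X))  # each column now spans [0, 1]
print(standardize(X))   # each column now has mean 0, std 1
```

Either way, both columns end up on roughly the same scale, which is the main point of the lecture.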

Additional resources if you want to learn more

We mentioned using the training set parameters to normalize the test set. The rationale behind this can be confusing at first and may take some time to sink in. In this context, you might find my short explanation Why do we need to reuse training parameters to transform test data? helpful.
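The key practical detail is that the mean and standard deviation are computed on the training set only and then applied to both splits. A hedged NumPy sketch (toy data, illustrative names):

```python
import numpy as np

X_train = np.array([[12.0, 300.0],
                    [13.5, 800.0],
                    [14.5, 1300.0]])
X_test = np.array([[11.0, 500.0],
                   [15.0, 900.0]])

# Compute the normalization parameters on the training set only ...
mean, std = X_train.mean(axis=0), X_train.std(axis=0)

# ... and reuse the *same* parameters for both splits
X_train_std = (X_train - mean) / std
X_test_std = (X_test - mean) / std

# The training columns are exactly mean 0 / std 1;
# the test columns are only approximately so, which is expected
print(X_train_std.mean(axis=0), X_test_std.mean(axis=0))
```

Note that this also illustrates the quiz point below: only the training features are guaranteed to have mean 0 and standard deviation 1 after standardization.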


Quiz: 3.7 Feature Normalization - PART 1

What are the advantages of feature normalization?

Incorrect. Feature scaling does not affect the computational speed for prediction.

Correct. The gradient updates will be more stable, which will help the model to achieve a better predictive performance.

Correct. The model typically converges faster when the features are normalized.


Quiz: 3.7 Feature Normalization - PART 2

When we use standardization (z-score normalization), each feature in the training and test set will have a mean of 0 and a standard deviation of 1.

Incorrect. Each training feature will match these properties, but that’s not guaranteed for the test set. Imagine you collect a new sample that matches the extreme values of your training set distribution: would you expect it to have exactly the same distribution as the training set?

Correct. Each training feature will match these properties, but that’s not guaranteed for the test set.
