3.7 Feature Normalization (Parts 1-2)
What we covered in this video lecture
When we work with real-world datasets, features often come in different scales. For example, consider the alcohol content of wine: measured in percent, it is typically a value between 10 and 15. The proline content of wine, on the other hand, is measured in mg/L (milligrams per liter) and can be 100 times larger, typically ranging between 300 and 1300.
Working with features that have vastly different numeric scales often results in suboptimal training. Feature normalization can help here: it makes it easier to find a good learning rate and to achieve good convergence (that is, to successfully minimize the loss).
This lecture covered two of the most widely used feature normalization methods: min-max scaling and standardization (also known as z-score standardization). In certain scenarios, one normalization scheme may work slightly better than the other. Still, the most important takeaway is that we use a normalization scheme at all, ensuring that the features are all roughly on the same scale.
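To make the two schemes concrete, here is a minimal sketch using NumPy. The proline values below are made-up numbers in the typical 300-1300 mg/L range mentioned above; min-max scaling maps the values into the [0, 1] interval, while standardization centers them at mean 0 with standard deviation 1.

```python
import numpy as np

# Hypothetical proline values (mg/L) for illustration only
x = np.array([310.0, 560.0, 720.0, 1150.0, 1290.0])

# Min-max scaling: rescales values into the [0, 1] range
x_minmax = (x - x.min()) / (x.max() - x.min())

# Standardization (z-score): subtract the mean, divide by the standard deviation
x_std = (x - x.mean()) / x.std()

print(x_minmax)  # smallest value maps to 0.0, largest to 1.0
print(x_std)     # values now centered around 0
```

After either transformation, features measured in percent and features measured in mg/L end up on comparable scales, which is the point of normalization.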
Additional resources if you want to learn more
We mentioned using the training set parameters to normalize the test set. The rationale behind this can be confusing at first and may take some time to sink in. In this context, you might find my short explanation Why do we need to reuse training parameters to transform test data? helpful.
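The key idea can be sketched in a few lines: compute the normalization parameters (here, mean and standard deviation for standardization) on the training set only, then apply those same parameters to the test set. The synthetic data below is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(123)
X_train = rng.normal(loc=12.0, scale=1.5, size=100)  # synthetic training feature
X_test = rng.normal(loc=12.0, scale=1.5, size=20)    # synthetic test feature

# Compute the parameters on the TRAINING set only
mu, sigma = X_train.mean(), X_train.std()

# Reuse the training parameters for both splits
X_train_std = (X_train - mu) / sigma
X_test_std = (X_test - mu) / sigma  # do NOT recompute mean/std on the test set
```

Recomputing the parameters on the test set would transform the two splits inconsistently, so the same raw feature value could map to different normalized values in training versus testing.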