Deep Learning Fundamentals
- Unit 1: Intro to ML and DL
- Unit 2: Using Tensors w/ PyTorch
- Unit 3: Model Training in PyTorch
- Unit 3.1: Using Logistic Regression for Classification
- Unit 3.2: The Logistic Regression Computation Graph
- Unit 3.3: Model Training with Stochastic Gradient Descent
- Unit 3.4: Automatic Differentiation in PyTorch
- Unit 3.5: The PyTorch API
- Unit 3.6: Training a Logistic Regression Model in PyTorch
- Unit 3.7: Feature Normalization
- Unit 3 Exercises
- Unit 4: Training Multilayer Neural Networks Overview
- Unit 4.1: Logistic Regression for Multiple Classes
- Unit 4.2: Multilayer Neural Networks
- Unit 4.3: Training a Multilayer Neural Network in PyTorch
- Unit 4.4: Defining Efficient Data Loaders
- Unit 4.5: Multilayer Neural Networks for Regression
- Unit 4.6: Speeding Up Model Training Using GPUs
- Unit 4 Exercises
- Unit 5: Organizing Your Code with Lightning
- Unit 5.1: Organizing Your Code with Lightning
- Unit 5.2: Training a Multilayer Perceptron using the Lightning Trainer
- Unit 5.3: Computing Metrics Efficiently with TorchMetrics
- Unit 5.4: Making Code Reproducible
- Unit 5.5: Organizing Your Data Loaders with Data Modules
- Unit 5.6: The Benefits of Logging Your Model Training
- Unit 5.7: Evaluating and Using Models on New Data
- Unit 5.8: Add Functionality with Callbacks
- Unit 5 Exercises
- Unit 6: Essential Deep Learning Tips & Tricks
- Unit 6.1: Model Checkpointing and Early Stopping
- Unit 6.2: Learning Rates and Learning Rate Schedulers
- Unit 6.3: Using More Advanced Optimization Algorithms
- Unit 6.4: Choosing Activation Functions
- Unit 6.5: Automating the Hyperparameter Tuning Process
- Unit 6.6: Improving Convergence with Batch Normalization
- Unit 6.7: Reducing Overfitting with Dropout
- Unit 6.8: Debugging Deep Neural Networks
- Unit 6 Exercises
- Unit 7: Getting Started with Computer Vision
- Unit 7.1: Working with Images
- Unit 7.2: How Convolutional Neural Networks Work
- Unit 7.3: Convolutional Neural Network Architectures
- Unit 7.4: Training Convolutional Neural Networks
- Unit 7.5: Improving Predictions with Data Augmentation
- Unit 7.6: Leveraging Pretrained Models with Transfer Learning
- Unit 7.7: Using Unlabeled Data with Self-Supervised Learning
- Unit 7 Exercises
- Unit 8: Natural Language Processing and Large Language Models
- Unit 8.1: Working with Text Data
- Unit 8.2: Training a Text Classifier Baseline
- Unit 8.3: Introduction to Recurrent Neural Networks
- Unit 8.4: From RNNs to the Transformer Architecture
- Unit 8.5: Understanding Self-Attention
- Unit 8.6: Large Language Models
- Unit 8.7: A Large Language Model for Classification
- Unit 8 Exercises
- Unit 9: Techniques for Speeding Up Model Training
- Unit 10 The Finale: Our Next Steps After AI Model Training
4.2 Multilayer Neural Networks (Parts 1-3)
Slides
References
- Multilayer perceptrons can approximate any continuous function: Hornik, Stinchcombe, and White (1989), Multilayer Feedforward Networks Are Universal Approximators, https://www.cs.cmu.edu/~epxing/Class/10715/reading/Kornick_et_al.pdf
What we covered in this video lecture
In this lecture, we discussed the limitations of the models we covered earlier in this course: the perceptron and logistic regression. Multilayer networks help us overcome these limitations. (If you are wondering which limitations we are talking about, you get to answer this in the quiz!)
We then discussed the advantages and disadvantages of designing wide versus deep neural networks. Here, width refers to the number of hidden units in the hidden layers, and depth refers to the number of layers.
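To make the width-versus-depth distinction concrete, here is a minimal sketch of both variants in plain PyTorch. The input size, number of classes, and layer widths below are arbitrary placeholder values, not the ones used in the lecture:

import torch

num_features, num_classes = 784, 10  # hypothetical sizes (e.g., flattened 28x28 images)

# A "wide" network: a single hidden layer with many hidden units
wide_mlp = torch.nn.Sequential(
    torch.nn.Linear(num_features, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, num_classes),
)

# A "deep" network: several narrower hidden layers stacked on top of each other
deep_mlp = torch.nn.Sequential(
    torch.nn.Linear(num_features, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, num_classes),
)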
Lastly, we also discussed different architecture design considerations, for example, the use of different (or no) nonlinear activation functions and the importance of random weight initialization.
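As a small illustration of the first point, stacking linear layers without a nonlinearity in between collapses into a single linear transformation, so the extra layers add no expressive power. Here is a quick sketch (with arbitrary layer sizes) to verify this numerically:

import torch

torch.manual_seed(123)  # arbitrary seed, just for reproducibility

# Two linear layers with *no* activation function in between ...
stacked = torch.nn.Sequential(
    torch.nn.Linear(4, 8, bias=False),
    torch.nn.Linear(8, 3, bias=False),
)

# ... behave exactly like one linear layer whose weight matrix is the product of the two
collapsed = torch.nn.Linear(4, 3, bias=False)
with torch.no_grad():
    collapsed.weight.copy_(stacked[1].weight @ stacked[0].weight)

x = torch.randn(5, 4)
print(torch.allclose(stacked(x), collapsed(x), atol=1e-6))  # prints: True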
Additional resources if you want to learn more
If you are interested in learning more about the different activation functions, as teased in 4.2 Part 2, I recommend the paper A Comprehensive Survey and Performance Analysis of Activation Functions in Deep Learning.
Note that it is possible to override PyTorch’s default weight initialization scheme using the following code:
def weights_init(m):
    if isinstance(m, torch.nn.Linear):
        torch.nn.init.*(m.weight)
        torch.nn.init.*(m.bias)

model.apply(weights_init)
The * above is a placeholder for a weight initialization function in PyTorch. Which weight initialization function should be used depends on the activation function. For example, a common choice for ReLU activations is Kaiming initialization:
nn.init.kaiming_normal_(m.weight.data, nonlinearity='relu')
nn.init.constant_(m.bias.data, 0)
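Putting the pieces together, here is a minimal self-contained sketch with the Kaiming initialization filled in; the two-layer model is just a hypothetical example:

import torch

def weights_init(m):
    # Kaiming (He) initialization for linear layers, a common choice with ReLU activations
    if isinstance(m, torch.nn.Linear):
        torch.nn.init.kaiming_normal_(m.weight, nonlinearity='relu')
        torch.nn.init.constant_(m.bias, 0.0)

# Hypothetical two-layer network, just for demonstration
model = torch.nn.Sequential(
    torch.nn.Linear(784, 100),
    torch.nn.ReLU(),
    torch.nn.Linear(100, 10),
)

# .apply() calls weights_init on every submodule, overriding the default initialization
model.apply(weights_init)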
You can find out more about Kaiming initialization in the paper
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun (2015). Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, https://arxiv.org/abs/1502.01852v1.
Quiz: 4.2 Multilayer Neural Networks and Why We Need Them (PART 1)
Quiz: 4.2 Multilayer Neural Networks and Why We Need Them (PART 2)
Quiz: 4.2 Multilayer Neural Networks and Why We Need Them (PART 3)