
Unit 6.4 – Choosing Activation Functions

What we covered in this video lecture

In Unit 4, we learned that non-linear activation functions are essential elements of a multi-layer neural network. In this lecture, we expand our repertoire of non-linear activation functions to include the ReLU, GELU, Swish, and Mish activations.
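
All four of these activations ship with PyTorch. As a minimal sketch, the snippet below applies each one to the same inputs so you can compare their outputs directly; note that PyTorch exposes Swish under the name `torch.nn.SiLU`:

```python
import torch

# Evaluate each activation on the same inputs to compare their behavior.
z = torch.linspace(-3.0, 3.0, steps=7)

# torch.nn.SiLU is PyTorch's implementation of the Swish activation.
activations = {
    "ReLU": torch.nn.ReLU(),
    "GELU": torch.nn.GELU(),
    "Swish": torch.nn.SiLU(),
    "Mish": torch.nn.Mish(),
}

for name, activation in activations.items():
    print(f"{name:>5}: {activation(z)}")
```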

Does it matter which one you choose? Yes, it can. The choice of activation function may affect the predictive performance, training time, and stability of your deep learning model, and different activation functions may work better for specific tasks and model architectures. The best way to determine which activation function to use is through experimentation: try different activation functions and evaluate their performance on your specific problem to find the one that best suits your needs, as sketched below.
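
One possible experimental setup is sketched below: the same small classifier is rebuilt with each candidate hidden-layer activation while everything else stays fixed. The layer sizes are illustrative (e.g., for MNIST-sized inputs), and the training loop is omitted:

```python
import torch

def make_mlp(activation: torch.nn.Module) -> torch.nn.Sequential:
    # The hidden activation is a constructor argument, so the identical
    # architecture can be retrained with each candidate function.
    return torch.nn.Sequential(
        torch.nn.Linear(784, 128),
        activation,
        torch.nn.Linear(128, 10),
    )

for activation in (torch.nn.ReLU(), torch.nn.GELU(),
                   torch.nn.SiLU(), torch.nn.Mish()):
    model = make_mlp(activation)
    # Train and evaluate `model` here, keeping the optimizer, learning
    # rate, random seed, and number of epochs fixed across runs so that
    # only the activation function varies.
    print(type(activation).__name__,
          sum(p.numel() for p in model.parameters()))
```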

Additional resources if you want to learn more

If you are interested in reading more about the performance of the different activation functions, you might like the article A Comprehensive Survey and Performance Analysis of Activation Functions in Deep Learning.

Quiz: 6.4 Choosing Activation Functions

The largest sigmoid derivative we can get during backpropagation is …

Hint: The derivative of the sigmoid function $\sigma(z)$ is $\sigma(z)\,(1-\sigma(z))$.

Correct answer: $0.25$. The derivative $\sigma(z)\,(1-\sigma(z))$ reaches its maximum at $z = 0$, where $\sigma(z) = 0.5$, giving $0.5 \cdot (1 - 0.5) = 0.25$.
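
For completeness, here is the short maximization behind that answer. Treating the derivative as a function of $\sigma$ and setting its own derivative to zero gives

$$
\frac{d}{d\sigma}\bigl[\sigma(1-\sigma)\bigr] = 1 - 2\sigma = 0
\quad\Longrightarrow\quad
\sigma(z) = \frac{1}{2}
\quad\Longrightarrow\quad
\sigma(z)\,(1-\sigma(z)) = \frac{1}{4},
$$

and $\sigma(z) = \tfrac{1}{2}$ occurs exactly at $z = 0$.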
