
Performance Notes on PyTorch Support for M1 and M2 GPUs

Running PyTorch on the M1 and M2 GPU

In 2020, Apple released the first computers with the new ARM-based M1 chip, which has become known for its strong performance and energy efficiency. It has been possible to run deep learning code via PyTorch or PyTorch Lightning on the M1/M2 CPU for some time, but PyTorch only recently announced plans to add GPU support for ARM-based Mac processors (M1 and M2).
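To make the CPU-versus-GPU distinction concrete, here is a minimal sketch of how a script might pick a device. It assumes a PyTorch release that exposes the `mps` backend (`torch.backends.mps`); the helper name `choose_device_name` is made up for this illustration and is not part of PyTorch. The import is guarded so the snippet also runs on a machine without PyTorch installed.

```python
def choose_device_name(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple's MPS backend, then fall back to the CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


try:
    import torch
except ImportError:  # lets the sketch run on machines without PyTorch
    torch = None

if torch is not None:
    # torch.backends.mps is absent in older releases, so look it up defensively.
    mps_backend = getattr(torch.backends, "mps", None)
    has_mps = mps_backend is not None and mps_backend.is_available()

    device = torch.device(choose_device_name(torch.cuda.is_available(), has_mps))

    # Any tensor or model moved to `device` now runs on the selected hardware.
    x = torch.randn(4, 3, 32, 32, device=device)
    print(f"Tensor lives on: {x.device}")
```

On an M1 or M2 Mac with a suitable PyTorch build this should select `mps`; everywhere else it falls back to `cuda` or `cpu`, so the same script can be used across the machines being compared.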

Some notes about the M1 GPU performance:

  • I noticed that the convolutional networks need much more RAM when running on a CPU or the M1 GPU than on a CUDA GPU, which may cause swapping issues. However, I made sure that memory utilization never exceeded 80% on the MacBook Pro during training.
  • Another possible explanation, as suggested, is that the M1 GPU runs did not max out the batch size. For fairness, however, I ran all training runs with a batch size of 32, since the 2080 Ti and 1080 Ti couldn’t handle more with their limited 11 GB of VRAM. Update: I repeated the M1 Pro GPU run with a batch size of 64, and it was approximately 20% faster than with a batch size of 32.
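A batch-size comparison like the one above can be sketched with a small timing helper. This is an illustration only, not the benchmark code behind these notes: `seconds_per_sample`, `percent_faster`, and `dummy_step` are hypothetical names, and the dummy step is a placeholder for a real training step.

```python
import time
from typing import Callable


def seconds_per_sample(step_fn: Callable[[int], None],
                       batch_size: int,
                       n_steps: int = 10) -> float:
    """Average wall-clock seconds per sample over n_steps calls of step_fn."""
    start = time.perf_counter()
    for _ in range(n_steps):
        step_fn(batch_size)
    elapsed = time.perf_counter() - start
    return elapsed / (n_steps * batch_size)


def percent_faster(baseline: float, candidate: float) -> float:
    """Time saved by the candidate, as a percentage of the baseline time."""
    return 100.0 * (baseline - candidate) / baseline


def dummy_step(batch_size: int) -> None:
    """Placeholder workload; a real run would do a forward and backward pass.

    GPU backends may execute work asynchronously, so a real step should
    synchronize (e.g. torch.cuda.synchronize()) before the timer is read.
    """
    sum(range(batch_size * 1_000))


t32 = seconds_per_sample(dummy_step, batch_size=32)
t64 = seconds_per_sample(dummy_step, batch_size=64)
print(f"batch 32: {t32:.2e} s/sample, batch 64: {t64:.2e} s/sample")
print(f"batch 64 vs. 32: {percent_faster(t32, t64):.1f}% faster per sample")
```

Normalizing by the number of samples keeps runs with different batch sizes comparable; with the dummy workload the two numbers will be close to each other, and any difference only becomes meaningful with a real model on the target device.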

If you are curious to learn more and see some early benchmarks, check out my article “Running PyTorch on the M1 GPU”.

By Sebastian Raschka, Lead AI Educator, Lightning AI