Accelerator: HPU training
Audience: Users looking to save money and run large models faster using single or multiple Gaudi devices.
What is an HPU?
Habana® Gaudi® AI training processors (HPUs) are built on a heterogeneous architecture that combines a cluster of fully programmable Tensor Processing Cores (TPCs), together with their associated development tools and libraries, and a configurable Matrix Math engine.
The TPC core is a VLIW SIMD processor with an instruction set and hardware tailored to serve training workloads efficiently. The Gaudi memory architecture includes on-die SRAM and local memories in each TPC. Gaudi is also the first DL training processor with RDMA over Converged Ethernet (RoCE v2) engines integrated on-chip.
On the software side, the PyTorch Habana bridge interfaces between the framework and the SynapseAI software stack to enable the execution of deep learning models on the Habana Gaudi device.
Gaudi offers a substantial price/performance advantage, so you can do more deep learning training while spending less.
For more information, check out Gaudi Architecture and Gaudi Developer Docs.
Run on 1 Gaudi
To enable PyTorch Lightning to use the HPU accelerator, pass accelerator="hpu" to the Trainer class.
trainer = Trainer(accelerator="hpu", devices=1)
Run on multiple Gaudis
Passing devices=8 and accelerator="hpu" to the Trainer class enables the Habana accelerator for distributed training with 8 Gaudis.
It uses HPUParallelStrategy internally, which is based on the DDP strategy with the addition of Habana’s collective communication library (HCCL) to support scale-up within a node and scale-out across multiple nodes.
trainer = Trainer(devices=8, accelerator="hpu")
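Because HPUParallelStrategy is based on DDP, each device runs its own process and loads its own batches. The sketch below shows the resulting arithmetic for the number of samples consumed per optimizer step, assuming the usual DDP data-parallel behaviour carries over unchanged to HPUs. The helper name and the example values are hypothetical and not part of Lightning.

```python
def effective_batch_size(per_device_batch: int, devices: int, num_nodes: int = 1) -> int:
    """Samples consumed per optimizer step under DDP-style data parallelism."""
    # Each process (one per device) loads its own batch, so the batches add up.
    return per_device_batch * devices * num_nodes


# Hypothetical values: batch_size=32 in the DataLoader, 8 Gaudis on one node.
print(effective_batch_size(per_device_batch=32, devices=8))  # 256
```

If the learning rate was tuned on a single Gaudi, it may need revisiting once the effective batch size grows this way.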
Select Gaudis automatically
Lightning can automatically detect the number of Gaudi devices to run on. This setting is enabled by default if the devices argument is missing.
# equivalent
trainer = Trainer(accelerator="hpu")
trainer = Trainer(accelerator="hpu", devices="auto")
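When the same script also has to run on machines without Gaudi hardware, one option is to build the Trainer arguments conditionally. The following is a minimal sketch, not a Lightning API: hpu_trainer_kwargs is a hypothetical helper, and the hpu_available flag is passed in explicitly because the way to detect HPUs depends on your Lightning and SynapseAI versions.

```python
from typing import Any, Dict, Union


def hpu_trainer_kwargs(hpu_available: bool, devices: Union[int, str] = "auto") -> Dict[str, Any]:
    """Build Trainer keyword arguments, falling back to CPU when no HPU is present."""
    if hpu_available:
        # devices="auto" lets Lightning pick the number of Gaudis itself.
        return {"accelerator": "hpu", "devices": devices}
    # Hypothetical fallback for development machines without Gaudi hardware.
    return {"accelerator": "cpu", "devices": 1}


kwargs = hpu_trainer_kwargs(hpu_available=True)
print(kwargs)
# With Lightning installed: trainer = Trainer(**kwargs)
```

Keeping the decision in a plain dictionary makes it easy to log or unit-test the chosen configuration before a Trainer is constructed.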
How to access HPUs
To use HPUs, you must have access to a system with HPU devices.
AWS
You can use either Gaudi-based AWS EC2 DL1 instances or a Supermicro X12 Gaudi server to get access to HPUs.
Check out the Get Started Guide with AWS and Habana.
Known limitations
Habana dataloader is not supported.
torch.inference_mode() is not supported.