.. _hpu: Habana Gaudi AI Processor (HPU) =============================== Lightning supports `Habana Gaudi AI Processor (HPU) `__, for accelerating Deep Learning training workloads. HPU Terminology --------------- Habana® Gaudi® AI training processors are built on a heterogeneous architecture with a cluster of fully programmable Tensor Processing Cores (TPC) along with its associated development tools and libraries, and a configurable Matrix Math engine. The TPC core is a VLIW SIMD processor with an instruction set and hardware tailored to serve training workloads efficiently. The Gaudi memory architecture includes on-die SRAM and local memories in each TPC and, Gaudi is the first DL training processor that has integrated RDMA over Converged Ethernet (RoCE v2) engines on-chip. On the software side, the PyTorch Habana bridge interfaces between the framework and SynapseAI software stack to enable the execution of deep learning models on the Habana Gaudi device. Gaudi offers a substantial price/performance advantage -- so you get to do more deep learning training while spending less. For more information, check out `Gaudi Architecture `__ and `Gaudi Developer Docs `__. How to access HPUs ------------------ To use HPUs, you must have access to a system with HPU devices. You can either use `Gaudi-based AWS EC2 DL1 instances `__ or `Supermicro X12 Gaudi server `__ to get access to HPUs. Check out the `Getting Started Guide with AWS and Habana `__. Training with HPUs ------------------ To enable PyTorch Lightning to utilize the HPU accelerator, simply provide ``accelerator="hpu"`` parameter to the Trainer class. .. code-block:: python trainer = Trainer(accelerator="hpu") Passing ``devices=1`` and ``accelerator="hpu"`` to the Trainer class enables the Habana accelerator for single Gaudi training. .. code-block:: python trainer = Trainer(devices=1, accelerator="hpu") The ``devices=8`` and ``accelerator="hpu"`` parameters to the Trainer class enables the Habana accelerator for distributed training with 8 Gaudis. It uses :class:`~pytorch_lightning.strategies.hpu_parallel.HPUParallelStrategy` internally which is based on DDP strategy with the addition of Habana's collective communication library (HCCL) to support scale-up within a node and scale-out across multiple nodes. .. code-block:: python trainer = Trainer(devices=8, accelerator="hpu") .. note:: If the ``devices`` flag is not defined, it will assume ``devices`` to be ``"auto"`` and select 8 Gaudi devices for :class:`~pytorch_lightning.accelerators.hpu.HPUAccelerator`. Mixed Precision Plugin ---------------------- Lightning also allows mixed precision training with HPUs. By default, HPU training will use 32-bit precision. To enable mixed precision, set the ``precision`` flag. .. code-block:: python trainer = Trainer(devices=1, accelerator="hpu", precision=16) Enabling Mixed Precision Options -------------------------------- Internally, :class:`~pytorch_lightning.plugins.precision.hpu.HPUPrecisionPlugin` uses the Habana Mixed Precision (HMP) package to enable mixed precision training. You can execute the ops in FP32 or BF16 precision. The HMP package modifies the Python operators to add the appropriate cast operations for the arguments before execution. The default settings enable users to enable mixed precision training with minimal code easily. In addition to the default settings in HMP, users also have the option of overriding these defaults and providing their BF16 and FP32 operator lists by passing them as parameter to :class:`~pytorch_lightning.plugins.precision.hpu.HPUPrecisionPlugin`. The below snippet shows an example model using MNIST with a single Habana Gaudi device and making use of HMP by overriding the default parameters. This enables advanced users to provide their own BF16 and FP32 operator list instead of using the HMP defaults. .. code-block:: python import pytorch_lightning as pl from pytorch_lightning.plugins import HPUPrecisionPlugin # Initialize a trainer with HPU accelerator for HPU strategy for single device, # with mixed precision using overidden HMP settings trainer = pl.Trainer( accelerator="hpu", devices=1, # Optional Habana mixed precision params to be set # Checkout `pl_examples/hpu_examples/simple_mnist/ops_bf16_mnist.txt` for the format plugins=[ HPUPrecisionPlugin( precision=16, opt_level="O1", verbose=False, bf16_file_path="ops_bf16_mnist.txt", fp32_file_path="ops_fp32_mnist.txt", ) ], ) # Init our model model = LitClassifier() # Init the data dm = MNISTDataModule(batch_size=batch_size) # Train the model ⚡ trainer.fit(model, datamodule=dm) For more details, please refer to `PyTorch Mixed Precision Training on Gaudi `__. ---------------- .. _known-limitations_hpu: Known limitations ----------------- * Multiple optimizers are not supported. * `Habana dataloader `__ is not supported. * :class:`~pytorch_lightning.callbacks.device_stats_monitor.DeviceStatsMonitor` is not supported.