Accelerator: HPU Training¶
This document offers instructions for Gaudi chip users who want to use advanced strategies and profiling on HPUs.
Using HPUProfiler¶
HPUProfiler is a Lightning implementation of the PyTorch profiler for HPU. It aids in obtaining a profiling summary of PyTorch functions. It subclasses PyTorch Lightning's PyTorchProfiler.
Default Profiling¶
For auto profiling, create an HPUProfiler instance and pass it to the Trainer. At the end of trainer.fit(), it will generate a JSON trace for the run. If accelerator=HPUAccelerator() is not used with HPUProfiler, it will dump only CPU traces, similar to PyTorchProfiler.
from lightning import Trainer
from lightning_habana.pytorch.accelerator import HPUAccelerator
from lightning_habana.pytorch.profiler.profiler import HPUProfiler
trainer = Trainer(accelerator=HPUAccelerator(), profiler=HPUProfiler())
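The trace is produced once fitting completes. The snippet below is a minimal sketch of triggering it; BoringModel is only a stand-in for your own LightningModule and is not part of the original example.
from lightning.pytorch.demos.boring_classes import BoringModel  # placeholder model for illustration
model = BoringModel()
trainer.fit(model)  # the JSON trace is written when fit() finishes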
Distributed Profiling¶
To profile a distributed model, use HPUProfiler with the filename argument, which will save a report per rank.
from lightning import Trainer
from lightning_habana.pytorch.accelerator import HPUAccelerator
from lightning_habana.pytorch.profiler.profiler import HPUProfiler
profiler = HPUProfiler(filename="perf-logs")
trainer = Trainer(profiler=profiler, accelerator=HPUAccelerator())
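As a rough sketch of a multi-card run, the example below adds a device count and an explicit parallel strategy; devices=8 and HPUParallelStrategy are assumptions about a typical 8-card Gaudi node, and the exact strategy class may differ across lightning_habana versions.
from lightning import Trainer
from lightning_habana.pytorch.accelerator import HPUAccelerator
from lightning_habana.pytorch.strategies import HPUParallelStrategy  # assumed strategy class
from lightning_habana.pytorch.profiler.profiler import HPUProfiler

# With a filename set, each rank writes its own report.
profiler = HPUProfiler(filename="perf-logs")
trainer = Trainer(
    accelerator=HPUAccelerator(),
    devices=8,  # assumed card count
    strategy=HPUParallelStrategy(),  # assumed; check your lightning_habana version
    profiler=profiler,
)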
Custom Profiling¶
To profile custom actions of interest, reference a profiler in the LightningModule.
from lightning import LightningModule, Trainer
from lightning_habana.pytorch.accelerator import HPUAccelerator
from lightning_habana.pytorch.profiler.profiler import HPUProfiler

# Reference profiler in LightningModule
class MyModel(LightningModule):
    def __init__(self, profiler=None):
        super().__init__()
        self.profiler = profiler

    # To profile any part of your code, use the self.profiler.profile() context manager
    def custom_processing_step_basic(self, data):
        with self.profiler.profile("my_custom_action"):
            print("do something")
        return data

    # Alternatively, use the self.profiler.start("my_custom_action")
    # and self.profiler.stop("my_custom_action") functions
    # to enclose the part of the code to be profiled.
    def custom_processing_step_granular(self, data):
        self.profiler.start("my_custom_action")
        print("do something")
        self.profiler.stop("my_custom_action")
        return data

# Pass profiler instance to LightningModule
profiler = HPUProfiler()
model = MyModel(profiler)
trainer = Trainer(accelerator=HPUAccelerator(), profiler=profiler)
For more details on the profiler, refer to PyTorchProfiler.
Visualizing Profiled Operations¶
The profiler dumps traces in JSON format. The traces can be visualized in two ways, as described below.
Using PyTorch TensorBoard Profiler¶
For further instructions, see https://github.com/pytorch/kineto/tree/master/tb_plugin.
Install TensorBoard and the PyTorch TensorBoard profiler plugin:
python -m pip install tensorboard torch-tb-profiler
Start the TensorBoard server (default port 6006):
tensorboard --logdir ./tensorboard --port 6006
Open the following URL in your browser: http://localhost:6006/#profile.
Using Chrome¶
Open Chrome and paste this URL: chrome://tracing/.
Once tracing opens, click on Load at the top-right and load one of the generated traces.
Limitations¶
When using HPUProfiler, wall clock time will not be representative of the true wall clock time. This is because profiled operations are forced to be measured synchronously, while many HPU ops happen asynchronously. It is recommended to use this profiler to find bottlenecks and breakdowns; for end-to-end wall clock time, use the SimpleProfiler instead (see the sketch below).
HPUProfiler.summary() is not supported.
Passing the profiler name as the string "hpu" to the Trainer is not supported.
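For end-to-end wall clock measurements, a minimal sketch using Lightning's SimpleProfiler is shown below; the import path assumes the lightning.pytorch namespace used elsewhere in this document.
from lightning import Trainer
from lightning.pytorch.profilers import SimpleProfiler
from lightning_habana.pytorch.accelerator import HPUAccelerator

# SimpleProfiler reports wall clock time per hook, which is more representative for end-to-end timing.
trainer = Trainer(accelerator=HPUAccelerator(), profiler=SimpleProfiler())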
Using DeepSpeed¶
HPU supports advanced strategies like DeepSpeed. By default, HPU training uses 32-bit precision. To enable mixed precision, set the precision flag.
from lightning import Trainer
from lightning.pytorch.plugins import DeepSpeedPrecisionPlugin
from lightning_habana.pytorch.accelerator import HPUAccelerator
from lightning_habana.pytorch.strategies import HPUDeepSpeedStrategy
trainer = Trainer(devices=8, accelerator=HPUAccelerator(), strategy=HPUDeepSpeedStrategy(), plugins=[DeepSpeedPrecisionPlugin(precision="bf16-mixed")])
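HPUDeepSpeedStrategy is expected to accept the same configuration arguments as Lightning's DeepSpeedStrategy; the ZeRO settings below are an assumption based on that interface, not taken from the original example.
from lightning import Trainer
from lightning.pytorch.plugins import DeepSpeedPrecisionPlugin
from lightning_habana.pytorch.accelerator import HPUAccelerator
from lightning_habana.pytorch.strategies import HPUDeepSpeedStrategy

# Assumes HPUDeepSpeedStrategy mirrors DeepSpeedStrategy's constructor arguments (zero_optimization, stage, ...).
strategy = HPUDeepSpeedStrategy(zero_optimization=True, stage=2)
trainer = Trainer(
    devices=8,
    accelerator=HPUAccelerator(),
    strategy=strategy,
    plugins=[DeepSpeedPrecisionPlugin(precision="bf16-mixed")],
)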
For further details on the supported DeepSpeed features and functionalities, refer to Using DeepSpeed with HPU.