PyTorch Lightning enables working with data from a variety of filesystems, including local filesystems and several cloud storage providers such as S3 on AWS, GCS on Google Cloud, or ADL on Azure.
This applies to saving and writing checkpoints, as well as to logging. To work with a different filesystem, prepend its protocol (for example, "s3://") to the file paths used for reading and writing data.
```python
# `default_root_dir` is the default path used for logs and checkpoints
trainer = Trainer(default_root_dir="s3://my_bucket/data/")
trainer.fit(model)
```
You can pass custom paths to loggers to store logging data remotely.
```python
from lightning.pytorch.loggers import TensorBoardLogger

logger = TensorBoardLogger(save_dir="s3://my_bucket/logs/")
trainer = Trainer(logger=logger)
trainer.fit(model)
```
You can also resume training from a checkpoint stored on a remote filesystem.
```python
# `tmpdir` is a local directory used for logs and checkpoints
trainer = Trainer(default_root_dir=tmpdir, max_steps=3)
trainer.fit(model, ckpt_path="s3://my_bucket/ckpts/classifier.ckpt")
```
PyTorch Lightning uses fsspec internally to handle all filesystem operations.
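To illustrate how fsspec resolves a protocol prefix to a filesystem backend, here is a minimal sketch using fsspec's built-in `memory://` filesystem, so it runs without any cloud credentials. The path `memory://example/checkpoint.txt` is an arbitrary example, not a Lightning convention.

```python
import fsspec

# fsspec selects the backend from the URL's protocol prefix; this is the
# same mechanism Lightning relies on for paths like "s3://my_bucket/...".
with fsspec.open("memory://example/checkpoint.txt", "w") as f:
    f.write("fake checkpoint contents")

with fsspec.open("memory://example/checkpoint.txt", "r") as f:
    print(f.read())  # prints the text written above
```

Swapping `memory://` for `s3://` or `gs://` dispatches to the corresponding cloud backend, provided the matching library is installed.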
The most common filesystems supported by Lightning are:
- `file://` - Local filesystem. It is the default, needs no protocol prefix, and works out of the box with Lightning.
- `s3://` - Amazon S3 remote binary store, using the s3fs library. Run `pip install fsspec[s3]` to install it.
- `gs://` - Google Cloud Storage, using gcsfs. Run `pip install fsspec[gcs]` to install it.
- `az://` - Microsoft Azure Storage, using adlfs. Run `pip install fsspec[adl]` to install it.
- `hdfs://` - Hadoop Distributed File System, using PyArrow as the backend. Run `pip install fsspec[hdfs]` to install it.
You can list all available filesystems with:
```python
from fsspec.registry import known_implementations

print(known_implementations)
```
See the CheckpointIO plugin for more details on customizing how checkpoints are saved and loaded.
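To show the shape of the save/load/remove pattern a custom CheckpointIO plugin implements, here is a stdlib-only toy sketch. The class name, the use of pickle, and the local temp directory are illustrative assumptions, not Lightning's actual implementation; a real plugin would subclass Lightning's `CheckpointIO` base class.

```python
import os
import pickle
import tempfile


class PickleCheckpointIO:
    """Toy checkpoint IO: stores checkpoint dicts with pickle.

    Mirrors the save/load/remove shape of a CheckpointIO plugin;
    not Lightning's real base class.
    """

    def save_checkpoint(self, checkpoint: dict, path: str) -> None:
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            pickle.dump(checkpoint, f)

    def load_checkpoint(self, path: str) -> dict:
        with open(path, "rb") as f:
            return pickle.load(f)

    def remove_checkpoint(self, path: str) -> None:
        os.remove(path)


# Usage: round-trip a fake checkpoint through the toy IO object.
io_plugin = PickleCheckpointIO()
ckpt_path = os.path.join(tempfile.mkdtemp(), "ckpts", "classifier.ckpt")
io_plugin.save_checkpoint({"epoch": 3}, ckpt_path)
print(io_plugin.load_checkpoint(ckpt_path))  # {'epoch': 3}
io_plugin.remove_checkpoint(ckpt_path)
```

A real plugin would route these paths through fsspec (or a cloud SDK) so that the same interface works for remote protocols like `s3://`.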