Remote Filesystems
PyTorch Lightning enables working with data from a variety of filesystems, including local filesystems and several cloud storage providers such as S3 on AWS, GCS on Google Cloud, or ADL on Azure.
This applies to saving and loading checkpoints, as well as to logging. To work with a different filesystem, prefix file paths with a protocol such as “s3://” when reading and writing data.
from lightning.pytorch import Trainer

# `default_root_dir` is the default path used for logs and checkpoints
trainer = Trainer(default_root_dir="s3://my_bucket/data/")
trainer.fit(model)  # `model` is your LightningModule
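The protocol prefix is what tells fsspec which backend to use. As a simplified illustration, the hypothetical helper below (not part of Lightning or fsspec) splits a path into its protocol and the remaining path, treating paths without a prefix as local files. fsspec's own parsing covers more edge cases than this sketch.

```python
def split_protocol(path: str, default: str = "file") -> tuple[str, str]:
    """Hypothetical helper: split "s3://bucket/key" into ("s3", "bucket/key").

    Paths without a "://" prefix are treated as local files.
    """
    protocol, sep, rest = path.partition("://")
    if not sep:
        # No protocol prefix: fall back to the local filesystem.
        return default, path
    return protocol, rest


for p in ["s3://my_bucket/data/", "gs://my_bucket/logs/", "/home/user/checkpoints"]:
    print(p, "->", split_protocol(p))
```

Only the prefix differs between a local and a remote path, which is why the Trainer and logger arguments accept either form unchanged.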
You can also pass remote paths directly to loggers:
from lightning.pytorch import Trainer
from lightning.pytorch.loggers import TensorBoardLogger

logger = TensorBoardLogger(save_dir="s3://my_bucket/logs/")
trainer = Trainer(logger=logger)
trainer.fit(model)
You can also resume training from a checkpoint stored on a remote filesystem:
# `tmpdir` is any local directory for new logs and checkpoints; `model` is your LightningModule
trainer = Trainer(default_root_dir=tmpdir, max_steps=3)
trainer.fit(model, ckpt_path="s3://my_bucket/ckpts/classifier.ckpt")
PyTorch Lightning uses fsspec internally to handle all filesystem operations.
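A minimal sketch of using fsspec directly, assuming fsspec is installed (Lightning installs it as a dependency). It uses fsspec's built-in in-memory filesystem ("memory://") in place of a cloud bucket so it runs without credentials; the bucket and file names are made up. With s3fs installed and configured, the same calls would work with an "s3://" prefix.

```python
import fsspec

# "memory://" is fsspec's built-in in-memory filesystem, standing in for a bucket.
url = "memory://my_bucket/ckpts/notes.txt"

# Write through the protocol-aware opener.
with fsspec.open(url, "w") as f:
    f.write("epoch=3")

# Read it back the same way.
with fsspec.open(url, "r") as f:
    content = f.read()

# url_to_fs resolves the protocol to a filesystem object plus a backend-specific path.
fs, path = fsspec.core.url_to_fs(url)
print(type(fs).__name__, path, content)
```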
The most common filesystems supported by Lightning are:
Local filesystem: file:// - It’s the default and doesn’t need any protocol to be used. It’s installed by default in Lightning.
Amazon S3: s3:// - Amazon S3 remote binary store, using the library s3fs. Run pip install fsspec[s3] to install it.
Google Cloud Storage: gcs:// or gs:// - Google Cloud Storage, using gcsfs. Run pip install fsspec[gcs] to install it.
Microsoft Azure Storage: adl://, abfs:// or az:// - Microsoft Azure Storage, using adlfs. Run pip install fsspec[adl] to install it.
Hadoop File System: hdfs:// - Hadoop Distributed File System. This uses PyArrow as the backend. Run pip install fsspec[hdfs] to install it.
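To check which of these optional backends are present in your environment, a small sketch can ask fsspec's registry for each protocol's filesystem class: get_filesystem_class raises ImportError when a protocol is known but its backend package is missing. The backend_available helper is hypothetical, and only a subset of the protocols above is checked.

```python
from fsspec.registry import get_filesystem_class

# A subset of the protocols listed above; "file" ships with fsspec itself.
PROTOCOLS = ["file", "s3", "gs", "az"]


def backend_available(protocol: str) -> bool:
    """Return True if fsspec can import the filesystem class for `protocol`."""
    try:
        get_filesystem_class(protocol)
    except ImportError:
        # Known protocol, but its optional backend (e.g. s3fs) is not installed.
        return False
    return True


for protocol in PROTOCOLS:
    status = "available" if backend_available(protocol) else "missing (see the pip command above)"
    print(f"{protocol}://  {status}")
```

An unknown protocol raises ValueError rather than ImportError, so a typo in the prefix is not silently reported as a missing backend.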
To list every filesystem protocol that fsspec knows about (whether or not its backend is installed), run:
from fsspec.registry import known_implementations
print(known_implementations)
See the CheckpointIO plugin documentation for details on customizing how checkpoints are saved and loaded.