Share Files Between Components

Note

The contents of this page are still in progress!

Audience: Users who want to share files between components.


Why do I need distributed storage?

In a Lightning App, some components can run on their own hardware. Distributed storage lets a file saved by a component on one machine be used transparently by components on other machines.

If you have ever asked “How do I use the checkpoint from this model to deploy this other thing?”, you have needed distributed storage.


Write a file

To write a file, first create a reference to the file with the Path class, then write to it:

from lightning.app.storage import Path

# file reference
boring_file_reference = Path("boring_file.txt")

# write to that file
with open(boring_file_reference, "w") as f:
    f.write("yolo")

Use a file

To use a file, pass the reference to the file:

# open the reference like any other path; the context manager closes the file
with open(boring_file_reference, "r") as f:
    print(f.read())
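
The write-then-read round trip above can be tried locally with a minimal sketch. It assumes `pathlib.Path` as a stand-in for `lightning.app.storage.Path` (which subclasses it), so it only shows the file-handling pattern on a single machine, not the transfer between machines. The `round_trip` helper and the temporary directory are illustrative and not part of the Lightning API.

```python
import tempfile
from pathlib import Path  # stand-in (assumption) for lightning.app.storage.Path


def round_trip(directory: str, content: str) -> str:
    """Write content through a file reference, then read it back."""
    # file reference
    boring_file_reference = Path(directory) / "boring_file.txt"

    # write to that file
    with open(boring_file_reference, "w") as f:
        f.write(content)

    # use the file by passing the same reference to open()
    with open(boring_file_reference, "r") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp_dir:
    result = round_trip(tmp_dir, "yolo")
    print(result)
```

In a Lightning App, the reference would typically be stored on one component and passed to another, as in the checkpoint example on this page.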

Example: Share a model checkpoint

A common workflow in ML is to use a checkpoint created by another component. First, define a component that saves a checkpoint:

import os

import torch

from lightning.app import LightningWork, LightningFlow, LightningApp
from lightning.app.storage.path import Path


class ModelTraining(LightningWork):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.checkpoints_path = Path("./checkpoints")

    def run(self):
        # make fake checkpoints
        checkpoint_1 = torch.tensor([0, 1, 2, 3, 4])
        checkpoint_2 = torch.tensor([0, 1, 2, 3, 4])
        os.makedirs(self.checkpoints_path, exist_ok=True)
        checkpoint_path = str(self.checkpoints_path / "checkpoint_{}.ckpt")
        torch.save(checkpoint_1, str(checkpoint_path).format("1"))
        torch.save(checkpoint_2, str(checkpoint_path).format("2"))

Next, define a component that needs the checkpoints:

class ModelDeploy(LightningWork):
    def __init__(self, ckpt_path, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.ckpt_path = ckpt_path

    def run(self):
        # sort the names: os.listdir returns entries in arbitrary order
        ckpts = sorted(os.listdir(self.ckpt_path))
        checkpoint_1 = torch.load(os.path.join(self.ckpt_path, ckpts[0]))
        checkpoint_2 = torch.load(os.path.join(self.ckpt_path, ckpts[1]))
        print(f"Loaded checkpoint_1: {checkpoint_1}")
        print(f"Loaded checkpoint_2: {checkpoint_2}")

Link both components via a parent component:


class LitApp(LightningFlow):
    def __init__(self):
        super().__init__()
        self.train = ModelTraining()
        self.deploy = ModelDeploy(ckpt_path=self.train.checkpoints_path)

    def run(self):
        self.train.run()
        self.deploy.run()


app = LightningApp(LitApp())

Run the app above with the following command:

lightning run app docs/source/workflows/share_files_between_components/app.py
Your Lightning App is starting. This won't take long.
INFO: Your app has started. View it in your browser: http://127.0.0.1:7501/view
Loaded checkpoint_1: tensor([0, 1, 2, 3, 4])
Loaded checkpoint_2: tensor([0, 1, 2, 3, 4])
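
Because `os.listdir` returns entries in arbitrary order, loading checkpoints by index is only reliable when the names are sorted first. A minimal sketch of that pattern follows. It uses `pickle` and plain lists as stand-ins for `torch.save` and tensors so that it runs without PyTorch, and `save_fake_checkpoints` and `load_checkpoints_sorted` are hypothetical helper names, not part of Lightning.

```python
import os
import pickle
import tempfile


def save_fake_checkpoints(checkpoints_dir: str) -> None:
    """Save two fake checkpoints (pickle is a stand-in for torch.save)."""
    os.makedirs(checkpoints_dir, exist_ok=True)
    for index, payload in enumerate(([0, 1, 2, 3, 4], [5, 6, 7, 8, 9]), start=1):
        path = os.path.join(checkpoints_dir, f"checkpoint_{index}.ckpt")
        with open(path, "wb") as f:
            pickle.dump(payload, f)


def load_checkpoints_sorted(checkpoints_dir: str) -> list:
    """Load every .ckpt file in name order so the result is reproducible."""
    names = sorted(n for n in os.listdir(checkpoints_dir) if n.endswith(".ckpt"))
    loaded = []
    for name in names:
        with open(os.path.join(checkpoints_dir, name), "rb") as f:
            loaded.append(pickle.load(f))
    return loaded


with tempfile.TemporaryDirectory() as tmp_dir:
    ckpt_dir = os.path.join(tmp_dir, "checkpoints")
    save_fake_checkpoints(ckpt_dir)
    checkpoints = load_checkpoints_sorted(ckpt_dir)
    print(checkpoints)
```

Lexicographic sorting is enough for single-digit suffixes. With ten or more checkpoints, zero-padded names (for example `checkpoint_01.ckpt`) keep the name order aligned with the numeric order.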

The following example saves a file in one component and uses it in another component:

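from lightning.app import LightningWork
from lightning.app.storage import Path

# Placeholder content for illustration (assumption); any string works here.
FILE_CONTENT = "Hello from ComponentA"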


class ComponentA(LightningWork):
    def __init__(self):
        super().__init__()
        self.boring_path = None

    def run(self):
        # This should be used as a REFERENCE to the file.
        self.boring_path = Path("boring_file.txt")
        with open(self.boring_path, "w") as f:
            f.write(FILE_CONTENT)