TracerPythonScript

class lightning_app.components.python.tracer.TracerPythonScript(script_path, script_args=None, outputs=None, env=None, code=None, **kwargs)

Bases: lightning_app.core.work.LightningWork

The TracerPythonScript class makes it easy to run a Python script.

When subclassing this class, you can configure your own Tracer by overriding the configure_tracer() method.

The tracer is quite a magical class. It lets you inject code into a script's execution without modifying the script itself.

Parameters
  • script_path (str) – Path of the python script to run.

  • script_args – The arguments to be passed to the script.

  • outputs (Optional[List[str]]) – Collection of object names to collect after the script execution.

  • env (Optional[Dict]) – Environment variables to be passed to the script.

  • kwargs – LightningWork Keyword arguments.

Raises

FileNotFoundError – If the provided script_path doesn’t exist.

How does it work?

It works by executing the Python script with the run_path function from Python's built-in runpy module. run_path accepts a dictionary of globals that pre-populates the script's namespace before the script executes, so you can, for example, modify classes or functions used by the script.
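
As a minimal sketch of that mechanism (runpy only, not the Tracer itself), the snippet below writes a one-line script to a temporary file and runs it with runpy.run_path, passing a value in through init_globals. The script contents and the names greeting, result, and run_with_globals are invented for illustration.

```python
import os
import runpy
import tempfile

# Hypothetical script: it uses `greeting`, which it never defines itself.
SCRIPT = "result = greeting.upper() + '!'\n"


def run_with_globals(source: str, init_globals: dict) -> dict:
    """Write `source` to a temporary file and execute it with runpy.run_path."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        script_path = os.path.join(tmp_dir, "script.py")
        with open(script_path, "w") as f:
            f.write(source)
        # init_globals pre-populates the script's namespace before it runs.
        # run_path returns the resulting globals as a dictionary.
        return runpy.run_path(script_path, init_globals=init_globals, run_name="__main__")


script_globals = run_with_globals(SCRIPT, {"greeting": "hello"})
print(script_globals["result"])  # HELLO!
```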

Example

>>> import os
>>> from lightning_app.components.python import TracerPythonScript
>>> f = open("a.py", "w")
>>> f.write("print('Hello World !')")
22
>>> f.close()
>>> python_script = TracerPythonScript("a.py")
>>> python_script.run()
Hello World !
>>> os.remove("a.py")

In the example below, we subclass the TracerPythonScript component and override its configure_tracer method.

Using the Tracer, we are patching the __init__ method of the PyTorch Lightning Trainer. Once the script starts running and if a Trainer is instantiated, the provided pre_fn is called and we inject a Lightning callback.

This callback has a reference to the work and on every batch end, we are capturing the trainer global_step and best_model_path.

Even more interesting, this component works for ANY PyTorch Lightning script and its state can be used in real time in a UI.

from lightning.app.components import TracerPythonScript
from lightning.app.storage import Path
from lightning.app.utilities.tracer import Tracer
from pytorch_lightning import Trainer


class PLTracerPythonScript(TracerPythonScript):
    """This component can be used for ANY PyTorch Lightning script to track its progress and extract its best model
    path."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Define the component state.
        self.global_step = None
        self.best_model_path = None

    def configure_tracer(self) -> Tracer:
        from pytorch_lightning.callbacks import Callback

        class MyInjectedCallback(Callback):
            def __init__(self, lightning_work):
                self.lightning_work = lightning_work

            def on_train_start(self, trainer, pl_module) -> None:
                print("This code doesn't belong to the script but was injected.")
                print("Even the Lightning Work is available and state transfer works !")
                print(self.lightning_work)

            def on_train_batch_end(self, trainer, *_) -> None:
                # On every batch end, collects some information.
                # This is communicated automatically to the rest of the app,
                # so you can track your training in real time in the Lightning App UI.
                self.lightning_work.global_step = trainer.global_step
                best_model_path = trainer.checkpoint_callback.best_model_path
                if best_model_path:
                    self.lightning_work.best_model_path = Path(best_model_path)

        # This hook would be called every time
        # before a Trainer `__init__` method is called.

        def trainer_pre_fn(trainer, *args, **kwargs):
            kwargs["callbacks"] = kwargs.get("callbacks", []) + [MyInjectedCallback(self)]
            return {}, args, kwargs

        tracer = super().configure_tracer()
        tracer.add_traced(Trainer, "__init__", pre_fn=trainer_pre_fn)
        return tracer


if __name__ == "__main__":
    comp = PLTracerPythonScript(Path(__file__).parent / "pl_script.py")
    res = comp.run()
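
The patching idea behind trainer_pre_fn can be illustrated without Lightning installed. The following is a simplified sketch of the general technique (wrapping a method so a pre-function can rewrite its arguments), not the actual Tracer implementation. Greeter, add_pre_fn, and inject_tag are hypothetical names; only the (dict, args, kwargs) return shape is taken from the example above.

```python
import functools


class Greeter:
    """Hypothetical stand-in for a class instantiated inside a traced script."""

    def __init__(self, name, tags=None):
        self.name = name
        self.tags = tags or []


def add_pre_fn(cls, method_name, pre_fn):
    """Replace cls.method_name with a wrapper that calls pre_fn first.

    pre_fn receives (instance, *args, **kwargs) and returns
    (extra_info, args, kwargs), the same shape trainer_pre_fn returns.
    """
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        # Let pre_fn rewrite the arguments, then call the original method.
        _, args, kwargs = pre_fn(self, *args, **kwargs)
        return original(self, *args, **kwargs)

    setattr(cls, method_name, wrapper)
    return original  # Returned so the caller can restore it later.


def inject_tag(instance, *args, **kwargs):
    # Append a tag without the calling code knowing about it.
    kwargs["tags"] = kwargs.get("tags", []) + ["injected"]
    return {}, args, kwargs


original_init = add_pre_fn(Greeter, "__init__", inject_tag)
greeter = Greeter("world", tags=["user"])
print(greeter.tags)  # ['user', 'injected']
Greeter.__init__ = original_init  # Undo the patch.
```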

Once implemented, this component can easily be integrated within a larger app to execute a specific python script.

import os
from pathlib import Path

from examples.components.python.component_tracer import PLTracerPythonScript

import lightning as L


class RootFlow(L.LightningFlow):
    def __init__(self):
        super().__init__()
        script_path = Path(__file__).parent / "pl_script.py"
        self.tracer_python_script = PLTracerPythonScript(script_path)

    def run(self):
        assert os.getenv("GLOBAL_RANK", "0") == "0"
        if not self.tracer_python_script.has_started:
            self.tracer_python_script.run()
        if self.tracer_python_script.has_succeeded:
            self.stop("tracer script succeeded")
        if self.tracer_python_script.has_failed:
            self.stop("tracer script failed")


app = L.LightningApp(RootFlow())

configure_tracer()

Override this hook to customize the tracer used when running the Python script.

Return type

Tracer

on_after_run(res)

Called after the python script is executed.

on_before_run()

Called before the python script is executed.
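
The call order implied by on_before_run and on_after_run can be sketched with a small stand-in class. ScriptRunner, LoggingRunner, and _execute are hypothetical and exist only to make the snippet runnable without Lightning. The shape of the res value passed to on_after_run is also an assumption.

```python
class ScriptRunner:
    """Hypothetical stand-in that mimics the documented hook order."""

    def on_before_run(self):
        """Called before the script is executed."""

    def on_after_run(self, res):
        """Called after the script is executed, with its result."""

    def _execute(self):
        # Placeholder for the actual script execution.
        return {"answer": 42}

    def run(self):
        self.on_before_run()
        res = self._execute()
        self.on_after_run(res)


class LoggingRunner(ScriptRunner):
    """Subclass that overrides both hooks to record what happened."""

    def __init__(self):
        self.events = []
        self.result = None

    def on_before_run(self):
        self.events.append("before")

    def on_after_run(self, res):
        self.events.append("after")
        self.result = res


runner = LoggingRunner()
runner.run()
print(runner.events)  # ['before', 'after']
```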

on_exit()

Override this hook to add your logic when the work is exiting.

run(params=None, restart_count=None, code_dir='.', **kwargs)

Parameters
  • params (Optional[Dict[str, Any]]) – A dictionary of arguments to be added to script_args.

  • restart_count (Optional[int]) – Passes an incrementing counter to enable the re-execution of LightningWorks.

  • code_dir (Optional[str]) – A path string determining where the source is extracted. Defaults to the current directory.
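
One plausible way a params dictionary could be merged into script_args is sketched below. The --key=value format and the helper name extend_script_args are assumptions made for illustration; the text above only states that params are added to script_args.

```python
from typing import Any, Dict, List, Optional


def extend_script_args(script_args: List[str], params: Optional[Dict[str, Any]]) -> List[str]:
    """Return a new argument list with params appended as --key=value flags."""
    # Assumed format: each dictionary entry becomes one "--key=value" string.
    extra = [f"--{key}={value}" for key, value in (params or {}).items()]
    return list(script_args) + extra


args = extend_script_args(["--seed=1"], {"max_epochs": 2, "lr": 0.01})
print(args)  # ['--seed=1', '--max_epochs=2', '--lr=0.01']
```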