Track and Visualize Experiments

Why do I need to track metrics?

In model development, we track values of interest, such as the validation_loss, to visualize how our models learn. Without them, model development is like driving a car without windows: charts and logs provide the windows that show where to steer.

With Lightning, you can visualize virtually anything you can think of: numbers, text, images, and audio.


Track metrics

Metric visualization is the most basic yet powerful way to understand how your model is doing throughout development. To track a metric, follow these two steps:

Step 1: Pick a logger.

from lightning.fabric import Fabric
from lightning.fabric.loggers import TensorBoardLogger

# Pick a logger and add it to Fabric
logger = TensorBoardLogger(root_dir="logs")
fabric = Fabric(loggers=logger)

Loggers you can choose from include TensorBoardLogger and CSVLogger, both importable from lightning.fabric.loggers.


Step 2: Add log() calls in your code.

value = ...  # Python scalar or tensor scalar
fabric.log("some_value", value)

To log multiple metrics at once, use log_dict():

values = {"loss": loss, "acc": acc, "other": other}
fabric.log_dict(values)

View logs dashboard

How you view the metrics depends on the logger you choose. Most have a dashboard that lets you browse everything you log in real time.

For the TensorBoardLogger shown above, you can open the dashboard by running:

tensorboard --logdir=./logs

If you're using a notebook environment such as Google Colab, Kaggle, or Jupyter, launch TensorBoard with these commands:

%reload_ext tensorboard
%tensorboard --logdir=./logs

Control logging frequency

Logging a metric on every iteration can slow down training. Reduce the added overhead by logging less frequently:

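num_iterations = 1000  # example values; adjust for your run
log_every_n_steps = 50

for iteration in range(num_iterations):
    # Log only on every n-th iteration to limit overhead
    if iteration % log_every_n_steps == 0:
        value = ...
        fabric.log("some_value", value)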
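
When logging only every n-th iteration, a single sampled value can be noisy, so one possible refinement is to log the average of the values seen since the last log call. The following is a minimal sketch of that idea, not part of Lightning: the helper name log_averaged is hypothetical, and the log callable is a stand-in with the shape log(name, value, step) so the example runs on its own.

```python
from typing import Callable, Iterable, List, Tuple


def log_averaged(
    values: Iterable[float],
    log: Callable[[str, float, int], None],
    log_every_n_steps: int = 10,
    name: str = "some_value",
) -> None:
    """Log the mean of `values` over each full window of `log_every_n_steps` iterations.

    `log` is a hypothetical stand-in with the shape log(name, value, step);
    in a Fabric script it could wrap `fabric.log`.
    """
    window: List[float] = []
    for iteration, value in enumerate(values):
        window.append(float(value))
        # Emit one averaged point at the end of each full window
        if (iteration + 1) % log_every_n_steps == 0:
            log(name, sum(window) / len(window), iteration)
            window.clear()
    # Note: values in a trailing, incomplete window are not logged


# Usage with a list-backed stand-in logger (made-up input values)
records: List[Tuple[str, float, int]] = []
log_averaged(
    values=[1.0, 3.0, 2.0, 4.0, 10.0],
    log=lambda name, value, step: records.append((name, value, step)),
    log_every_n_steps=2,
)
```

In a Fabric script, the stand-in could be replaced by something like lambda name, value, step: fabric.log(name, value, step=step). Averaging trades per-iteration detail for a smoother curve, which is usually acceptable when the goal is to reduce logging overhead anyway.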

Use multiple loggers

You can add as many loggers as you want without changing the logging code in your loop.

from lightning.fabric import Fabric
from lightning.fabric.loggers import CSVLogger, TensorBoardLogger

tb_logger = TensorBoardLogger(root_dir="logs/tensorboard")
csv_logger = CSVLogger(root_dir="logs/csv")

# Add multiple loggers in a list
fabric = Fabric(loggers=[tb_logger, csv_logger])

# Calling .log() or .log_dict() sends the value to every attached logger
value = ...
fabric.log("some_value", value)
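
To inspect what a CSV-based logger wrote without opening a dashboard, the metrics file can be read back with the standard library. The sketch below assumes a file with a header row, a step column, and one column per metric name, with empty cells where a metric was not logged at that step. The exact file location and layout depend on the logger and its version, so treat the path and column names here as assumptions. The helper read_metric is hypothetical, and the demo file contains made-up numbers.

```python
import csv
import tempfile
from pathlib import Path
from typing import List, Tuple


def read_metric(csv_path: Path, name: str) -> List[Tuple[int, float]]:
    """Return (step, value) pairs for one metric column, skipping empty cells."""
    points: List[Tuple[int, float]] = []
    with csv_path.open(newline="") as f:
        for row in csv.DictReader(f):
            cell = row.get(name, "")
            if cell:  # the metric may not have been logged at this step
                points.append((int(row["step"]), float(cell)))
    return points


# Demo on a small hand-written file (made-up numbers, not real training output)
demo_path = Path(tempfile.mkdtemp()) / "metrics.csv"
demo_path.write_text("step,loss,acc\n0,0.9,\n10,0.5,0.7\n20,0.4,0.8\n")

loss_points = read_metric(demo_path, "loss")
acc_points = read_metric(demo_path, "acc")
```

Here loss_points holds three (step, value) pairs and acc_points holds two, because the acc cell is empty at step 0.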