Track and Visualize Experiments
Why do I need to track metrics?
In model development, we track values of interest, such as the validation_loss, to visualize how our models learn. Without that feedback, model development is like driving a car without windows: charts and logs provide the windows that show where to steer.
With Lightning, you can visualize virtually anything you can think of: numbers, text, images, and audio.
Track metrics
Metric visualization is the most basic yet most powerful way to understand how your model is doing throughout development. To track a metric, follow these two steps:
Step 1: Pick a logger.
from lightning.fabric import Fabric
from lightning.fabric.loggers import TensorBoardLogger
# Pick a logger and add it to Fabric
logger = TensorBoardLogger(root_dir="logs")
fabric = Fabric(loggers=logger)
Built-in loggers you can choose from include TensorBoardLogger and CSVLogger, both importable from lightning.fabric.loggers.
Step 2: Add log() calls in your code.
value = ... # Python scalar or tensor scalar
fabric.log("some_value", value)
To log multiple metrics at once, use log_dict():
values = {"loss": loss, "acc": acc, "other": other}
fabric.log_dict(values)
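As a rough illustration of where such a dictionary might come from, the sketch below averages per-batch values into one dict of Python scalars. The helper name summarize_epoch and the sample numbers are made up for this example and are not part of Lightning; the final call is left as a comment because it needs a Fabric instance in scope.

```python
from statistics import fmean
from typing import Dict, List


def summarize_epoch(losses: List[float], correct: List[bool]) -> Dict[str, float]:
    """Hypothetical helper: reduce per-batch losses and per-sample hits to scalars."""
    return {
        "loss": fmean(losses),              # mean loss over the batches
        "acc": sum(correct) / len(correct),  # fraction of correct predictions
    }


# Made-up sample values standing in for real training results
values = summarize_epoch(losses=[0.9, 0.7, 0.5], correct=[True, False, True, True])
print(values)

# With a Fabric instance available, the dict can be logged in one call:
# fabric.log_dict(values)
```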
View logs dashboard
How you can view the metrics depends on the individual logger you choose. Most have a dashboard that lets you browse everything you log in real time.
For the TensorBoardLogger shown above, open the dashboard by running:
tensorboard --logdir=./logs
If you’re using a notebook environment such as Google Colab, Kaggle, or Jupyter, launch TensorBoard with these commands:
%reload_ext tensorboard
%tensorboard --logdir=./logs
Control logging frequency
Logging a metric in every iteration can slow down training. Reduce the added overhead by logging less frequently:
for iteration in range(num_iterations):
    if iteration % log_every_n_steps == 0:
        value = ...
        fabric.log("some_value", value)
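Logging only every n-th value discards the iterations in between. One possible refinement, sketched here under assumptions, is to collect values between log steps and log their mean. The function run_loop, its log_fn parameter, and the placeholder loss are hypothetical names for this illustration; a plain list stands in for the logger so the sketch runs without Lightning installed.

```python
from statistics import fmean
from typing import Callable, List, Tuple


def run_loop(
    num_iterations: int,
    log_every_n_steps: int,
    log_fn: Callable[[str, float, int], None],
) -> None:
    """Log the mean of a placeholder loss once per window of iterations."""
    window: List[float] = []
    for iteration in range(num_iterations):
        loss = 1.0 / (iteration + 1)  # placeholder for a real training step
        window.append(loss)
        # Log on the last iteration of each window instead of on every iteration
        if (iteration + 1) % log_every_n_steps == 0:
            log_fn("loss_mean", fmean(window), iteration)
            window.clear()


# Stand-in for a real logger: each call is recorded as (name, value, step)
records: List[Tuple[str, float, int]] = []
run_loop(
    num_iterations=100,
    log_every_n_steps=25,
    log_fn=lambda name, value, step: records.append((name, value, step)),
)
print(len(records))                      # number of log calls made
print([step for _, _, step in records])  # iterations at which logging happened
```

Since fabric.log takes a name, a value, and an optional step, passing fabric.log as log_fn should work in a real training script.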
Use multiple loggers
You can add as many loggers as you want without changing the logging code in your loop.
from lightning.fabric import Fabric
from lightning.fabric.loggers import CSVLogger, TensorBoardLogger
tb_logger = TensorBoardLogger(root_dir="logs/tensorboard")
csv_logger = CSVLogger(root_dir="logs/csv")
# Add multiple loggers in a list
fabric = Fabric(loggers=[tb_logger, csv_logger])
# Calling .log() or .log_dict() always logs to all loggers simultaneously
fabric.log("some_value", value)