:orphan:

.. _debugging_basic:

########################
Debug your model (basic)
########################
**Audience**: Users who want to learn the basics of debugging models.

.. video:: ../_static/fetched-s3-assets/Trainer+flags+7-+debugging_1.mp4
    :poster: ../_static/fetched-s3-assets/thumb_debugging.png
    :width: 400
    :muted:

----

*********************************
How does Lightning help me debug?
*********************************
The Lightning Trainer has *a lot* of arguments devoted to maximizing your debugging productivity.

----

****************
Set a breakpoint
****************
A breakpoint stops your code execution so you can inspect variables and step through your code one line at a time.

.. code:: python

    def function_to_debug():
        x = 2

        # set breakpoint
        import pdb

        pdb.set_trace()

        y = x**2

In this example, the code will stop before executing the ``y = x**2`` line. On Python 3.7+, the built-in ``breakpoint()`` function does the same thing.

----

************************************
Run all your model code once quickly
************************************
If you've ever trained a model for days only to crash during validation or testing, then this Trainer argument is about to become your best friend.

The :paramref:`~lightning.pytorch.trainer.trainer.Trainer.fast_dev_run` argument in the Trainer runs a single batch of training, validation, test, and prediction data through your model to check for bugs:

.. code:: python

    trainer = Trainer(fast_dev_run=True)

To change how many batches to use, set the argument to an integer. Here we run 7 batches of each:

.. code:: python

    trainer = Trainer(fast_dev_run=7)

.. note::

    This argument will disable the tuner, checkpoint callbacks, early stopping callbacks, loggers, and logger callbacks such as :class:`~lightning.pytorch.callbacks.lr_monitor.LearningRateMonitor` and :class:`~lightning.pytorch.callbacks.device_stats_monitor.DeviceStatsMonitor`.

----

************************
Shorten the epoch length
************************
Sometimes it's helpful to use only a fraction of your training, val, test, or predict data (or a set number of batches). For example, you can use 20% of the training set and 1% of the validation set.

On larger datasets like ImageNet, this can help you debug or test a few things faster than waiting for a full epoch.

.. testcode::

    # use only 10% of training data and 1% of val data
    trainer = Trainer(limit_train_batches=0.1, limit_val_batches=0.01)

    # use 10 batches of train and 5 batches of val
    trainer = Trainer(limit_train_batches=10, limit_val_batches=5)

----

******************
Run a Sanity Check
******************
Lightning runs **2** steps of validation at the beginning of training. This avoids crashing in the validation loop sometime deep into a lengthy training loop.

(See: the :paramref:`~lightning.pytorch.trainer.trainer.Trainer.num_sanity_val_steps` argument of :class:`~lightning.pytorch.trainer.trainer.Trainer`)

.. testcode::

    trainer = Trainer(num_sanity_val_steps=2)
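The same argument also lets you skip or extend the check. A short sketch of the two common variations, following the documented semantics of ``num_sanity_val_steps`` (``0`` disables the check, ``-1`` runs all validation batches):

.. code:: python

    # skip the sanity check entirely
    trainer = Trainer(num_sanity_val_steps=0)

    # run all validation batches before training starts
    trainer = Trainer(num_sanity_val_steps=-1)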
----

*************************************
Print LightningModule weights summary
*************************************
Whenever the ``.fit()`` function gets called, the Trainer will print a weights summary for the LightningModule.

.. code:: python

    trainer.fit(...)

This generates a table like:

.. code-block:: text

      | Name  | Type        | Params | Mode
    -------------------------------------------
    0 | net   | Sequential  | 132 K  | train
    1 | net.0 | Linear      | 131 K  | train
    2 | net.1 | BatchNorm1d | 1.0 K  | train

To include the child modules in the summary, add a :class:`~lightning.pytorch.callbacks.model_summary.ModelSummary` callback:

.. testcode::

    from lightning.pytorch.callbacks import ModelSummary

    trainer = Trainer(callbacks=[ModelSummary(max_depth=-1)])

To print the model summary when ``.fit()`` is not called:

.. code-block:: python

    from lightning.pytorch.utilities.model_summary import ModelSummary

    model = LitModel()
    summary = ModelSummary(model, max_depth=-1)
    print(summary)

To turn off the automatic summary, use:

.. code:: python

    trainer = Trainer(enable_model_summary=False)

----

***********************************
Print input output layer dimensions
***********************************
Another debugging tool is to display the intermediate input and output sizes of all your layers by setting the ``example_input_array`` attribute in your LightningModule.

.. code-block:: python

    class LitModel(LightningModule):
        def __init__(self, *args, **kwargs):
            super().__init__()
            self.example_input_array = torch.Tensor(32, 1, 28, 28)

With the input array set, the summary table printed when you call ``.fit()`` on the Trainer will include the input and output layer dimensions:

.. code-block:: text

      | Name  | Type        | Params | Mode  | In sizes  | Out sizes
    ----------------------------------------------------------------------
    0 | net   | Sequential  | 132 K  | train | [10, 256] | [10, 512]
    1 | net.0 | Linear      | 131 K  | train | [10, 256] | [10, 512]
    2 | net.1 | BatchNorm1d | 1.0 K  | train | [10, 512] | [10, 512]

This can help you find bugs in the composition of your layers.
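For reference, here is a minimal sketch of a LightningModule that would produce the summary above. The class name and layer sizes are illustrative, chosen only to match the table; a real module would also define ``training_step`` and ``configure_optimizers``:

.. code-block:: python

    import torch
    from torch import nn

    from lightning.pytorch import LightningModule


    class LitModel(LightningModule):
        def __init__(self):
            super().__init__()
            # Linear(256, 512): ~131 K params; BatchNorm1d(512): ~1.0 K params
            self.net = nn.Sequential(nn.Linear(256, 512), nn.BatchNorm1d(512))
            # a dummy batch of 10 samples; only the shape is used by the summary
            self.example_input_array = torch.rand(10, 256)

        def forward(self, x):
            return self.net(x)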