KeyError: 'No action for destination key "trainer.devices" to set its default.'

I have been writing a PyTorch Lightning wrapper for CLIP in order to fine-tune it for remote sensing image captioning. I use LightningCLI to handle argument parsing for the model, the trainer, etc.

I am fine-tuning CLIP on the RSICD dataset.

When I start training, something strange happens that I can’t get my head around. Training and validation run fine until the end of epoch 6. When epoch 6 finishes and the next validation phase is about to start, the following exception is raised:

KeyError: 'No action for destination key "trainer.devices" to set its default.'
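For context, this message appears to come from jsonargparse, the library LightningCLI uses for argument parsing: its `set_defaults` refuses to set a default for a key that no registered argument owns. The sketch that follows is a toy, stdlib-only model of that check, using a hypothetical `StrictDefaultsParser` subclass of `argparse.ArgumentParser`. It is not jsonargparse's actual code, and it does not explain why the call only happens after epoch 6; it only illustrates what the error means.

```python
import argparse


class StrictDefaultsParser(argparse.ArgumentParser):
    """Hypothetical toy parser (not jsonargparse): set_defaults() only accepts
    destination keys that belong to an argument registered with add_argument()."""

    def set_defaults(self, **kwargs):
        # Each add_argument() call registers an action with a `dest` name.
        # Note: _actions is an argparse internal, used here only for illustration.
        known = {action.dest for action in self._actions}
        for dest in kwargs:
            if dest not in known:
                raise KeyError(f'No action for destination key "{dest}" to set its default.')
        super().set_defaults(**kwargs)


if __name__ == "__main__":
    parser = StrictDefaultsParser()
    # A dotted option keeps its dots in `dest`, similar to LightningCLI's "trainer.*" keys.
    parser.add_argument("--trainer.max_epochs", type=int, default=1)
    parser.set_defaults(**{"trainer.max_epochs": 32})  # OK: the argument exists
    print(getattr(parser.parse_args([]), "trainer.max_epochs"))  # 32
    try:
        parser.set_defaults(**{"trainer.devices": "auto"})  # never registered
    except KeyError as err:
        print(err)  # 'No action for destination key "trainer.devices" to set its default.'
```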

I start training with the following command:

python train_finetune_cli.py fit --config config.yaml

Here’s my config.yaml file:

# lightning.pytorch==2.0.4
seed_everything: 42
trainer:
  accelerator: cuda
  strategy: auto
  devices: auto
  num_nodes: 1
  precision: 32
  logger: true
  callbacks:
  - class_path: lightning.pytorch.callbacks.EarlyStopping
    init_args:
      monitor: val_loss
      min_delta: 0.0
      patience: 5
      verbose: true
      mode: min
      strict: true
      check_finite: true
      stopping_threshold: null
      divergence_threshold: null
      check_on_train_epoch_end: null
      log_rank_zero_only: false
  - class_path: lightning.pytorch.callbacks.ModelCheckpoint
    init_args:
      dirpath: null
      filename: clip-rsicd-{epoch:02d}-{val_loss:.2f}
      monitor: val_loss
      verbose: true
      save_last: null
      save_top_k: 2
      save_weights_only: false
      mode: min
      auto_insert_metric_name: true
      every_n_train_steps: null
      train_time_interval: null
      every_n_epochs: null
      save_on_train_epoch_end: true
  fast_dev_run: false
  max_epochs: 32
  min_epochs: null
  max_steps: -1
  min_steps: null
  max_time: null
  limit_train_batches: null
  limit_val_batches: null
  limit_test_batches: null
  limit_predict_batches: null
  overfit_batches: 0.0
  val_check_interval: null
  check_val_every_n_epoch: 1
  num_sanity_val_steps: null
  log_every_n_steps: null
  enable_checkpointing: null
  enable_progress_bar: null
  enable_model_summary: null
  accumulate_grad_batches: 1
  gradient_clip_val: null
  gradient_clip_algorithm: null
  deterministic: null
  benchmark: null
  inference_mode: true
  use_distributed_sampler: true
  profiler: null
  detect_anomaly: false
  barebones: false
  plugins: null
  sync_batchnorm: false
  reload_dataloaders_every_n_epochs: 0
  default_root_dir: null
model:
  model: openai/clip-vit-base-patch32
  minibatch_size: 32
  kl_coeff: 1.0
  lr: null
  warmup_steps: 0
  betas:
  - 0.9
  - 0.99
  weight_decay: 0.2
data:
  annotations_file: ./data/RSICD/dataset_rsicd.json
  img_dir: ./data/RSICD/RSICD_images
  img_transform: null
  target_transform: null
  train_split_percentage: 80.0
  val_split_percentage: 10.0
  batch_size: 32
  num_workers: 0
  shuffle: false
  processor: null

Machine specs:

  • CPU: Intel Xeon(R) CPU E5-2630 v4 @ 2.20GHz
  • GPU: GeForce GTX 1080 8GB
  • OS: Ubuntu 22.04.2 LTS

Environment:
pytorch-lightning: 2.0.4
cuda: 11.8
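Because LightningCLI delegates parsing to jsonargparse, the jsonargparse version may also be relevant to this error. Note too that the config imports from `lightning.pytorch`, which is provided by the unified `lightning` distribution rather than the standalone `pytorch-lightning` one, so both are worth listing. A minimal stdlib-only sketch for collecting installed versions (the package list is an assumption; adjust it to your environment):

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {distribution name: installed version string, or None if absent}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Assumed distribution names; add or remove entries for your environment.
    names = ["lightning", "pytorch-lightning", "torch", "jsonargparse"]
    for name, ver in installed_versions(names).items():
        print(f"{name}: {ver if ver is not None else 'not installed'}")
```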

Hi @angelonazzaro,
I created a GitHub issue for you here: