I have been writing a wrapper around CLIP using pytorch-lightning in order to fine-tune CLIP on remote sensing image captioning. I am using LightningCLI to handle argument parsing for the model, the trainer, etc.
I am fine-tuning CLIP on the RSICD dataset.
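For reference, the entry point is a thin LightningCLI wrapper. Below is a minimal sketch of what such a `train_finetune_cli.py` looks like; `CLIPFineTuner` and `RSICDDataModule` are placeholder names standing in for my actual module and datamodule classes:

```python
# Minimal sketch of a LightningCLI entry point of the kind used here.
# CLIPFineTuner and RSICDDataModule are placeholder names; the real
# classes live in my own modules.
from lightning.pytorch.cli import LightningCLI

from model import CLIPFineTuner    # placeholder import
from data import RSICDDataModule   # placeholder import


def cli_main():
    # LightningCLI builds the argument parser from the __init__ signatures
    # of the model, the datamodule and the Trainer, and exposes the
    # fit/validate/test/predict subcommands (hence `fit --config config.yaml`).
    LightningCLI(CLIPFineTuner, RSICDDataModule)


if __name__ == "__main__":
    cli_main()
```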
When I start training, something strange happens that I can't get my head around. The training epochs and validation phases run fine until the end of epoch 6. When epoch 6 finishes and a new validation phase is about to start, the following exception is raised:
```
KeyError: 'No action for destination key "trainer.devices" to set its default.'
```
I have been starting the training with the following command: `python train_finetune_cli.py fit --config config.yaml`.
Here’s my config.yaml file:
```yaml
# lightning.pytorch==2.0.4
seed_everything: 42
trainer:
  accelerator: cuda
  strategy: auto
  devices: auto
  num_nodes: 1
  precision: 32
  logger: true
  callbacks:
    - class_path: lightning.pytorch.callbacks.EarlyStopping
      init_args:
        monitor: val_loss
        min_delta: 0.0
        patience: 5
        verbose: true
        mode: min
        strict: true
        check_finite: true
        stopping_threshold: null
        divergence_threshold: null
        check_on_train_epoch_end: null
        log_rank_zero_only: false
    - class_path: lightning.pytorch.callbacks.ModelCheckpoint
      init_args:
        dirpath: null
        filename: clip-rsicd-{epoch:02d}-{val_loss:.2f}
        monitor: val_loss
        verbose: true
        save_last: null
        save_top_k: 2
        save_weights_only: false
        mode: min
        auto_insert_metric_name: true
        every_n_train_steps: null
        train_time_interval: null
        every_n_epochs: null
        save_on_train_epoch_end: true
  fast_dev_run: false
  max_epochs: 32
  min_epochs: null
  max_steps: -1
  min_steps: null
  max_time: null
  limit_train_batches: null
  limit_val_batches: null
  limit_test_batches: null
  limit_predict_batches: null
  overfit_batches: 0.0
  val_check_interval: null
  check_val_every_n_epoch: 1
  num_sanity_val_steps: null
  log_every_n_steps: null
  enable_checkpointing: null
  enable_progress_bar: null
  enable_model_summary: null
  accumulate_grad_batches: 1
  gradient_clip_val: null
  gradient_clip_algorithm: null
  deterministic: null
  benchmark: null
  inference_mode: true
  use_distributed_sampler: true
  profiler: null
  detect_anomaly: false
  barebones: false
  plugins: null
  sync_batchnorm: false
  reload_dataloaders_every_n_epochs: 0
  default_root_dir: null
model:
  model: openai/clip-vit-base-patch32
  minibatch_size: 32
  kl_coeff: 1.0
  lr: null
  warmup_steps: 0
  betas:
    - 0.9
    - 0.99
  weight_decay: 0.2
data:
  annotations_file: ./data/RSICD/dataset_rsicd.json
  img_dir: ./data/RSICD/RSICD_images
  img_transform: null
  target_transform: null
  train_split_percentage: 80.0
  val_split_percentage: 10.0
  batch_size: 32
  num_workers: 0
  shuffle: false
  processor: null
```
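For readers unfamiliar with LightningCLI: the `model:` and `data:` sections above are generated from the `__init__` signatures of my LightningModule and LightningDataModule. A hypothetical sketch of the datamodule constructor that would produce the `data:` keys above:

```python
from typing import Callable, Optional

from lightning.pytorch import LightningDataModule


class RSICDDataModule(LightningDataModule):
    """Hypothetical sketch: the parameter names mirror the keys under
    `data:` in config.yaml, which is how LightningCLI derives that section."""

    def __init__(
        self,
        annotations_file: str,
        img_dir: str,
        img_transform: Optional[Callable] = None,
        target_transform: Optional[Callable] = None,
        train_split_percentage: float = 80.0,
        val_split_percentage: float = 10.0,
        batch_size: int = 32,
        num_workers: int = 0,
        shuffle: bool = False,
        processor: Optional[str] = None,  # likely a HF processor name/object in the real class
    ) -> None:
        super().__init__()
        # Store the arguments so setup() and the dataloaders can use them.
        self.save_hyperparameters()
```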
Machine specs:
CPU: Intel Xeon(R) CPU E5-2630 v4 @ 2.20GHz
GPU: GeForce GTX 1080 8GB
OS: Ubuntu 22.04.2 LTS
Environment:
pytorch-lightning: 2.0.4
cuda: 11.8
Hi @angelonazzaro,
I created a GH issue for you here:
(Issue opened 04 Jul 2023, labels: bug, lightningcli, ver: 2.0.x.)