I’m a PL + wandb user, and I resume training from a checkpoint like this:
import pytorch_lightning as pl
from pytorch_lightning.loggers import WandbLogger

# checkpoint format:
# 'exp/h31qedvw/checkpoints/model-01841-val_loss=0.31116.ckpt'
# -> the wandb run ID is the second path component ('h31qedvw' here)
wandb_id = args.checkpoint.split('/')[1]
logger = WandbLogger(project='wav2lip-syncnet', id=wandb_id, resume='must')
trainer = pl.Trainer(max_steps=hparams.nsteps,
                     devices=1, accelerator="gpu",
                     logger=logger, log_every_n_steps=1,
                     callbacks=[checkpoint_callback])
trainer.fit(model, ckpt_path=args.checkpoint)
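For reference, checkpoint_callback is a plain ModelCheckpoint that keeps the top-k checkpoints by val_loss, roughly like this (simplified; the save_top_k value is illustrative):

from pytorch_lightning.callbacks import ModelCheckpoint

# Keeps only the k best checkpoints by validation loss, which is why the
# checkpoint I resume from (epoch 1841) predates where the run stopped.
checkpoint_callback = ModelCheckpoint(
    dirpath='exp/h31qedvw/checkpoints',
    monitor='val_loss',
    mode='min',
    save_top_k=3,  # illustrative
)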
Resuming training seems to work, and I have manually modified some hparams like the lr.
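For example, one way to apply the lr override on resume (a sketch, not necessarily exactly what I did; the value is a placeholder). Since ckpt_path restores the optimizer state, the new lr is set directly on the optimizer's param groups rather than only on model.hparams:

class LrOverride(pl.Callback):
    # Runs after Lightning has restored the optimizer state from the
    # checkpoint, so the new lr is not clobbered by the restored param groups.
    def on_train_start(self, trainer, pl_module):
        for opt in trainer.optimizers:
            for group in opt.param_groups:
                group['lr'] = 1e-5  # placeholder value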
Note that I stopped my last run at epoch 1900, while the checkpoint I'm loading is from epoch 1841, because I save checkpoints by top-k metric. When I look at my logs on wandb, it seems I cannot overwrite the previous logs from epochs 1841-1900. Is there any way to do that?