Weird behavior in lightning logging | 3 replies | 17 views | February 6, 2023
Re-saving last.bin after resuming from checkpoint last.bin | 0 replies | 17 views | February 3, 2023
How to create a checkpoint that detects whether gradients explode, and if so, rolls back to the last checkpoint and resets the optimizer? | 3 replies | 28 views | January 31, 2023
Error loading model from checkpoint | 1 reply | 26 views | January 31, 2023
Question about auto_lr_find() | 1 reply | 1314 views | January 31, 2023
Rank_zero_only Callback in ddp | 2 replies | 34 views | January 30, 2023
Steps vs Iteration in Training | 1 reply | 38 views | January 27, 2023
My Training Loss and Validation loss are correct but my validation loss is exploding | 4 replies | 1765 views | January 26, 2023
How to update the dataloader every epoch? train_dataloader() is just called once | 0 replies | 23 views | January 25, 2023
Can Lightning model be accelerated with TensorRT? | 0 replies | 33 views | January 25, 2023
Saving a LightningModule without a Trainer | 0 replies | 24 views | January 24, 2023
How to customize progress bar in test mode? | 0 replies | 25 views | January 24, 2023
How do I prevent initial validation run in Trainer 1.9.0? | 1 reply | 25 views | January 24, 2023
Multi-GPU, TorchMetrics, incorrect aggregation | 0 replies | 28 views | January 24, 2023
How to keep track of training time in DDP setting? | 5 replies | 52 views | January 23, 2023
LightningModule load_from_checkpoint vs Trainer resume_from_checkpoint | 1 reply | 36 views | January 23, 2023
Save_last and monitor in ModelCheckpoint | 0 replies | 24 views | January 23, 2023
DistributedDataParallel multi GPU barely faster than single GPU | 1 reply | 51 views | January 20, 2023
Error Loading a saved checkpoint | 4 replies | 117 views | January 20, 2023
`self.log` raised error when number of dataloader is not consistent | 2 replies | 38 views | January 19, 2023
Using pytorch_lightning from lightning import | 1 reply | 41 views | January 18, 2023
Multi-GPU training issue - DDP strategy. Training hangs upon distributed GPU initialisation | 3 replies | 151 views | January 18, 2023
StochasticWeightAveraging validation logging and checkpoints | 2 replies | 96 views | January 16, 2023
How to implement SWA? | 1 reply | 353 views | January 16, 2023
Why `precision=16` for me is almost useless for speeding up? | 1 reply | 384 views | January 16, 2023
Plot and Images not logged with WandbLogger outside LightningModule | 1 reply | 42 views | January 16, 2023
lr_scheduler.OneCycleLR "ValueError: Tried to step X+2 times. The specified number of total steps is X." | 8 replies | 3642 views | January 13, 2023
Limit the vocabulary for auto-regressive decoder (such as BART or GPT) in next token prediction? | 4 replies | 65 views | January 12, 2023
Saving and loading optimizer state | 2 replies | 1243 views | January 12, 2023
How to visualize transforms in preprocessing before training starts | 2 replies | 41 views | January 11, 2023