What is the batch size for distributed training with FSDP?

Let’s say I set up Lightning Fabric like this:

fabric = Fabric(accelerator="gpu", devices=1, num_nodes=1, strategy="fsdp")
fabric.launch()

When I specify the batch size for my dataloader, is that the batch size per GPU or the total batch size across all GPUs? I am trying to do contrastive learning with a large batch size. If it is per GPU, how do I hold off on computing the loss until I can assemble the embeddings from all GPUs into one large final batch? A sketch of what I have in mind is below.
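
Concretely, is something like the following the right idea? This is only a minimal sketch of what I am imagining: it assumes fabric.all_gather(..., sync_grads=True) keeps the gathered embeddings differentiable, the tiny encoder and the loss at the end are stand-ins for my real model and contrastive loss, and it pretends there are 2 GPUs on one node.

import torch
import torch.nn as nn
from lightning.fabric import Fabric

fabric = Fabric(accelerator="gpu", devices=2, num_nodes=1, strategy="fsdp")
fabric.launch()

# Toy encoder just to make the sketch self-contained (stand-in for my real model)
encoder = nn.Linear(128, 64)
encoder = fabric.setup_module(encoder)
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)
optimizer = fabric.setup_optimizers(optimizer)

x = torch.randn(32, 128, device=fabric.device)   # per-GPU batch of 32
local_emb = encoder(x)                           # (32, 64) embeddings on this rank

# Gather embeddings from every rank; sync_grads=True so gradients flow back
global_emb = fabric.all_gather(local_emb, sync_grads=True)   # (world_size, 32, 64)
global_emb = global_emb.reshape(-1, local_emb.shape[-1])     # (world_size * 32, 64)

# Stand-in for the real contrastive loss computed over the global batch
loss = global_emb.pow(2).mean()
fabric.backward(loss)
optimizer.step()
optimizer.zero_grad()

Is this how the per-GPU batches are supposed to be combined, or is there a better-supported way to get a large effective batch for the contrastive loss under FSDP?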