
Commit 420037e

Fix run_demo(demo_model_parallel, world_size) issue (#2367)
In the function demo_model_parallel, dev0 and dev1 are computed so that each process is assigned two distinct GPUs: the rank is doubled, giving dev0 = rank * 2 and dev1 = rank * 2 + 1, and world_size is set to half the number of GPUs. Assuming 8 GPUs, world_size is set to 4, leading to the creation of 4 processes, each of which is allocated two distinct GPUs. For instance, the first process (process 0) is assigned GPUs 0 and 1, the second process (process 1) is assigned GPUs 2 and 3, and so forth.
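For illustration, a minimal standalone sketch (not part of the patch) of the resulting rank-to-GPU mapping with 8 GPUs:

# Sketch only: reproduces the device assignment described above for n_gpus = 8.
n_gpus = 8
world_size = n_gpus // 2          # 4 processes, each driving two GPUs
for rank in range(world_size):
    dev0 = rank * 2               # first GPU owned by this process
    dev1 = rank * 2 + 1           # second GPU owned by this process
    print(f"process {rank}: cuda:{dev0}, cuda:{dev1}")
# process 0: cuda:0, cuda:1
# process 1: cuda:2, cuda:3
# process 2: cuda:4, cuda:5
# process 3: cuda:6, cuda:7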
1 parent 83cbc8d commit 420037e

File tree: 1 file changed, +3 -2 lines changed

intermediate_source/ddp_tutorial.rst (+3, -2)
@@ -269,8 +269,8 @@ either the application or the model ``forward()`` method.
     setup(rank, world_size)

     # setup mp_model and devices for this process
-    dev0 = (rank * 2) % world_size
-    dev1 = (rank * 2 + 1) % world_size
+    dev0 = rank * 2
+    dev1 = rank * 2 + 1
     mp_model = ToyMpModel(dev0, dev1)
     ddp_mp_model = DDP(mp_model)

@@ -293,6 +293,7 @@ either the application or the model ``forward()`` method.
     world_size = n_gpus
     run_demo(demo_basic, world_size)
     run_demo(demo_checkpoint, world_size)
+    world_size = n_gpus//2
     run_demo(demo_model_parallel, world_size)

 Initialize DDP with torch.distributed.run/torchrun
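Below is a hedged, self-contained sketch of how the corrected demo_model_parallel fits together. The names setup, cleanup, ToyMpModel, and run_demo come from ddp_tutorial.rst, but their bodies here are minimal stand-ins written for illustration; the backend choice ("nccl") and the toy layer sizes are assumptions, not part of this commit.

import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
import torch.optim as optim
from torch.nn.parallel import DistributedDataParallel as DDP

def setup(rank, world_size):
    # Minimal stand-in for the tutorial's process-group setup (address/port are placeholders).
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "12355"
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

def cleanup():
    dist.destroy_process_group()

class ToyMpModel(nn.Module):
    # Toy two-layer model split across two GPUs (dev0 and dev1).
    def __init__(self, dev0, dev1):
        super().__init__()
        self.dev0, self.dev1 = dev0, dev1
        self.net1 = nn.Linear(10, 10).to(dev0)
        self.relu = nn.ReLU()
        self.net2 = nn.Linear(10, 5).to(dev1)

    def forward(self, x):
        x = self.relu(self.net1(x.to(self.dev0)))
        return self.net2(x.to(self.dev1))

def demo_model_parallel(rank, world_size):
    setup(rank, world_size)

    # setup mp_model and devices for this process: with the fix, process 0
    # gets GPUs 0 and 1, process 1 gets GPUs 2 and 3, and so on.
    dev0 = rank * 2
    dev1 = rank * 2 + 1
    mp_model = ToyMpModel(dev0, dev1)
    ddp_mp_model = DDP(mp_model)

    loss_fn = nn.MSELoss()
    optimizer = optim.SGD(ddp_mp_model.parameters(), lr=0.001)

    outputs = ddp_mp_model(torch.randn(20, 10))  # output lands on dev1
    labels = torch.randn(20, 5).to(dev1)
    loss_fn(outputs, labels).backward()
    optimizer.step()

    cleanup()

def run_demo(demo_fn, world_size):
    mp.spawn(demo_fn, args=(world_size,), nprocs=world_size, join=True)

if __name__ == "__main__":
    n_gpus = torch.cuda.device_count()
    assert n_gpus >= 2, "demo_model_parallel needs at least 2 GPUs"
    world_size = n_gpus // 2  # each spawned process now owns two GPUs
    run_demo(demo_model_parallel, world_size)

With the fix, each spawned process binds to an exclusive pair of GPUs; previously, running with world_size = n_gpus made the modulus wrap around, so higher ranks were assigned the same devices as lower ranks.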
