Fix Model Parallel demo world_size parameter in DDP Tutorial #1750

Closed
sadra-barikbin opened this issue Nov 24, 2021 · 1 comment · Fixed by #2367
Labels: distributed, docathon-h1-2023, easy

Comments

sadra-barikbin commented Nov 24, 2021

This tutorial contains three distributed training demos; the last one is the Model Parallel use case. In the last code block, under the if statement, the world_size argument in the call run_demo(demo_model_parallel, world_size) should be world_size // 2. In the Model Parallel demo, each process is assigned two exclusive GPUs, so there must be half as many processes as GPUs.

However, the last code block of the demo sets world_size = n_gpus. That value is correct for the calls run_demo(demo_basic, world_size) and run_demo(demo_checkpoint, world_size), but not for run_demo(demo_model_parallel, world_size).
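To make the arithmetic concrete, here is a minimal sketch in plain Python (no GPUs required). It assumes each rank takes the consecutive devices rank * 2 and rank * 2 + 1; the issue only states that each process gets two exclusive GPUs, so that exact mapping is an assumption, and the helper names model_parallel_layout and out_of_range_ranks are hypothetical.

```python
def model_parallel_layout(n_gpus, gpus_per_process=2):
    """Return (world_size, {rank: [device indices]}) for a model-parallel run.

    Assumption: rank r owns devices r * gpus_per_process, ..., consecutively.
    """
    world_size = n_gpus // gpus_per_process
    layout = {
        rank: [rank * gpus_per_process + i for i in range(gpus_per_process)]
        for rank in range(world_size)
    }
    return world_size, layout


def out_of_range_ranks(n_gpus, world_size, gpus_per_process=2):
    """List the ranks whose highest device index does not exist on the machine."""
    return [
        rank
        for rank in range(world_size)
        if rank * gpus_per_process + gpus_per_process - 1 >= n_gpus
    ]


if __name__ == "__main__":
    n_gpus = 4  # illustrative value, not taken from the issue
    world_size, layout = model_parallel_layout(n_gpus)
    print("processes:", world_size, "layout:", layout)
    # With world_size == n_gpus, the upper ranks would ask for missing devices.
    print("bad ranks if world_size == n_gpus:", out_of_range_ranks(n_gpus, n_gpus))
    print("bad ranks if world_size == n_gpus // 2:", out_of_range_ranks(n_gpus, world_size))
```

Under this assumed mapping, spawning n_gpus processes leaves the upper half of the ranks without valid devices, while n_gpus // 2 processes cover the GPUs without overlap.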

I propose changing the if statement in the last block to:

if __name__ == "__main__":
    n_gpus = torch.cuda.device_count()
    assert n_gpus >= 2, f"Requires at least 2 GPUs to run, but got {n_gpus}"
    # One process per GPU for the basic and checkpoint demos.
    world_size = n_gpus
    run_demo(demo_basic, world_size)
    run_demo(demo_checkpoint, world_size)
    # Each model-parallel process uses two GPUs, so halve the process count.
    world_size = n_gpus // 2
    run_demo(demo_model_parallel, world_size)

cc @mrshenli @osalpekar @H-Huang @kwen2501

svekars added the easy and docathon-h1-2023 labels on May 31, 2023

TheMemoryDealer commented May 31, 2023

/assigntome
