
Commit 67e087a

Author: Svetlana Karslioglu
Commit message: Update
1 parent 0ebdf09 commit 67e087a

File tree

1 file changed (+32, -22 lines)


distributed/home.rst

Lines changed: 32 additions & 22 deletions
@@ -14,9 +14,9 @@ PyTorch with each method having their advantages in certain use cases:
 * `DistributedDataParallel (DDP) <#learn-ddp>`__
 * `Fully Sharded Data Parallel (FSDP) <#learn-fsdp>`__
 * `Remote Procedure Call (RPC) distributed training <#learn-rpc>`__
-* `Pipeline Parallelism <#learn-pipeline-parallelism>`__
+* `Custom Extensions <#custom-extensions>`__
 
-Read more about these options in [Distributed Overview](../beginner/dist_overview.rst).
+Read more about these options in `Distributed Overview <../beginner/dist_overview.html>`__.
 
 .. _learn-ddp:
 
@@ -47,16 +47,23 @@ Learn DDP
       +++
       :octicon:`code;1em` Code
 
+   .. grid-item-card:: :octicon:`file-code;1em`
+      Distributed Training with Uneven Inputs Using
+      the Join Context Manager
+      :shadow: none
+      :link: ../advanced_source/generic_join.rst
+      :link-type: url
+
+      This tutorial provides a short and gentle intro to the PyTorch
+      DistributedData Parallel.
+      +++
+      :octicon:`code;1em` Code
+
 .. _learn-fsdp:
 
 Learn FSDP
 ----------
 
-Fully-Sharded Data Parallel (FSDP) is a tool that distributes model
-parameters across multiple workers, therefore enabling you to train larger
-models.
-
-
 .. grid:: 3
 
    .. grid-item-card:: :octicon:`file-code;1em`
@@ -86,9 +93,6 @@ models.
 Learn RPC
 ---------
 
-Distributed Remote Procedure Call (RPC) framework provides
-mechanisms for multi-machine model training
-
 .. grid:: 3
 
    .. grid-item-card:: :octicon:`file-code;1em`
@@ -114,38 +118,44 @@ mechanisms for multi-machine model training
       :octicon:`code;1em` Code
 
    .. grid-item-card:: :octicon:`file-code;1em`
-      Distributed Pipeline Parallelism Using RPC
+      Implementing Batch RPC Processing Using Asynchronous Executions
       :shadow: none
       :link: https://example.com
       :link-type: url
 
-      Learn how to use a Resnet50 model for distributed pipeline parallelism
-      with the Distributed RPC APIs.
+      In this tutorial you will build batch-processing RPC applications
+      with the @rpc.functions.async_execution decorator.
       +++
       :octicon:`code;1em` Code
 
 .. grid:: 3
 
    .. grid-item-card:: :octicon:`file-code;1em`
-      Implementing Batch RPC Processing Using Asynchronous Executions
+      Combining Distributed DataParallel with Distributed RPC Framework
       :shadow: none
       :link: https://example.com
       :link-type: url
 
-      In this tutorial you will build batch-processing RPC applications
-      with the @rpc.functions.async_execution decorator.
+      In this tutorial you will learn how to combine distributed data
+      parallelism with distributed model parallelism.
       +++
       :octicon:`code;1em` Code
 
+.. _custom-extensions:
+
+Custom Extensions
+-----------------
+
+.. grid:: 3
+
    .. grid-item-card:: :octicon:`file-code;1em`
-      Combining Distributed DataParallel with Distributed RPC Framework
+      Customize Process Group Backends Using Cpp Extensions
       :shadow: none
-      :link: https://example.com
+      :link: intermediate/process_group_cpp_extension_tutorial.html
       :link-type: url
 
-      In this tutorial you will learn how to combine distributed data
-      parallelism with distributed model parallelism.
+      In this tutorial you will learn to implement a custom `ProcessGroup`
+      backend and plug that into PyTorch distributed package using
+      cpp extensions.
       +++
       :octicon:`code;1em` Code
-
-.. _learn-pipeline-parallelism:
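
For readers of the new "Distributed Training with Uneven Inputs Using the Join Context Manager" card: the sketch below illustrates the uneven-inputs pattern that tutorial covers. It is a minimal sketch, not the tutorial's code; the gloo backend, localhost rendezvous, tiny linear model, and per-rank batch counts are all illustrative assumptions.

import os
import torch
import torch.distributed as dist
from torch.distributed.algorithms.join import Join
from torch.nn.parallel import DistributedDataParallel as DDP


def run(rank, world_size):
    # Illustrative single-machine rendezvous; a real job would use torchrun.
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    model = DDP(torch.nn.Linear(1, 1))
    # Uneven inputs: each rank sees a different number of batches.
    inputs = [torch.ones(1) for _ in range(5 + rank)]

    # Join lets ranks that run out of inputs shadow the collectives of the
    # ranks still training, so the gradient all-reduce does not deadlock.
    with Join([model]):
        for x in inputs:
            model(x).sum().backward()

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = 2
    torch.multiprocessing.spawn(run, args=(world_size,), nprocs=world_size)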
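
The renamed batch-RPC card refers to the @rpc.functions.async_execution decorator: a decorated handler returns a torch.futures.Future immediately, and the RPC reply is sent only once that future completes, which lets a single handler batch requests from many callers. Below is a minimal sketch of that idea, not the tutorial's example; the "server" worker name, BATCH_SIZE, and the add-one workload are hypothetical.

import threading

import torch
import torch.distributed.rpc as rpc

BATCH_SIZE = 4            # hypothetical batch size for illustration
_lock = threading.Lock()
_inputs, _futures = [], []


@rpc.functions.async_execution
def batched_add_one(x):
    # Return a Future right away; the RPC reply is sent when it completes,
    # so the handler can wait for a full batch without blocking RPC threads.
    fut = torch.futures.Future()
    with _lock:
        _inputs.append(x)
        _futures.append(fut)
        if len(_inputs) == BATCH_SIZE:
            # Process the whole batch in one shot, then unblock every caller.
            results = torch.stack(_inputs) + 1
            for f, r in zip(_futures, results):
                f.set_result(r)
            _inputs.clear()
            _futures.clear()
    return fut


# A caller on another worker would invoke it with something like:
#   fut = rpc.rpc_async("server", batched_add_one, args=(torch.ones(3),))
#   result = fut.wait()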
