
Commit c5521ee

Update 2021-3-4-pytorch-1.8-released.md
1 parent f23c6a6 commit c5521ee

1 file changed: +2 −2 lines


_posts/2021-3-4-pytorch-1.8-released.md

Lines changed: 2 additions & 2 deletions
@@ -51,7 +51,7 @@ In addition to the major stable and beta distributed training features in this r
 * **(Prototype) ZeroRedundancyOptimizer** - Based on work by, and developed in partnership with, the Microsoft DeepSpeed team, this feature helps reduce the per-process memory footprint by sharding optimizer states across all participating processes in the ```ProcessGroup``` gang. Refer to this [documentation](https://pytorch.org/docs/master/distributed.optim.html#torch.distributed.optim.ZeroRedundancyOptimizer) for more details.
 * **(Prototype) Process Group NCCL Send/Recv** - The NCCL send/recv API was introduced in NCCL v2.7, and this feature adds support for it in NCCL process groups. It gives users the option to implement collective operations at the Python layer instead of the C++ layer. Refer to this [documentation](https://pytorch.org/docs/master/distributed.html#distributed-communication-package-torch-distributed) and [code examples](https://github.com/pytorch/pytorch/blob/master/torch/distributed/distributed_c10d.py#L899) to learn more.
 * **(Prototype) CUDA-support in RPC using TensorPipe** - This feature should bring speed improvements for users of PyTorch RPC on multi-GPU machines, as TensorPipe will automatically leverage NVLink when available and avoid costly copies to and from host memory when exchanging GPU tensors between processes. When the processes are not on the same machine, TensorPipe will fall back to copying the tensor to host memory and sending it as a regular CPU tensor. This will also improve the user experience, as users will be able to treat GPU tensors like regular CPU tensors in their code. Refer to this [documentation](https://pytorch.org/docs/1.8.0/rpc.html) for more details.
-* **(Prototype) Remote Module** - This feature allows users to operate a module on a remote worker as if it were a local module, with the RPCs transparent to the user. In the past, this functionality was implemented in an ad-hoc way; this feature will improve the usability of model parallelism in PyTorch. Refer to this [documentation](https://pytorch.org/docs/master/distributed.html#distributed-communication-package-torch-distributed) for more details.
+* **(Prototype) Remote Module** - This feature allows users to operate a module on a remote worker as if it were a local module, with the RPCs transparent to the user. In the past, this functionality was implemented in an ad-hoc way; this feature will improve the usability of model parallelism in PyTorch. Refer to this [documentation](https://pytorch.org/docs/master/rpc.html#remotemodule) for more details.
 
 # PyTorch Mobile
 Support for PyTorch Mobile is expanding with a new set of tutorials to help new users launch models on-device more quickly and give existing users a tool to get more out of our framework. These include:
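
The ```ZeroRedundancyOptimizer``` entry above is essentially a wrapper around a standard optimizer that shards its state across ranks instead of replicating it. Below is a minimal sketch of how it is typically wired into a DDP training loop, assuming the script is launched with ```torchrun --nproc_per_node=2``` (or ```torch.distributed.launch``` on 1.8-era installs) so that the rank and rendezvous environment variables are set; the ```optimizer_class``` keyword follows the later stable API and may differ in the 1.8 prototype.

```python
import torch
import torch.distributed as dist
from torch.distributed.optim import ZeroRedundancyOptimizer
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # Launcher is expected to set RANK/WORLD_SIZE/MASTER_ADDR/MASTER_PORT.
    dist.init_process_group("gloo")  # use "nccl" on multi-GPU hosts
    model = DDP(torch.nn.Linear(2000, 2000))

    # Each rank keeps only its shard of the Adam state instead of a full replica.
    optimizer = ZeroRedundancyOptimizer(
        model.parameters(),
        optimizer_class=torch.optim.Adam,
        lr=1e-3,
    )

    for _ in range(3):
        loss = model(torch.randn(20, 2000)).sum()
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```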
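For the Process Group NCCL Send/Recv prototype, the point-to-point calls are the existing ```torch.distributed.send```/```recv``` functions, now usable on an NCCL-backed group. A small sketch, assuming two ranks launched with ```torchrun``` on a host with at least two GPUs:

```python
import torch
import torch.distributed as dist


def main():
    # Launcher is expected to set RANK/WORLD_SIZE/MASTER_ADDR/MASTER_PORT.
    dist.init_process_group("nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(rank)  # one GPU per rank

    t = torch.full((4,), float(rank), device="cuda")
    if rank == 0:
        dist.send(t, dst=1)      # GPU-to-GPU point-to-point send over NCCL
    elif rank == 1:
        dist.recv(t, src=0)      # receives rank 0's tensor in place
        print("rank 1 received", t)

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```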
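The CUDA-support-in-RPC item relies on a per-peer device map that tells TensorPipe how the caller's GPUs correspond to the callee's, so GPU tensors can be exchanged without staging through host memory. A rough sketch, assuming two processes launched with ```torchrun```, a single shared ```cuda:0```, and the ```set_device_map``` call shown in the 1.8-era prototype material (details may have changed in later releases):

```python
import os
import torch
import torch.distributed.rpc as rpc


def add_one(t):
    # Runs on worker1; with a device map in place, t arrives as a CUDA tensor.
    return t + 1


def main():
    rank = int(os.environ["RANK"])  # set by the launcher
    opts = rpc.TensorPipeRpcBackendOptions()
    if rank == 0:
        # Caller's cuda:0 maps to callee's cuda:0, enabling direct GPU transfers.
        opts.set_device_map("worker1", {0: 0})
    rpc.init_rpc(f"worker{rank}", rank=rank, world_size=2,
                 rpc_backend_options=opts)
    if rank == 0:
        out = rpc.rpc_sync("worker1", add_one,
                           args=(torch.ones(4, device="cuda:0"),))
        print(out.device, out)  # result comes back as a CUDA tensor
    rpc.shutdown()


if __name__ == "__main__":
    main()
```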
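Finally, for the Remote Module prototype whose documentation link this commit corrects: a ```RemoteModule``` constructs an ```nn.Module``` on another worker and makes ```forward``` calls look local. A short sketch under the same two-process launch assumption; the ```"worker/cpu"``` remote-device string and the argument layout follow the later stable docs, so the 1.8 prototype signature may differ slightly:

```python
import os
import torch
import torch.distributed.rpc as rpc
from torch.distributed.nn.api.remote_module import RemoteModule


def main():
    rank = int(os.environ["RANK"])  # set by the launcher
    name = "master" if rank == 0 else "worker"
    rpc.init_rpc(name, rank=rank, world_size=2)
    if rank == 0:
        # The Linear layer is constructed and lives on "worker";
        # forward() below is an RPC under the hood.
        remote_linear = RemoteModule("worker/cpu", torch.nn.Linear, args=(8, 4))
        out = remote_linear.forward(torch.randn(2, 8))
        print(out.shape)  # torch.Size([2, 4]), returned to the caller
    rpc.shutdown()


if __name__ == "__main__":
    main()
```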
@@ -114,7 +114,7 @@ for num_threads in [1, 2, 4]:
 * [(Prototype) FX Graph Mode Post Training Dynamic Quantization](https://pytorch.org/tutorials/prototype/fx_graph_mode_ptq_dynamic.html)
 * [(Prototype) FX Graph Mode Quantization User Guide](https://pytorch.org/tutorials/prototype/fx_graph_mode_quant_guide.html)
 
-## Hardware Support
+# Hardware Support
 
 ### [Beta] Ability to Extend the PyTorch Dispatcher for a new backend in C++
 In PyTorch 1.8, you can now create new out-of-tree devices that live outside the ```pytorch/pytorch``` repo. The tutorial linked below shows how to register your device and keep it in sync with native PyTorch devices.
