
Commit c5521ee

Update 2021-3-4-pytorch-1.8-released.md
1 parent f23c6a6 commit c5521ee

1 file changed: +2 −2 lines


_posts/2021-3-4-pytorch-1.8-released.md

Lines changed: 2 additions & 2 deletions
@@ -51,7 +51,7 @@ In addition to the major stable and beta distributed training features in this r
 * **(Prototype) ZeroRedundancyOptimizer** - Based on work by, and developed in partnership with, the Microsoft DeepSpeed team, this feature helps reduce the per-process memory footprint by sharding optimizer states across all participating processes in the ```ProcessGroup``` gang. Refer to this [documentation](https://pytorch.org/docs/master/distributed.optim.html#torch.distributed.optim.ZeroRedundancyOptimizer) for more details.
 * **(Prototype) Process Group NCCL Send/Recv** - The NCCL send/recv API was introduced in NCCL v2.7, and this feature adds support for it in NCCL process groups. It gives users the option to implement collective operations at the Python layer instead of the C++ layer. Refer to this [documentation](https://pytorch.org/docs/master/distributed.html#distributed-communication-package-torch-distributed) and [code examples](https://github.com/pytorch/pytorch/blob/master/torch/distributed/distributed_c10d.py#L899) to learn more.
 * **(Prototype) CUDA-support in RPC using TensorPipe** - This feature should bring speed improvements for users of PyTorch RPC on multi-GPU machines, as TensorPipe will automatically leverage NVLink when available and avoid costly copies to and from host memory when exchanging GPU tensors between processes. When the processes are not on the same machine, TensorPipe will fall back to copying the tensor to host memory and sending it as a regular CPU tensor. This will also improve the user experience, as users will be able to treat GPU tensors like regular CPU tensors in their code. Refer to this [documentation](https://pytorch.org/docs/1.8.0/rpc.html) for more details.
-* **(Prototype) Remote Module** - This feature allows users to operate a module on a remote worker as if it were a local module, with the RPCs transparent to the user. In the past, this functionality was implemented in an ad-hoc way; this feature will improve the usability of model parallelism in PyTorch. Refer to this [documentation](https://pytorch.org/docs/master/distributed.html#distributed-communication-package-torch-distributed) for more details.
+* **(Prototype) Remote Module** - This feature allows users to operate a module on a remote worker as if it were a local module, with the RPCs transparent to the user. In the past, this functionality was implemented in an ad-hoc way; this feature will improve the usability of model parallelism in PyTorch. Refer to this [documentation](https://pytorch.org/docs/master/rpc.html#remotemodule) for more details.
 
 # PyTorch Mobile
 Support for PyTorch Mobile is expanding with a new set of tutorials to help new users launch models on-device more quickly and give existing users a tool to get more out of our framework. These include:
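
The ```ZeroRedundancyOptimizer``` entry above is essentially a wrapper around a standard optimizer that shards its state across ranks instead of replicating it. Below is a minimal sketch of how it is typically wired into a DDP training loop, assuming the script is launched with ```torchrun --nproc_per_node=2``` (or ```torch.distributed.launch``` on 1.8-era installs) so that the rank and rendezvous environment variables are set; the ```optimizer_class``` keyword follows the later stable API and may differ in the 1.8 prototype.

```python
import torch
import torch.distributed as dist
from torch.distributed.optim import ZeroRedundancyOptimizer
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # Launcher is expected to set RANK/WORLD_SIZE/MASTER_ADDR/MASTER_PORT.
    dist.init_process_group("gloo")  # use "nccl" on multi-GPU hosts
    model = DDP(torch.nn.Linear(2000, 2000))

    # Each rank keeps only its shard of the Adam state instead of a full replica.
    optimizer = ZeroRedundancyOptimizer(
        model.parameters(),
        optimizer_class=torch.optim.Adam,
        lr=1e-3,
    )

    for _ in range(3):
        loss = model(torch.randn(20, 2000)).sum()
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```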
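For the Process Group NCCL Send/Recv prototype, the point-to-point calls are the existing ```torch.distributed.send```/```recv``` functions, now usable on an NCCL-backed group. A small sketch, assuming two ranks launched with ```torchrun``` on a host with at least two GPUs:

```python
import torch
import torch.distributed as dist


def main():
    # Launcher is expected to set RANK/WORLD_SIZE/MASTER_ADDR/MASTER_PORT.
    dist.init_process_group("nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(rank)  # one GPU per rank

    t = torch.full((4,), float(rank), device="cuda")
    if rank == 0:
        dist.send(t, dst=1)      # GPU-to-GPU point-to-point send over NCCL
    elif rank == 1:
        dist.recv(t, src=0)      # receives rank 0's tensor in place
        print("rank 1 received", t)

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```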
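The CUDA-support-in-RPC item relies on a per-peer device map that tells TensorPipe how the caller's GPUs correspond to the callee's, so GPU tensors can be exchanged without staging through host memory. A rough sketch, assuming two processes launched with ```torchrun```, a single shared ```cuda:0```, and the ```set_device_map``` call shown in the 1.8-era prototype material (details may have changed in later releases):

```python
import os
import torch
import torch.distributed.rpc as rpc


def add_one(t):
    # Runs on worker1; with a device map in place, t arrives as a CUDA tensor.
    return t + 1


def main():
    rank = int(os.environ["RANK"])  # set by the launcher
    opts = rpc.TensorPipeRpcBackendOptions()
    if rank == 0:
        # Caller's cuda:0 maps to callee's cuda:0, enabling direct GPU transfers.
        opts.set_device_map("worker1", {0: 0})
    rpc.init_rpc(f"worker{rank}", rank=rank, world_size=2,
                 rpc_backend_options=opts)
    if rank == 0:
        out = rpc.rpc_sync("worker1", add_one,
                           args=(torch.ones(4, device="cuda:0"),))
        print(out.device, out)  # result comes back as a CUDA tensor
    rpc.shutdown()


if __name__ == "__main__":
    main()
```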
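Finally, for the Remote Module prototype whose documentation link this commit corrects: a ```RemoteModule``` constructs an ```nn.Module``` on another worker and makes ```forward``` calls look local. A short sketch under the same two-process launch assumption; the ```"worker/cpu"``` remote-device string and the argument layout follow the later stable docs, so the 1.8 prototype signature may differ slightly:

```python
import os
import torch
import torch.distributed.rpc as rpc
from torch.distributed.nn.api.remote_module import RemoteModule


def main():
    rank = int(os.environ["RANK"])  # set by the launcher
    name = "master" if rank == 0 else "worker"
    rpc.init_rpc(name, rank=rank, world_size=2)
    if rank == 0:
        # The Linear layer is constructed and lives on "worker";
        # forward() below is an RPC under the hood.
        remote_linear = RemoteModule("worker/cpu", torch.nn.Linear, args=(8, 4))
        out = remote_linear.forward(torch.randn(2, 8))
        print(out.shape)  # torch.Size([2, 4]), returned to the caller
    rpc.shutdown()


if __name__ == "__main__":
    main()
```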
@@ -114,7 +114,7 @@ for num_threads in [1, 2, 4]:
 * [(Prototype) FX Graph Mode Post Training Dynamic Quantization](https://pytorch.org/tutorials/prototype/fx_graph_mode_ptq_dynamic.html)
 * [(Prototype) FX Graph Mode Quantization User Guide](https://pytorch.org/tutorials/prototype/fx_graph_mode_quant_guide.html)
 
-## Hardware Support
+# Hardware Support
 
 ### [Beta] Ability to Extend the PyTorch Dispatcher for a new backend in C++
 In PyTorch 1.8, you can now create new out-of-tree devices that live outside the ```pytorch/pytorch``` repo. The tutorial linked below shows how to register your device and keep it in sync with native PyTorch devices.
