Commit c6cded4 (1 parent da7a6cd): Update 2020-7-20-pytorch-1.6-released.md

1 file changed: 23 additions, 56 deletions

_posts/2020-7-20-pytorch-1.6-released.md

title: 'PyTorch 1.6 released w/ Native AMP Support, Microsoft joins as maintainer'
author: Team PyTorch
---

Today, we’re announcing the availability of PyTorch 1.6, along with updated domain libraries. We are also excited to announce that the team at Microsoft is now maintaining Windows builds and binaries and will also be supporting the community on GitHub as well as the PyTorch Windows [discussion forums](https://discuss.pytorch.org/c/windows/).

The PyTorch 1.6 release includes a number of new APIs, tools for performance improvement and profiling, as well as major updates to both distributed data parallel (DDP) and remote procedure call (RPC) based distributed training.

A few of the highlights include:

1. Automatic mixed precision (AMP) training is now natively supported and a stable feature (see below for more details), thanks to NVIDIA’s contributions;
2. Native TensorPipe support now added for tensor-aware, point-to-point communication primitives built specifically for machine learning;
3. Added support for complex tensors to the frontend API surface;
4. New profiling tools providing tensor-level memory consumption information; and
Additionally, from this release onward, features will be classified as Stable, Beta and Prototype.

## [Stable] Automatic Mixed Precision (AMP)

AMP allows users to easily enable automatic mixed precision training, enabling higher performance and memory savings of up to 50% on Tensor Core GPUs. Using the natively supported `torch.cuda.amp` API, AMP provides convenience methods for mixed precision, where some operations use the `torch.float32` (float) datatype and other operations use `torch.float16` (half). Some ops, like linear layers and convolutions, are much faster in `float16`. Other ops, like reductions, often require the dynamic range of `float32`. Mixed precision tries to match each op to its appropriate datatype.

* Design doc ([Link](https://github.com/pytorch/pytorch/issues/25081))
* Documentation ([Link](https://pytorch.org/docs/stable/amp.html))
* Usage examples ([Link](https://pytorch.org/docs/stable/notes/amp_examples.html))

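The workflow above can be sketched as a minimal training loop. The tiny model, random data, and hyperparameters below are illustrative placeholders (not from the post), and the loop falls back to full precision when no GPU is present:

```python
import torch

# Minimal AMP training-loop sketch using the torch.cuda.amp API.
# The model and data are placeholders; AMP only kicks in on a CUDA device.
use_amp = torch.cuda.is_available()
device = "cuda" if use_amp else "cpu"

model = torch.nn.Linear(128, 10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# GradScaler rescales the loss so float16 gradients do not underflow
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

inputs = torch.randn(32, 128, device=device)
targets = torch.randint(0, 10, (32,), device=device)

for _ in range(3):
    optimizer.zero_grad()
    # autocast runs matmul-heavy ops in float16 and reductions in float32
    with torch.cuda.amp.autocast(enabled=use_amp):
        loss = torch.nn.functional.cross_entropy(model(inputs), targets)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()

print(float(loss))
```

The same loop runs unchanged in full precision when `enabled=False`, which is what makes AMP easy to adopt incrementally.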
## [Beta] Fork/Join Parallelism

This release adds support for a language-level construct as well as runtime support for executing TorchScript programs in parallel.

Parallel execution of TorchScript programs is enabled through two primitives: `torch.jit.fork` and `torch.jit.wait`. In the below example, we parallelize execution of `foo`:

```python
import torch
from typing import List

# … (the bodies of foo and example are elided in this diff)

print(example(torch.ones([])))
```

* Documentation ([Link](https://pytorch.org/docs/stable/jit.html))

## [Beta] Memory Profiler

```
…
torch::autograd::GraphRoot          691.816us       691.816us       100
----------------------------------- --------------- --------------- ---------------
```

* Design doc ([Link](https://github.com/pytorch/pytorch/pull/37775))
* Documentation ([Link](https://pytorch.org/docs/stable/autograd.html#profiler))

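A minimal sketch of the tensor-level memory reporting, assuming the `profile_memory` flag on the autograd profiler; the small matmul/softmax workload below is illustrative:

```python
import torch
from torch.autograd import profiler

# Profile a tiny workload and record per-operator memory usage
x = torch.randn(100, 100)
with profiler.profile(profile_memory=True, record_shapes=True) as prof:
    y = x.mm(x)
    z = y.softmax(dim=1)

# Aggregate per-operator stats, sorted by memory each op itself allocated
print(prof.key_averages().table(sort_by="self_cpu_memory_usage", row_limit=5))
```

The printed table includes memory columns alongside the familiar timing columns, similar to the truncated output shown above.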
# Distributed Training & RPC

```python
torch.distributed.rpc.init_rpc(
    # … (arguments elided in this diff)
)
torch.distributed.rpc.rpc_sync(...)
```

* Design doc ([Link](https://github.com/pytorch/pytorch/issues/35251))
* Documentation ([Link](https://pytorch.org/docs/stable/rpc/index.html))
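Since the snippet above is elided by the diff, here is a runnable single-process sketch of the `init_rpc`/`rpc_sync` pattern; the worker name, rendezvous address, and payload are illustrative, not from the post:

```python
import os
import torch
import torch.distributed.rpc as rpc

# Rendezvous settings for a single-process, self-addressed RPC group
os.environ["MASTER_ADDR"] = "127.0.0.1"
os.environ["MASTER_PORT"] = "29500"

rpc.init_rpc("worker0", rank=0, world_size=1)

# rpc_sync ships tensors point-to-point and blocks until the result returns
ret = rpc.rpc_sync("worker0", torch.add, args=(torch.ones(2), torch.ones(2)))
print(ret)  # tensor([2., 2.])

rpc.shutdown()
```

In a real deployment each rank runs on its own host and `rpc_sync` targets a remote worker name rather than the local one.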

## [Beta] DDP+RPC

PyTorch Distributed supports two powerful paradigms: DDP for full sync data parallel training of models and the RPC framework, which allows for distributed model parallelism. Previously, these two features worked independently and users couldn’t mix and match them to try out hybrid parallelism paradigms.

Starting in PyTorch 1.6, we’ve enabled DDP and RPC to work together seamlessly so that users can combine these two techniques to achieve both data parallelism and model parallelism. An example is where users would like to place large embedding tables on parameter servers and use the RPC framework for embedding lookups, but store smaller dense parameters on trainers and use DDP to synchronize the dense parameters. Below is a simple code snippet.

```python
for data in batch:
    # … (forward pass and loss computation elided in this diff)
    torch.distributed.autograd.backward([loss])
```

* DDP+RPC Tutorial ([Link](https://pytorch.org/tutorials/advanced/rpc_ddp_tutorial.html))
* Documentation ([Link](https://pytorch.org/docs/stable/rpc/index.html))
* Usage Examples ([Link](https://github.com/pytorch/examples/pull/800))

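The combination can be sketched in a single process: DDP synchronizes the dense module's gradients while RPC stands in for parameter-server lookups. The worker names, ports, and toy model below are illustrative assumptions, not the tutorial's code:

```python
import os
import torch
import torch.distributed as dist
import torch.distributed.rpc as rpc
from torch.nn.parallel import DistributedDataParallel as DDP

os.environ["MASTER_ADDR"] = "127.0.0.1"
os.environ["MASTER_PORT"] = "29500"

# Process group that DDP uses to all-reduce dense gradients
dist.init_process_group("gloo", rank=0, world_size=1)
# Separate rendezvous on another port for the RPC layer
opts = rpc.TensorPipeRpcBackendOptions(init_method="tcp://127.0.0.1:29501")
rpc.init_rpc("trainer0", rank=0, world_size=1, rpc_backend_options=opts)

# Small dense parameters live on the trainer and are wrapped in DDP
dense = DDP(torch.nn.Linear(8, 4))

# A real setup would rpc_sync an embedding lookup on a parameter server
features = rpc.rpc_sync("trainer0", torch.ones, args=((2, 8),))
loss = dense(features).sum()
loss.backward()  # DDP all-reduces the dense gradients here

rpc.shutdown()
dist.destroy_process_group()
```

With more than one trainer, each rank would run this same script; DDP keeps the dense parameters in sync while the embedding table stays sharded behind RPC.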
## [Beta] RPC - Asynchronous User Functions

```python
ret = rpc.rpc_sync(
    # … (arguments elided in this diff)
)

print(ret)  # prints tensor([3., 3.])
```

* Tutorial for performant batch RPC using Asynchronous User Functions ([Link](https://github.com/pytorch/tutorials/blob/release/1.6/intermediate_source/rpc_async_execution.rst))
* Documentation ([Link](https://pytorch.org/docs/stable/rpc.html#torch.distributed.rpc.functions.async_execution))
* Usage examples ([Link](https://github.com/pytorch/examples/tree/stable/distributed/rpc/batch))

# Frontend API Updates

The PyTorch 1.6 release brings beta level support for complex tensors including…

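A brief sketch of the new complex tensor support; the values below are illustrative:

```python
import torch

# Complex tensors carry real and imaginary parts in one dtype
z = torch.tensor([1 + 2j, 3 - 1j], dtype=torch.complex64)

print(z.real)        # tensor([1., 3.])
print(z.imag)        # tensor([ 2., -1.])
print(z.conj() * z)  # squared magnitudes, as complex values
```

Real and imaginary views share storage with the complex tensor, so existing real-valued ops compose naturally with complex data.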
## torchvision 0.7

torchvision 0.7 introduces two new pretrained semantic segmentation models, [FCN ResNet50](https://arxiv.org/abs/1411.4038) and [DeepLabV3 ResNet50](https://arxiv.org/abs/1706.05587), both trained on COCO and using smaller memory footprints than the ResNet101 backbone. We also introduced support for AMP (Automatic Mixed Precision) autocasting for torchvision models and operators, which automatically selects the floating point precision for different GPU operations to improve performance while maintaining accuracy.

* Release notes ([Link](https://github.com/pytorch/vision/releases))

## torchaudio 0.6

torchaudio now officially supports Windows. This release also introduces a new model module (with wav2letter included), new functionals (contrast, cvm, dcshift, overdrive, vad, phaser, flanger, biquad), datasets (GTZAN, CMU), and a new optional sox backend with support for TorchScript.

* Release notes ([Link](https://github.com/pytorch/audio/releases))

# Additional updates

This is a great opportunity to connect with the community and practice your machine learning skills.

## LPCV Challenge

The [2020 CVPR Low-Power Vision Challenge (LPCV) - Online Track for UAV video](https://lpcv.ai/2020CVPR/video-track) submission deadline is coming up shortly. You have until July 31, 2020 to build a system that can accurately discover and recognize characters in video captured by an unmanned aerial vehicle (UAV), using PyTorch and a Raspberry Pi 3B+.

## Prototype Features

To reiterate, prototype features in PyTorch are early features that we are looking for feedback on…

Cheers!
Team PyTorch