From b371d264db0fac2b7258874aa37528c8d70d9355 Mon Sep 17 00:00:00 2001 From: Bruce Lin <49162601+brucejlin1@users.noreply.github.com> Date: Mon, 20 Jul 2020 13:48:41 -0700 Subject: [PATCH 01/23] Create 1.6 blog post --- _posts/2020-7-20-pytorch-1.6-released.md | 70 ++++++++++++++++++++++++ 1 file changed, 70 insertions(+) create mode 100644 _posts/2020-7-20-pytorch-1.6-released.md diff --git a/_posts/2020-7-20-pytorch-1.6-released.md b/_posts/2020-7-20-pytorch-1.6-released.md new file mode 100644 index 000000000000..ad6911ed444f --- /dev/null +++ b/_posts/2020-7-20-pytorch-1.6-released.md @@ -0,0 +1,70 @@ +--- +layout: blog_detail +title: 'PyTorch 1.6 released w/ Native AMP Support, Microsoft joins as maintainers for Windows' +author: Team PyTorch +--- + +Today, we’re announcing the availability of PyTorch 1.6, along with updated domain libraries. We are also excited to announce the team at Microsoft is now maintaining Windows builds and binaries and will also be supporting the community on GitHub as well as the [PyTorch Windows discussion forums](https://discuss.pytorch.org/c/windows/). + +## TUTORIALS HOME PAGE UPDATE +The tutorials home page now provides clear actions that developers can take. For new PyTorch users, there is an easy-to-discover button to take them directly to “A 60 Minute Blitz”. Right next to it, there is a button to view all recipes which are designed to teach specific features quickly with examples. + +
+ +
In addition to the existing left navigation bar, tutorials can now be quickly filtered by multi-select tags. Let’s say you want to view all tutorials related to “Production” and “Quantization”. You can select the “Production” and “Quantization” filters as shown in the image below:

<div class="text-center">
+ +
The following additional resources can also be found at the bottom of the Tutorials homepage:
* [PyTorch Cheat Sheet](https://pytorch.org/tutorials/beginner/ptcheat.html)
* [PyTorch Examples](https://github.com/pytorch/examples)
* [Tutorials on GitHub](https://github.com/pytorch/tutorials)

## PYTORCH RECIPES
Recipes are new bite-sized, actionable examples designed to teach researchers and developers how to use specific PyTorch features. Some notable new recipes include:
* [Loading Data in PyTorch](https://pytorch.org/tutorials/recipes/recipes/loading_data_recipe.html)
* [Model Interpretability Using Captum](https://pytorch.org/tutorials/recipes/recipes/Captum_Recipe.html)
* [How to Use TensorBoard](https://pytorch.org/tutorials/recipes/recipes/tensorboard_with_pytorch.html)

View the full set of recipes [here](http://pytorch.org/tutorials/recipes/recipes_index.html).

## LEARNING PYTORCH
This section includes tutorials designed for users new to PyTorch. Based on community feedback, we have made updates to the current [Deep Learning with PyTorch: A 60 Minute Blitz](https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html) tutorial, one of our most popular tutorials for beginners. Upon completion, you will understand what PyTorch and neural networks are, and be able to build and train a simple image classification network. Updates include adding explanations to clarify output meanings and linking back to where users can read more in the docs, cleaning up confusing syntax errors, and restructuring and explaining new concepts for easier readability.

## DEPLOYING MODELS IN PRODUCTION
This section includes tutorials for developers looking to take their PyTorch models to production. The tutorials include:
* [Deploying PyTorch in Python via a REST API with Flask](https://pytorch.org/tutorials/intermediate/flask_rest_api_tutorial.html)
* [Introduction to TorchScript](https://pytorch.org/tutorials/beginner/Intro_to_TorchScript_tutorial.html)
* [Loading a TorchScript Model in C++](https://pytorch.org/tutorials/advanced/cpp_export.html)
* [Exporting a Model from PyTorch to ONNX and Running it using ONNX Runtime](https://pytorch.org/tutorials/advanced/super_resolution_with_onnxruntime.html)

## FRONTEND APIS
PyTorch provides a number of frontend API features that can help developers code, debug, and validate their models more efficiently. This section includes tutorials that teach what these features are and how to use them. Some tutorials to highlight:
* [Introduction to Named Tensors in PyTorch](https://pytorch.org/tutorials/intermediate/named_tensor_tutorial.html)
* [Using the PyTorch C++ Frontend](https://pytorch.org/tutorials/advanced/cpp_frontend.html)
* [Extending TorchScript with Custom C++ Operators](https://pytorch.org/tutorials/advanced/torch_script_custom_ops.html)
* [Extending TorchScript with Custom C++ Classes](https://pytorch.org/tutorials/advanced/torch_script_custom_classes.html)
* [Autograd in C++ Frontend](https://pytorch.org/tutorials/advanced/cpp_autograd.html)

## MODEL OPTIMIZATION
Deep learning models often consume large amounts of memory, power, and compute due to their complexity. 
This section provides tutorials for model optimization: +* [Pruning](https://pytorch.org/tutorials/intermediate/pruning_tutorial.html) +* [Dynamic Quantization on BERT](https://pytorch.org/tutorials/intermediate/dynamic_quantization_bert_tutorial.html) +* [Static Quantization with Eager Mode in PyTorch](https://pytorch.org/tutorials/advanced/static_quantization_tutorial.html) + +## PARALLEL AND DISTRIBUTED TRAINING +PyTorch provides features that can accelerate performance in research and production such as native support for asynchronous execution of collective operations and peer-to-peer communication that is accessible from Python and C++. This section includes tutorials on parallel and distributed training: +* [Single-Machine Model Parallel Best Practices](https://pytorch.org/tutorials/intermediate/model_parallel_tutorial.html) +* [Getting started with Distributed Data Parallel](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html) +* [Getting started with Distributed RPC Framework](https://pytorch.org/tutorials/intermediate/rpc_tutorial.html) +* [Implementing a Parameter Server Using Distributed RPC Framework](https://pytorch.org/tutorials/intermediate/rpc_param_server_tutorial.html) + +Making these improvements are just the first step of improving PyTorch.org for the community. Please submit your suggestions [here](https://github.com/pytorch/tutorials/pulls). + +Cheers, + +Team PyTorch From 893b22a79ca8d95e7345def5c268554ef4a50dae Mon Sep 17 00:00:00 2001 From: andresruizfacebook <68402331+andresruizfacebook@users.noreply.github.com> Date: Tue, 21 Jul 2020 19:47:31 -0700 Subject: [PATCH 02/23] Update 2020-7-20-pytorch-1.6-released.md All changes except image ( I wasn't able to upload with my permissions) --- _posts/2020-7-20-pytorch-1.6-released.md | 86 ++++++++---------------- 1 file changed, 29 insertions(+), 57 deletions(-) diff --git a/_posts/2020-7-20-pytorch-1.6-released.md b/_posts/2020-7-20-pytorch-1.6-released.md index ad6911ed444f..cb9e42fea7da 100644 --- a/_posts/2020-7-20-pytorch-1.6-released.md +++ b/_posts/2020-7-20-pytorch-1.6-released.md @@ -4,67 +4,39 @@ title: 'PyTorch 1.6 released w/ Native AMP Support, Microsoft joins as maintaine author: Team PyTorch --- -Today, we’re announcing the availability of PyTorch 1.6, along with updated domain libraries. We are also excited to announce the team at Microsoft is now maintaining Windows builds and binaries and will also be supporting the community on GitHub as well as the [PyTorch Windows discussion forums](https://discuss.pytorch.org/c/windows/). +Maxim Lukiyanov - Principal PM at Microsoft, Emad Barsoum - Group EM at Microsoft, Guoliang Hua - Principal EM at Microsoft, Nikita Shulga - Tech Lead at Facebook, Geeta Chauhan - PE Lead at Facebook, Chris Gottbrath - Technical PM at Facebook, Jiachen Pu [(peterjc123)](https://github.com/peterjc123) - Engineer at Facebook + +Along with the PyTorch 1.6 release, we are excited to announce that Microsoft has expanded its participation in the PyTorch community and is taking ownership of the development and maintenance of the PyTorch build for Windows. + +According to the latest [Stack Overflow developer survey](https://insights.stackoverflow.com/survey/2020#technology-developers-primary-operating-systems), Windows remains the primary operating system for the developer community (46% Windows vs 28% MacOS). 
Jiachen Pu (peterjc123) initially made a heroic effort to add support for PyTorch on Windows, but due to limited resources, Windows support for PyTorch has lagged behind other platforms. Lack of test coverage resulted in unexpected issues popping up every now and then. Some of the core tutorials, meant for new users to learn and adopt PyTorch, would fail to run. The installation experience was also not as smooth, with the lack of official PyPI support for PyTorch on Windows. Lastly, some of the PyTorch functionality was simply not available on the Windows platform, such as the TorchAudio domain library and distributed training support. To help alleviate this pain, Microsoft is happy to bring its Windows expertise to the table and bring PyTorch on Windows to its best possible self. + +In the PyTorch 1.6 release, we have improved the core quality of the Windows build by bringing test coverage up to par with Linux for core PyTorch and its domain libraries and by automating tutorial testing. Thanks to the broader PyTorch community, which contributed TorchAudio support to Windows, we were able to add test coverage to all three domain libraries: TorchVision, TorchText and TorchAudio. In subsequent releases of PyTorch, we will continue improving the Windows experience based on community feedback and requests. So far, the feedback we received from the community points to distributed training support and a better installation experience using pip as the next areas of improvement. + +In addition to the native Windows experience, Microsoft released a preview adding [GPU compute support to Windows Subsystem for Linux (WSL) 2](https://blogs.windows.com/windowsdeveloper/2020/06/17/gpu-accelerated-ml-training-inside-the-windows-subsystem-for-linux/) distros, with a focus on enabling AI and ML developer workflows. WSL is designed for developers that want to run any Linux based tools directly on Windows. This preview enables valuable scenarios for a variety of frameworks and Python packages that utilize [NVIDIA CUDA](https://blogs.windows.com/windowsdeveloper/2020/06/17/gpu-accelerated-ml-training-inside-the-windows-subsystem-for-linux/) for acceleration and only support Linux. This means WSL customers using the preview can run native Linux based PyTorch applications on Windows unmodified without the need for a traditional virtual machine or a dual boot setup. + +## Getting started with PyTorch on Windows +It's easy to get started with PyTorch on Windows. To install PyTorch using Anaconda with the latest GPU support, run the command below. To install different supported configurations of PyTorch, refer to the installation instructions on [pytorch.org](https://pytorch.org/get-started/locally/). + + conda install pytorch torchvision cudatoolkit=10.2 -c pytorch + +Once you install PyTorch, learn more by visiting the [PyTorch Tutorials](https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html) and [documentation](https://pytorch.org/docs/stable/index.html). -## TUTORIALS HOME PAGE UPDATE -The tutorials home page now provides clear actions that developers can take. For new PyTorch users, there is an easy-to-discover button to take them directly to “A 60 Minute Blitz”. Right next to it, there is a button to view all recipes which are designed to teach specific features quickly with examples. -
- -
-In addition to the existing left navigation bar, tutorials can now be quickly filtered by multi-select tags. Let’s say you want to view all tutorials related to “Production” and “Quantization”. You can select the “Production” and “Quantization” filters as shown in the image shown below:
- +
-The following additional resources can also be found at the bottom of the Tutorials homepage: -* [PyTorch Cheat Sheet](https://pytorch.org/tutorials/beginner/ptcheat.html) -* [PyTorch Examples](https://github.com/pytorch/examples) -* [Tutorial on GitHub](https://github.com/pytorch/tutorials) - -## PYTORCH RECIPES -Recipes are new bite-sized, actionable examples designed to teach researchers and developers how to use specific PyTorch features. Some notable new recipes include: -* [Loading Data in PyTorch](https://pytorch.org/tutorials/recipes/recipes/loading_data_recipe.html) -* [Model Interpretability Using Captum](https://pytorch.org/tutorials/recipes/recipes/Captum_Recipe.html) -* [How to Use TensorBoard](https://pytorch.org/tutorials/recipes/recipes/tensorboard_with_pytorch.html) - -View the full recipes [here](http://pytorch.org/tutorials/recipes/recipes_index.html). - -## LEARNING PYTORCH -This section includes tutorials designed for users new to PyTorch. Based on community feedback, we have made updates to the current [Deep Learning with PyTorch: A 60 Minute Blitz](https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html) tutorial, one of our most popular tutorials for beginners. Upon completion, one can understand what PyTorch and neural networks are, and be able to build and train a simple image classification network. Updates include adding explanations to clarify output meanings and linking back to where users can read more in the docs, cleaning up confusing syntax errors, and reconstructing and explaining new concepts for easier readability. - -## DEPLOYING MODELS IN PRODUCTION -This section includes tutorials for developers looking to take their PyTorch models to production. The tutorials include: -* [Deploying PyTorch in Python via a REST API with Flask](https://pytorch.org/tutorials/intermediate/flask_rest_api_tutorial.html) -* [Introduction to TorchScript](https://pytorch.org/tutorials/beginner/Intro_to_TorchScript_tutorial.html) -* [Loading a TorchScript Model in C++](https://pytorch.org/tutorials/advanced/cpp_export.html) -* [Exploring a Model from PyTorch to ONNX and Running it using ONNX Runtime](https://pytorch.org/tutorials/advanced/super_resolution_with_onnxruntime.html) - -## FRONTEND APIS -PyTorch provides a number of frontend API features that can help developers to code, debug, and validate their models more efficiently. This section includes tutorials that teach what these features are and how to use them. Some tutorials to highlight: -* [Introduction to Named Tensors in PyTorch](https://pytorch.org/tutorials/intermediate/named_tensor_tutorial.html) -* [Using the PyTorch C++ Frontend](https://pytorch.org/tutorials/advanced/cpp_frontend.html) -* [Extending TorchScript with Custom C++ Operators](https://pytorch.org/tutorials/advanced/torch_script_custom_ops.html) -* [Extending TorchScript with Custom C++ Classes](https://pytorch.org/tutorials/advanced/torch_script_custom_classes.html) -* [Autograd in C++ Frontend](https://pytorch.org/tutorials/advanced/cpp_autograd.html) - -## MODEL OPTIMIZATION -Deep learning models often consume large amounts of memory, power, and compute due to their complexity. 
This section provides tutorials for model optimization: -* [Pruning](https://pytorch.org/tutorials/intermediate/pruning_tutorial.html) -* [Dynamic Quantization on BERT](https://pytorch.org/tutorials/intermediate/dynamic_quantization_bert_tutorial.html) -* [Static Quantization with Eager Mode in PyTorch](https://pytorch.org/tutorials/advanced/static_quantization_tutorial.html) - -## PARALLEL AND DISTRIBUTED TRAINING -PyTorch provides features that can accelerate performance in research and production such as native support for asynchronous execution of collective operations and peer-to-peer communication that is accessible from Python and C++. This section includes tutorials on parallel and distributed training: -* [Single-Machine Model Parallel Best Practices](https://pytorch.org/tutorials/intermediate/model_parallel_tutorial.html) -* [Getting started with Distributed Data Parallel](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html) -* [Getting started with Distributed RPC Framework](https://pytorch.org/tutorials/intermediate/rpc_tutorial.html) -* [Implementing a Parameter Server Using Distributed RPC Framework](https://pytorch.org/tutorials/intermediate/rpc_param_server_tutorial.html) - -Making these improvements are just the first step of improving PyTorch.org for the community. Please submit your suggestions [here](https://github.com/pytorch/tutorials/pulls). - -Cheers, - -Team PyTorch +## Getting started with PyTorch on Windows Subsystem for Linux +The [preview of NVIDIA CUDA support in WSL](https://docs.microsoft.com/en-us/windows/win32/direct3d12/gpu-cuda-in-wsl) is now available to Windows Insiders running Build 20150 or higher. In WSL, the command to install PyTorch using Anaconda is the same as the above command for native Windows. If you prefer pip, use the command below. + + pip install torch torchvision + +You can use the same tutorials and documentation inside your WSL environment as on native Windows. This functionality is still in preview so if you run into issues with WSL please share feedback via the WSL [GitHub repo](https://docs.microsoft.com/en-us/windows/win32/direct3d12/gpu-cuda-in-wsl) or with NVIDIA CUDA support share via NVIDIA’s [Community Forum for CUDA on WSL](https://forums.developer.nvidia.com/c/accelerated-computing/cuda/cuda-on-windows-subsystem-for-linux/303). + +## Feedback +If you find gaps in the PyTorch experience on Windows, please let us know on the [PyTorch discussion forum](https://discuss.pytorch.org/c/windows/26) or file an issue on [GitHub](https://discuss.pytorch.org/c/windows/26) using #module: windows label. 
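If you do report an issue, it helps to include some basic environment information. The short snippet below is one way to collect the details that are most useful in a bug report (it only uses standard `torch` attributes; adapt it to your setup):

```python
import torch

# Basic environment details that are useful to paste into a bug report
print("PyTorch:", torch.__version__)          # e.g. 1.6.0
print("CUDA runtime:", torch.version.cuda)    # None for CPU-only builds
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```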
+ + + From f184249b401805bcdaa883b3f6b9ab69b7bc4494 Mon Sep 17 00:00:00 2001 From: andresruizfacebook <68402331+andresruizfacebook@users.noreply.github.com> Date: Fri, 24 Jul 2020 11:01:17 -0700 Subject: [PATCH 03/23] Update 2020-7-20-pytorch-1.6-released.md --- _posts/2020-7-20-pytorch-1.6-released.md | 236 +++++++++++++++++++++-- 1 file changed, 217 insertions(+), 19 deletions(-) diff --git a/_posts/2020-7-20-pytorch-1.6-released.md b/_posts/2020-7-20-pytorch-1.6-released.md index cb9e42fea7da..2e04025eacb9 100644 --- a/_posts/2020-7-20-pytorch-1.6-released.md +++ b/_posts/2020-7-20-pytorch-1.6-released.md @@ -4,39 +4,237 @@ title: 'PyTorch 1.6 released w/ Native AMP Support, Microsoft joins as maintaine author: Team PyTorch --- -Maxim Lukiyanov - Principal PM at Microsoft, Emad Barsoum - Group EM at Microsoft, Guoliang Hua - Principal EM at Microsoft, Nikita Shulga - Tech Lead at Facebook, Geeta Chauhan - PE Lead at Facebook, Chris Gottbrath - Technical PM at Facebook, Jiachen Pu [(peterjc123)](https://github.com/peterjc123) - Engineer at Facebook +Today, we’re announcing the availability of PyTorch 1.6, along with updated domain libraries. We are also excited to announce the team at builds and binaries and will also be supporting the community on GitHub as well as the PyTorch Windows [discussion forums](https://discuss.pytorch.org/c/windows/). + +The PyTorch 1.6 release includes a number of new APIs, tools for performance improvement and profiling, as well as major updates to both distributed data parallel (DDP) and remote procedure call (RPC) based distributed training. +A few of the highlights include: + +1. Automatic mixed precision (AMP) training is now natively supported and a stable feature (See for more details) - thanks for NVIDIA’s contributions; +2. Native TensorPipe support now added for tensor-aware, point-to-point communication primitives built specifically for machine learning; +3. Added support for complex tensors to the frontend API surface; +4. New profiling tools providing tensor-level memory consumption information; and +5. Numerous improvements and new features for both distributed data parallel (DDP) training and the remote procedural call (RPC) packages. + +Additionally, from this release onward, features will be classified as Stable, Beta and Prototype. Prototype features are not included as part of the binary distribution and are instead available through either building from source, using nightlies or via compiler flag. You can learn more about what this change means in the post [here]. You can also find the full release notes [here]. + +# Performance & Profiling + +## [Stable] Automatic Mixed Precision (AMP) Training + +AMP allows users to easily enable automatic mixed precision training enabling higher performance and memory savings of up to 50% on Tensor Core GPUs. Using the natively supported `torch.cuda.amp` API, AMP provides convenience methods for mixed precision, where some operations use the `torch.float32 (float)` datatype and other operations use `torch.float16 (half)`. Some ops, like linear layers and convolutions, are much faster in `float16`. Other ops, like reductions, often require the dynamic range of `float32`. Mixed precision tries to match each op to its appropriate datatype. 
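For a quick sense of the workflow, here is a minimal sketch of a training loop using `torch.cuda.amp.autocast` and `torch.cuda.amp.GradScaler`. The tiny model and random data are illustrative only, and the snippet assumes a CUDA-capable GPU is available:

```python
import torch

device = "cuda"  # torch.cuda.amp targets CUDA GPUs (Tensor Cores for best speedups)
model = torch.nn.Linear(128, 10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()

for step in range(10):
    x = torch.randn(64, 128, device=device)
    y = torch.randint(0, 10, (64,), device=device)
    optimizer.zero_grad()
    # Ops inside autocast run in float16 where safe, float32 elsewhere
    with torch.cuda.amp.autocast():
        loss = torch.nn.functional.cross_entropy(model(x), y)
    # GradScaler scales the loss to avoid float16 gradient underflow
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```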
+ +* Design doc | [Link](https://github.com/pytorch/pytorch/issues/25081) +* Documentation | [Link](https://pytorch.org/docs/stable/amp.html) +* Usage examples | [Link](https://pytorch.org/docs/stable/notes/amp_examples.html) + +## [Beta] Fork/Join Parallelism + +This release adds support for a language-level construct as well as runtime support for coarse-grained parallelism in TorchScript code. This support is useful for situations such as running models in an ensemble in parallel, or running bidirectional components of recurrent nets in parallel, and allows the ability to unlock the computational power of parallel architectures (e.g. many-core CPUs) for task level parallelism. + +Parallel execution of TorchScript programs is enabled through two primitives: `torch.jit.fork and torch.jit.wait`. In the below example, we parallelize execution of `foo:` + +```python + +import torch +from typing import List + +def foo(x): + return torch.neg(x) + +@torch.jit.script +def example(x): + futures = [torch.jit.fork(foo, x) for _ in range(100)] + results = [torch.jit.wait(future) for future in futures] + return torch.sum(torch.stack(results)) + +print(example(torch.ones([]))) + ``` + + ### * Documentation | [Link](https://pytorch.org/docs/stable/jit.html) + +## [Beta] Memory Profiler + +The `torch.autograd` API now includes a memory profiler that lets you inspect the cost of different operators inside your CPU and GPU models. There are two modes implemented at the moment - CPU-only using [profile](https://pytorch.org/docs/master/autograd.html#torch.autograd.profiler.profile). and nvprof based (registers both CPU and GPU activity) using [emit_nvtx](https://pytorch.org/docs/master/autograd.html#torch.autograd.profiler.emit_nvtx). + +Here is an example usage of the API: + +```python +x = torch.randn((1, 1), requires_grad=True) +with torch.autograd.profiler.profile(profile_memory=True) as prof: + for _ in range(100): # any normal python code, really! + y = x ** 2 + y.backward() + # NOTE: some columns were removed for brevity +print(prof.key_averages().table(sort_by="self_cpu_time_total")) + ``` + +```python +----------------------------------- --------------- --------------- --------------- +Name Self CPU total CPU time avg Number of Calls CPU Mem +----------------------------------- --------------- --------------- --------------- +mul 32.048ms 32.048ms 200 800 b +pow 27.041ms 27.041ms 200 800 b +PowBackward0 9.727ms 55.483ms 100 +torch::autograd::AccumulateGrad 9.148ms 9.148ms 100 +torch::autograd::GraphRoot 691.816us 691.816us 100 +----------------------------------- --------------- --------------- --------------- + ``` + +* Design doc | [Link](https://github.com/pytorch/pytorch/pull/37775) +* Documentation | [Link](https://pytorch.org/docs/stable/autograd.html#profiler) + + + +# Distributed Training & RPC + +## [Beta] TensorPipe backend for RPC + +PyTorch 1.6 introduces a new backend for the RPC module which leverages the TensorPipe library, a tensor-aware point-to-point communication primitive targeted at machine learning, intended to complement the current primitives for distributed training in PyTorch (Gloo, MPI, ...) which are collective and blocking. The pairwise and asynchronous nature of TensorPipe lends itself to new networking paradigms that go beyond data parallel: client-server approaches (e.g., parameter server for embeddings, actor-learner separation in Impala-style RL, ...) and model and pipeline parallel training (think GPipe), gossip SGD, etc. 
+ +```python +# One-line change needed to opt in +torch.distributed.rpc.init_rpc( + ... + backend=torch.distributed.rpc.BackendType.TENSORPIPE, +) + +# No changes to the rest of the RPC API +torch.distributed.rpc.rpc_sync(...) +``` + +* Design doc | [Link](https://github.com/pytorch/pytorch/issues/35251) +* Documentation | [Link](https://pytorch.org/docs/stable/rpc/index.html) + +## [Beta] DDP+RPC + +PyTorch Distributed supports two powerful paradigms: DDP for full sync data parallel training of models and the RPC framework which allows for distributed model parallelism. Currently, these two features work independently and users can’t mix and match these to try out hybrid parallelism paradigms. + +Starting in PyTorch 1.6, we’ve enabled DDP and RPC to work together seamlessly so that users can combine these two techniques to achieve both data parallelism and model parallelism. An example is where users would like to place large embedding tables on parameter servers and use the RPC framework for embedding lookups, but store smaller dense parameters on trainers and use DDP to synchronize the dense parameters. Below is a simple code snippet. + +```python +// On each trainer + +remote_emb = create_emb(on="ps", ...) +ddp_model = DDP(dense_model) + +for data in batch: + with torch.distributed.autograd.context(): + res = remote_emb(data) + loss = ddp_model(res) + torch.distributed.autograd.backward([loss]) +``` + +* DDP+RPC Tutorial | [Link](https://pytorch.org/tutorials/advanced/rpc_ddp_tutorial.html) +* Documentation | [Link](https://pytorch.org/docs/stable/rpc/index.html) +* Usage Examples | [Link](https://github.com/pytorch/examples/pull/800) + +## [Beta] RPC - Asynchronous User Functions + +RPC Asynchronous User Functions supports the ability to yield and resume on the server side when executing a user-defined function. Prior to this feature, when an callee processes a request, one RPC thread waits until the user function returns. If the user function contains IO (e.g., nested RPC) or signaling (e.g., waiting for another request to unblock), the corresponding RPC thread would sit idle waiting for these events. As a result, some applications have to use a very large number of threads and send additional RPC requests, which can potentially lead to performance degradation. To make a user function yield on such events, applications need to: 1) Decorate the function with the `@rpc.functions.async_execution` decorator; and 2) Let the function return a `torch.futures.Future` and install the resume logic as callbacks on the `Future` object. See below for an example: + + +```python +@rpc.functions.async_execution +def async_add_chained(to, x, y, z): + return rpc.rpc_async(to, torch.add, args=(x, y)).then( + lambda fut: fut.wait() + z + ) + +ret = rpc.rpc_sync( + "worker1", + async_add_chained, + args=("worker2", torch.ones(2), 1, 1) +) + +print(ret) # prints tensor([3., 3.]) + +``` + +* Tutorial for performant batch RPC using Asynchronous User Functions| Link (https://github.com/pytorch/tutorials/blob/release/1.6/intermediate_source/rpc_async_execution.rst) +* Documentation | Link (https://pytorch.org/docs/stable/rpc.html#torch.distributed.rpc.functions.async_execution) +* Usage examples | Link (https://github.com/pytorch/examples/tree/stable/distributed/rpc/batch) + +# Frontend API Updates + +## [Beta] Complex Numbers + +The PyTorch 1.6 release brings beta level support for complex tensors including torch.complex64 and torch.complex128 dtypes. 
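As a quick illustration, the sketch below creates a few complex tensors and inspects them; this is a minimal example and the exact set of operators supported in the beta may differ:

```python
import torch

# Construct a complex tensor directly from Python complex numbers
z = torch.tensor([1 + 2j, 3 - 1j], dtype=torch.complex64)

# Or reinterpret a real tensor whose last dimension holds (real, imag) pairs
w = torch.view_as_complex(torch.randn(4, 2))

print(torch.real(z), torch.imag(z))  # real and imaginary parts
print(torch.abs(z))                  # elementwise magnitude
print(z + w[:2])                     # basic arithmetic between complex tensors
```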
A complex number is a number that can be expressed in the form a + bj, where a and b are real numbers, and j is a solution of the equation x^2 = −1. Complex numbers frequently occur in mathematics and engineering, especially in signal processing, and complex neural networks are an active area of research. The beta release of complex tensors will support common PyTorch and complex tensor functionality, plus functions needed by Torchaudio, ESPnet and others. While this is an early version of this feature, and we expect it to improve over time, the overall goal is to provide a NumPy compatible user experience that leverages PyTorch’s ability to run on accelerators and work with autograd to better support the scientific community.

# Updated Domain Libraries

## torchvision 0.7

torchvision 0.7 introduces two new pretrained semantic segmentation models, FCN ResNet50 (https://arxiv.org/abs/1411.4038) and DeepLabV3 ResNet50 (https://arxiv.org/abs/1706.05587), both trained on COCO and using smaller memory footprints than the ResNet101 backbone. We also introduced support for AMP (Automatic Mixed Precision) autocasting for torchvision models and operators, which automatically selects the floating point precision for different GPU operations to improve performance while maintaining accuracy.

* Release notes | [Link](https://github.com/pytorch/vision/releases)

## torchaudio 0.6

torchaudio now officially supports Windows. This release also introduces a new model module (with wav2letter included), new functionals (contrast, cvm, dcshift, overdrive, vad, phaser, flanger, biquad), datasets (GTZAN, CMU), and a new optional sox backend with support for torchscript.

* Release notes | [Link](https://github.com/pytorch/audio/releases)

# Additional updates

## HACKATHON

The Global PyTorch Summer Hackathon is back! This year, teams can compete in three categories virtually:

 1. **PyTorch Developer Tools:** Tools or libraries designed to improve productivity and efficiency of PyTorch for researchers and developers
 2. **Web/Mobile Applications powered by PyTorch:** Applications with web/mobile interfaces and/or embedded devices powered by PyTorch
 3. **PyTorch Responsible AI Development Tools:** Tools, libraries, or web/mobile apps for responsible AI development

This is a great opportunity to connect with the community and practice your machine learning skills.

* [Join the hackathon](http://pytorch2020.devpost.com/)
* [Watch educational videos](https://www.youtube.com/pytorch)


## LPCV Challenge

The [2020 CVPR Low-Power Vision Challenge (LPCV) - Online Track for UAV video](https://lpcv.ai/2020CVPR/video-track) submission deadline is coming up shortly. You have until July 31, 2020 to build a system that can accurately discover and recognize characters in video captured by an unmanned aerial vehicle (UAV), using PyTorch and a Raspberry Pi 3B+.

## Prototype Features

To reiterate, prototype features in PyTorch are early features that we are looking to gather feedback on, gauge the usefulness of and improve ahead of graduating them to beta or stable. The following features are not part of the PyTorch 1.6 release and instead are available in nightlies with separate docs/tutorials to help facilitate early usage and feedback.

**Distributed RPC/Profiler** - Allow users to profile training jobs that use `torch.distributed.rpc` using the autograd profiler, and remotely invoke the profiler in order to collect profiling information across different nodes. 
The RFC can be found [here](https://github.com/pytorch/pytorch/issues/39675) and a short recipe on how to use this feature can be found [here](https://github.com/pytorch/tutorials/tree/master/prototype_source). + +**TorchScript Module Freezing:** Module Freezing is the process of inlining module parameters and attributes values into the TorchScript internal representation. Parameter and attribute values are treated as final value and they cannot be modified in the frozen module. The PR for this feature can be found [here](https://github.com/pytorch/pytorch/pull/32178) and a short tutorial on how to use this feature can be found [here](https://github.com/pytorch/tutorials/tree/master/prototype_source). + +**Graph Mode Quantization:** Eager mode quantization requires users to make change to their model, including explicitly quantizing activations, module fusion, rewriting use of torch ops with Functional Modules and quantization of functionals are not supported. If we can trace or script the model, then the quantization can be done automatically with graph mode quantization without any of the complexities in eager mode, and it is configurable through a `qconfig_dict`. A tutorial on how to use this feature can be found [here](https://github.com/pytorch/tutorials/tree/master/prototype_source). + +**Quantization Numerical Suite:** Quantization is good when it works, but it’s difficult to know what's wrong when it doesn't satisfy the expected accuracy. A prototype is now available for a Numerical Suite that measures comparison statistics between quantized modules and float modules. This is available to test using eager mode and on CPU only with more support coming. A tutorial on how to use this feature can be found [here](https://github.com/pytorch/tutorials/tree/master/prototype_source). + +Cheers! +Team PyTorch + + + + + + + + + + + + + -Along with the PyTorch 1.6 release, we are excited to announce that Microsoft has expanded its participation in the PyTorch community and is taking ownership of the development and maintenance of the PyTorch build for Windows. -According to the latest [Stack Overflow developer survey](https://insights.stackoverflow.com/survey/2020#technology-developers-primary-operating-systems), Windows remains the primary operating system for the developer community (46% Windows vs 28% MacOS). Jiachen Pu (peterjc123) initially made a heroic effort to add support for PyTorch on Windows, but due to limited resources, Windows support for PyTorch has lagged behind other platforms. Lack of test coverage resulted in unexpected issues popping up every now and then. Some of the core tutorials, meant for new users to learn and adopt PyTorch, would fail to run. The installation experience was also not as smooth, with the lack of official PyPI support for PyTorch on Windows. Lastly, some of the PyTorch functionality was simply not available on the Windows platform, such as the TorchAudio domain library and distributed training support. To help alleviate this pain, Microsoft is happy to bring its Windows expertise to the table and bring PyTorch on Windows to its best possible self. -In the PyTorch 1.6 release, we have improved the core quality of the Windows build by bringing test coverage up to par with Linux for core PyTorch and its domain libraries and by automating tutorial testing. Thanks to the broader PyTorch community, which contributed TorchAudio support to Windows, we were able to add test coverage to all three domain libraries: TorchVision, TorchText and TorchAudio. 
In subsequent releases of PyTorch, we will continue improving the Windows experience based on community feedback and requests. So far, the feedback we received from the community points to distributed training support and a better installation experience using pip as the next areas of improvement. -In addition to the native Windows experience, Microsoft released a preview adding [GPU compute support to Windows Subsystem for Linux (WSL) 2](https://blogs.windows.com/windowsdeveloper/2020/06/17/gpu-accelerated-ml-training-inside-the-windows-subsystem-for-linux/) distros, with a focus on enabling AI and ML developer workflows. WSL is designed for developers that want to run any Linux based tools directly on Windows. This preview enables valuable scenarios for a variety of frameworks and Python packages that utilize [NVIDIA CUDA](https://blogs.windows.com/windowsdeveloper/2020/06/17/gpu-accelerated-ml-training-inside-the-windows-subsystem-for-linux/) for acceleration and only support Linux. This means WSL customers using the preview can run native Linux based PyTorch applications on Windows unmodified without the need for a traditional virtual machine or a dual boot setup. -## Getting started with PyTorch on Windows -It's easy to get started with PyTorch on Windows. To install PyTorch using Anaconda with the latest GPU support, run the command below. To install different supported configurations of PyTorch, refer to the installation instructions on [pytorch.org](https://pytorch.org/get-started/locally/). - conda install pytorch torchvision cudatoolkit=10.2 -c pytorch -Once you install PyTorch, learn more by visiting the [PyTorch Tutorials](https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html) and [documentation](https://pytorch.org/docs/stable/index.html). -
- -
-## Getting started with PyTorch on Windows Subsystem for Linux -The [preview of NVIDIA CUDA support in WSL](https://docs.microsoft.com/en-us/windows/win32/direct3d12/gpu-cuda-in-wsl) is now available to Windows Insiders running Build 20150 or higher. In WSL, the command to install PyTorch using Anaconda is the same as the above command for native Windows. If you prefer pip, use the command below. - - pip install torch torchvision -You can use the same tutorials and documentation inside your WSL environment as on native Windows. This functionality is still in preview so if you run into issues with WSL please share feedback via the WSL [GitHub repo](https://docs.microsoft.com/en-us/windows/win32/direct3d12/gpu-cuda-in-wsl) or with NVIDIA CUDA support share via NVIDIA’s [Community Forum for CUDA on WSL](https://forums.developer.nvidia.com/c/accelerated-computing/cuda/cuda-on-windows-subsystem-for-linux/303). -## Feedback -If you find gaps in the PyTorch experience on Windows, please let us know on the [PyTorch discussion forum](https://discuss.pytorch.org/c/windows/26) or file an issue on [GitHub](https://discuss.pytorch.org/c/windows/26) using #module: windows label. From da7a6cd3f42904275783d67837881c41c7400312 Mon Sep 17 00:00:00 2001 From: Bruce Lin <49162601+brucejlin1@users.noreply.github.com> Date: Fri, 24 Jul 2020 15:17:08 -0700 Subject: [PATCH 04/23] Update 2020-7-20-pytorch-1.6-released.md --- _posts/2020-7-20-pytorch-1.6-released.md | 1 + 1 file changed, 1 insertion(+) diff --git a/_posts/2020-7-20-pytorch-1.6-released.md b/_posts/2020-7-20-pytorch-1.6-released.md index 2e04025eacb9..c1fca46246c3 100644 --- a/_posts/2020-7-20-pytorch-1.6-released.md +++ b/_posts/2020-7-20-pytorch-1.6-released.md @@ -206,6 +206,7 @@ To reiterate, prototype features in PyTorch are early features that we are looki **Quantization Numerical Suite:** Quantization is good when it works, but it’s difficult to know what's wrong when it doesn't satisfy the expected accuracy. A prototype is now available for a Numerical Suite that measures comparison statistics between quantized modules and float modules. This is available to test using eager mode and on CPU only with more support coming. A tutorial on how to use this feature can be found [here](https://github.com/pytorch/tutorials/tree/master/prototype_source). + Cheers! Team PyTorch From c6cded4800d6269bc98c4a54e401348c215d40d0 Mon Sep 17 00:00:00 2001 From: Bruce Lin <49162601+brucejlin1@users.noreply.github.com> Date: Fri, 24 Jul 2020 15:57:11 -0700 Subject: [PATCH 05/23] Update 2020-7-20-pytorch-1.6-released.md --- _posts/2020-7-20-pytorch-1.6-released.md | 79 +++++++----------------- 1 file changed, 23 insertions(+), 56 deletions(-) diff --git a/_posts/2020-7-20-pytorch-1.6-released.md b/_posts/2020-7-20-pytorch-1.6-released.md index c1fca46246c3..d95cfeea7aac 100644 --- a/_posts/2020-7-20-pytorch-1.6-released.md +++ b/_posts/2020-7-20-pytorch-1.6-released.md @@ -4,12 +4,12 @@ title: 'PyTorch 1.6 released w/ Native AMP Support, Microsoft joins as maintaine author: Team PyTorch --- -Today, we’re announcing the availability of PyTorch 1.6, along with updated domain libraries. We are also excited to announce the team at builds and binaries and will also be supporting the community on GitHub as well as the PyTorch Windows [discussion forums](https://discuss.pytorch.org/c/windows/). +Today, we’re announcing the availability of PyTorch 1.6, along with updated domain libraries. 
We are also excited to announce the team at builds and binaries and will also be supporting the community on GitHub as well as the PyTorch Windows discussion forums. The PyTorch 1.6 release includes a number of new APIs, tools for performance improvement and profiling, as well as major updates to both distributed data parallel (DDP) and remote procedure call (RPC) based distributed training. A few of the highlights include: -1. Automatic mixed precision (AMP) training is now natively supported and a stable feature (See for more details) - thanks for NVIDIA’s contributions; +1. Automatic mixed precision (AMP) training is now natively supported and a stable feature (See [here] for more details) - thanks for NVIDIA’s contributions; 2. Native TensorPipe support now added for tensor-aware, point-to-point communication primitives built specifically for machine learning; 3. Added support for complex tensors to the frontend API surface; 4. New profiling tools providing tensor-level memory consumption information; and @@ -23,9 +23,9 @@ Additionally, from this release onward, features will be classified as Stable, B AMP allows users to easily enable automatic mixed precision training enabling higher performance and memory savings of up to 50% on Tensor Core GPUs. Using the natively supported `torch.cuda.amp` API, AMP provides convenience methods for mixed precision, where some operations use the `torch.float32 (float)` datatype and other operations use `torch.float16 (half)`. Some ops, like linear layers and convolutions, are much faster in `float16`. Other ops, like reductions, often require the dynamic range of `float32`. Mixed precision tries to match each op to its appropriate datatype. -* Design doc | [Link](https://github.com/pytorch/pytorch/issues/25081) -* Documentation | [Link](https://pytorch.org/docs/stable/amp.html) -* Usage examples | [Link](https://pytorch.org/docs/stable/notes/amp_examples.html) +* Design doc ([Link](https://github.com/pytorch/pytorch/issues/25081)) +* Documentation ([Link](https://pytorch.org/docs/stable/amp.html)) +* Usage examples ([Link](https://pytorch.org/docs/stable/notes/amp_examples.html)) ## [Beta] Fork/Join Parallelism @@ -34,7 +34,6 @@ This release adds support for a language-level construct as well as runtime supp Parallel execution of TorchScript programs is enabled through two primitives: `torch.jit.fork and torch.jit.wait`. In the below example, we parallelize execution of `foo:` ```python - import torch from typing import List @@ -50,7 +49,7 @@ def example(x): print(example(torch.ones([]))) ``` - ### * Documentation | [Link](https://pytorch.org/docs/stable/jit.html) +* Documentation ([Link](https://pytorch.org/docs/stable/jit.html)) ## [Beta] Memory Profiler @@ -80,10 +79,8 @@ torch::autograd::GraphRoot 691.816us 691.816us 100 ----------------------------------- --------------- --------------- --------------- ``` -* Design doc | [Link](https://github.com/pytorch/pytorch/pull/37775) -* Documentation | [Link](https://pytorch.org/docs/stable/autograd.html#profiler) - - +* Design doc ([Link](https://github.com/pytorch/pytorch/pull/37775)) +* Documentation ([Link](https://pytorch.org/docs/stable/autograd.html#profiler)) # Distributed Training & RPC @@ -102,12 +99,12 @@ torch.distributed.rpc.init_rpc( torch.distributed.rpc.rpc_sync(...) 
``` -* Design doc | [Link](https://github.com/pytorch/pytorch/issues/35251) -* Documentation | [Link](https://pytorch.org/docs/stable/rpc/index.html) +* Design doc ([Link](https://github.com/pytorch/pytorch/issues/35251)) +* Documentation ([Link](https://pytorch.org/docs/stable/rpc/index.html)) -## [Beta] DDP+RPC +## [Beta] DDP+RPC -PyTorch Distributed supports two powerful paradigms: DDP for full sync data parallel training of models and the RPC framework which allows for distributed model parallelism. Currently, these two features work independently and users can’t mix and match these to try out hybrid parallelism paradigms. +PyTorch Distributed supports two powerful paradigms: DDP for full sync data parallel training of models and the RPC framework which allows for distributed model parallelism. Previously, these two features work independently and users can’t mix and match these to try out hybrid parallelism paradigms. Starting in PyTorch 1.6, we’ve enabled DDP and RPC to work together seamlessly so that users can combine these two techniques to achieve both data parallelism and model parallelism. An example is where users would like to place large embedding tables on parameter servers and use the RPC framework for embedding lookups, but store smaller dense parameters on trainers and use DDP to synchronize the dense parameters. Below is a simple code snippet. @@ -124,9 +121,9 @@ for data in batch: torch.distributed.autograd.backward([loss]) ``` -* DDP+RPC Tutorial | [Link](https://pytorch.org/tutorials/advanced/rpc_ddp_tutorial.html) -* Documentation | [Link](https://pytorch.org/docs/stable/rpc/index.html) -* Usage Examples | [Link](https://github.com/pytorch/examples/pull/800) +* DDP+RPC Tutorial ([Link](https://pytorch.org/tutorials/advanced/rpc_ddp_tutorial.html)) +* Documentation ([Link](https://pytorch.org/docs/stable/rpc/index.html)) +* Usage Examples ([Link](https://github.com/pytorch/examples/pull/800)) ## [Beta] RPC - Asynchronous User Functions @@ -147,12 +144,11 @@ ret = rpc.rpc_sync( ) print(ret) # prints tensor([3., 3.]) - ``` -* Tutorial for performant batch RPC using Asynchronous User Functions| Link (https://github.com/pytorch/tutorials/blob/release/1.6/intermediate_source/rpc_async_execution.rst) -* Documentation | Link (https://pytorch.org/docs/stable/rpc.html#torch.distributed.rpc.functions.async_execution) -* Usage examples | Link (https://github.com/pytorch/examples/tree/stable/distributed/rpc/batch) +* Tutorial for performant batch RPC using Asynchronous User Functions ([Link](https://github.com/pytorch/tutorials/blob/release/1.6/intermediate_source/rpc_async_execution.rst)) +* Documentation ([Link](https://pytorch.org/docs/stable/rpc.html#torch.distributed.rpc.functions.async_execution)) +* Usage examples ([Link](https://github.com/pytorch/examples/tree/stable/distributed/rpc/batch)) # Frontend API Updates @@ -164,15 +160,15 @@ The PyTorch 1.6 release brings beta level support for complex tensors including ## torchvision 0.7 -torchvision 0.7 introduces two new pretrained semantic segmentation models, FCN ResNet50 (https://arxiv.org/abs/1411.4038) and DeepLabV3 ResNet50 (https://arxiv.org/abs/1706.05587), both trained on COCO and using smaller memory footprints than the ResNet101 backbone. We also introduced support for AMP (Automatic Mixed Precision) autocasting for torchvision models and operators, which automatically selects the floating point precision for different GPU operations to improve performance while maintaining accuracy. 
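For reference, loading one of the new pretrained segmentation models looks roughly like the sketch below; the first call downloads COCO-trained weights, and the dummy input is only there to show the output format:

```python
import torch
import torchvision

# FCN with a ResNet50 backbone, one of the two new pretrained segmentation models
model = torchvision.models.segmentation.fcn_resnet50(pretrained=True).eval()

# Run a dummy image through it; the result is a dict with per-pixel class scores
with torch.no_grad():
    out = model(torch.rand(1, 3, 224, 224))["out"]
print(out.shape)  # (1, 21, 224, 224) - 21 classes at input resolution
```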
+torchvision 0.7 introduces two new pretrained semantic segmentation models, [FCN ResNet50](https://arxiv.org/abs/1411.4038) and [DeepLabV3 ResNet50](https://arxiv.org/abs/1706.05587), both trained on COCO and using smaller memory footprints than the ResNet101 backbone. We also introduced support for AMP (Automatic Mixed Precision) autocasting for torchvision models and operators, which automatically selects the floating point precision for different GPU operations to improve performance while maintaining accuracy. -* Release notes | [Link](https://github.com/pytorch/vision/releases) +* Release notes ([Link](https://github.com/pytorch/vision/releases)) ## torchaudio 0.6 torchaudio now officially supports Windows. This release also introduces a new model module (with wav2letter included), new functionals (contrast, cvm, dcshift, overdrive, vad, phaser, flanger, biquad), datasets (GTZAN, CMU), and a new optional sox backend with support for torchscript. -* Release notes | [Link](https://github.com/pytorch/audio/releases) +* Release notes ([Link](https://github.com/pytorch/audio/releases)) # Additional updates @@ -192,7 +188,7 @@ This is a great opportunity to connect with the community and practice your mach ## LPCV Challenge -The [2020 CVPR Low-Power Vision Challenge (LPCV) - Online Track for UAV video](https://lpcv.ai/2020CVPR/video-track) submission deadline coming up shortly. You have until July 31, 2020 to build a system that can discover and recognize characters in video captured by an unmanned aerial vehicle (UAV) accurately using PyTorch and Raspberry Pi 3B+. +The [2020 CVPR Low-Power Vision Challenge (LPCV) - Online Track for UAV video](https://lpcv.ai/2020CVPR/video-track) submission deadline is coming up shortly. You have until July 31, 2020 to build a system that can discover and recognize characters in video captured by an unmanned aerial vehicle (UAV) accurately using PyTorch and Raspberry Pi 3B+. ## Prototype Features @@ -208,34 +204,5 @@ To reiterate, prototype features in PyTorch are early features that we are looki Cheers! -Team PyTorch - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +Team PyTorch From d9ca67f1726177ec7a4648021b4daf1c2da7bd40 Mon Sep 17 00:00:00 2001 From: Bruce Lin <49162601+brucejlin1@users.noreply.github.com> Date: Mon, 27 Jul 2020 08:50:11 -0700 Subject: [PATCH 06/23] Update 2020-7-20-pytorch-1.6-released.md --- _posts/2020-7-20-pytorch-1.6-released.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2020-7-20-pytorch-1.6-released.md b/_posts/2020-7-20-pytorch-1.6-released.md index d95cfeea7aac..d55cb54688d7 100644 --- a/_posts/2020-7-20-pytorch-1.6-released.md +++ b/_posts/2020-7-20-pytorch-1.6-released.md @@ -4,7 +4,7 @@ title: 'PyTorch 1.6 released w/ Native AMP Support, Microsoft joins as maintaine author: Team PyTorch --- -Today, we’re announcing the availability of PyTorch 1.6, along with updated domain libraries. We are also excited to announce the team at builds and binaries and will also be supporting the community on GitHub as well as the PyTorch Windows discussion forums. +Today, we’re announcing the availability of PyTorch 1.6, along with updated domain libraries. We are also excited to announce the team at Microsoft is now maintaining Windows builds and binaries and will also be supporting the community on GitHub as well as the PyTorch Windows discussion forums. 
The PyTorch 1.6 release includes a number of new APIs, tools for performance improvement and profiling, as well as major updates to both distributed data parallel (DDP) and remote procedure call (RPC) based distributed training. A few of the highlights include: From 0d3ddda471c297760eeaf4f3f4015e2fcf490431 Mon Sep 17 00:00:00 2001 From: Bruce Lin <49162601+brucejlin1@users.noreply.github.com> Date: Mon, 27 Jul 2020 09:26:01 -0700 Subject: [PATCH 07/23] Update 2020-7-20-pytorch-1.6-released.md --- _posts/2020-7-20-pytorch-1.6-released.md | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/_posts/2020-7-20-pytorch-1.6-released.md b/_posts/2020-7-20-pytorch-1.6-released.md index d55cb54688d7..8549b3cdbe85 100644 --- a/_posts/2020-7-20-pytorch-1.6-released.md +++ b/_posts/2020-7-20-pytorch-1.6-released.md @@ -9,13 +9,13 @@ Today, we’re announcing the availability of PyTorch 1.6, along with updated do The PyTorch 1.6 release includes a number of new APIs, tools for performance improvement and profiling, as well as major updates to both distributed data parallel (DDP) and remote procedure call (RPC) based distributed training. A few of the highlights include: -1. Automatic mixed precision (AMP) training is now natively supported and a stable feature (See [here] for more details) - thanks for NVIDIA’s contributions; +1. Automatic mixed precision (AMP) training is now natively supported and a stable feature (See [here](https://pytorch.org/blog/accelerating-training-on-nvidia-gpus-with-pytorch-automatic-mixed-precision/) for more details) - thanks for NVIDIA’s contributions; 2. Native TensorPipe support now added for tensor-aware, point-to-point communication primitives built specifically for machine learning; 3. Added support for complex tensors to the frontend API surface; -4. New profiling tools providing tensor-level memory consumption information; and +4. New profiling tools providing tensor-level memory consumption information; 5. Numerous improvements and new features for both distributed data parallel (DDP) training and the remote procedural call (RPC) packages. -Additionally, from this release onward, features will be classified as Stable, Beta and Prototype. Prototype features are not included as part of the binary distribution and are instead available through either building from source, using nightlies or via compiler flag. You can learn more about what this change means in the post [here]. You can also find the full release notes [here]. +Additionally, from this release onward, features will be classified as Stable, Beta and Prototype. Prototype features are not included as part of the binary distribution and are instead available through either building from source, using nightlies or via compiler flag. You can learn more about what this change means in the post [here](https://pytorch.org/blog/microsoft-becomes-maintainer-of-the-windows-version-of-pytorch). You can also find the full release notes [here]. # Performance & Profiling @@ -31,7 +31,7 @@ AMP allows users to easily enable automatic mixed precision training enabling hi This release adds support for a language-level construct as well as runtime support for coarse-grained parallelism in TorchScript code. This support is useful for situations such as running models in an ensemble in parallel, or running bidirectional components of recurrent nets in parallel, and allows the ability to unlock the computational power of parallel architectures (e.g. many-core CPUs) for task level parallelism. 
-Parallel execution of TorchScript programs is enabled through two primitives: `torch.jit.fork and torch.jit.wait`. In the below example, we parallelize execution of `foo:` +Parallel execution of TorchScript programs is enabled through two primitives: `torch.jit.fork` and `torch.jit.wait`. In the below example, we parallelize execution of `foo`: ```python import torch @@ -53,7 +53,7 @@ print(example(torch.ones([]))) ## [Beta] Memory Profiler -The `torch.autograd` API now includes a memory profiler that lets you inspect the cost of different operators inside your CPU and GPU models. There are two modes implemented at the moment - CPU-only using [profile](https://pytorch.org/docs/master/autograd.html#torch.autograd.profiler.profile). and nvprof based (registers both CPU and GPU activity) using [emit_nvtx](https://pytorch.org/docs/master/autograd.html#torch.autograd.profiler.emit_nvtx). +The `torch.autograd` API now includes a memory profiler that lets you inspect the cost of different operators inside your CPU and GPU models. There are two modes implemented at the moment - CPU-only using '[profile](https://pytorch.org/docs/master/autograd.html#torch.autograd.profiler.profile)'. and nvprof based (registers both CPU and GPU activity) using '[emit_nvtx](https://pytorch.org/docs/master/autograd.html#torch.autograd.profiler.emit_nvtx)'. Here is an example usage of the API: @@ -104,7 +104,7 @@ torch.distributed.rpc.rpc_sync(...) ## [Beta] DDP+RPC -PyTorch Distributed supports two powerful paradigms: DDP for full sync data parallel training of models and the RPC framework which allows for distributed model parallelism. Previously, these two features work independently and users can’t mix and match these to try out hybrid parallelism paradigms. +PyTorch Distributed supports two powerful paradigms: DDP for full sync data parallel training of models and the RPC framework which allows for distributed model parallelism. Previously, these two features worked independently and users couldn’t mix and match these to try out hybrid parallelism paradigms. Starting in PyTorch 1.6, we’ve enabled DDP and RPC to work together seamlessly so that users can combine these two techniques to achieve both data parallelism and model parallelism. An example is where users would like to place large embedding tables on parameter servers and use the RPC framework for embedding lookups, but store smaller dense parameters on trainers and use DDP to synchronize the dense parameters. Below is a simple code snippet. @@ -127,7 +127,7 @@ for data in batch: ## [Beta] RPC - Asynchronous User Functions -RPC Asynchronous User Functions supports the ability to yield and resume on the server side when executing a user-defined function. Prior to this feature, when an callee processes a request, one RPC thread waits until the user function returns. If the user function contains IO (e.g., nested RPC) or signaling (e.g., waiting for another request to unblock), the corresponding RPC thread would sit idle waiting for these events. As a result, some applications have to use a very large number of threads and send additional RPC requests, which can potentially lead to performance degradation. To make a user function yield on such events, applications need to: 1) Decorate the function with the `@rpc.functions.async_execution` decorator; and 2) Let the function return a `torch.futures.Future` and install the resume logic as callbacks on the `Future` object. 
See below for an example: +RPC Asynchronous User Functions supports the ability to yield and resume on the server side when executing a user-defined function. Prior to this feature, when a callee processes a request, one RPC thread waits until the user function returns. If the user function contains IO (e.g., nested RPC) or signaling (e.g., waiting for another request to unblock), the corresponding RPC thread would sit idle waiting for these events. As a result, some applications have to use a very large number of threads and send additional RPC requests, which can potentially lead to performance degradation. To make a user function yield on such events, applications need to: 1) Decorate the function with the `@rpc.functions.async_execution` decorator; and 2) Let the function return a `torch.futures.Future` and install the resume logic as callbacks on the `Future` object. See below for an example: ```python @@ -154,7 +154,7 @@ print(ret) # prints tensor([3., 3.]) ## [Beta] Complex Numbers -The PyTorch 1.6 release brings beta level support for complex tensors including torch.complex64 and torch.complex128 dtypes. A complex* *number is a number that can be expressed in the form a + bj, where a and b are real numbers, and j is a solution of the equation x^2 = −1. Complex numbers frequently occur in mathematics and engineering, especially in signal processing and the area of complex neural networks is an active area of research. The beta release of complex tensors will support common PyTorch and complex tensor functionality, plus functions needed by Torchaudio, ESPnet and others. While this is an early version of this feature, and we expect it to improve over time, the overall goal is provide a NumPy compatible user experience that leverages PyTorch’s ability to run on accelerators and work with autograd to better support the scientific community. +The PyTorch 1.6 release brings beta level support for complex tensors including torch.complex64 and torch.complex128 dtypes. A complex number is a number that can be expressed in the form a + bj, where a and b are real numbers, and j is a solution of the equation x^2 = −1. Complex numbers frequently occur in mathematics and engineering, especially in signal processing and the area of complex neural networks is an active area of research. The beta release of complex tensors will support common PyTorch and complex tensor functionality, plus functions needed by Torchaudio, ESPnet and others. While this is an early version of this feature, and we expect it to improve over time, the overall goal is provide a NumPy compatible user experience that leverages PyTorch’s ability to run on accelerators and work with autograd to better support the scientific community. # Updated Domain Libraries @@ -166,7 +166,7 @@ torchvision 0.7 introduces two new pretrained semantic segmentation models, [FCN ## torchaudio 0.6 -torchaudio now officially supports Windows. This release also introduces a new model module (with wav2letter included), new functionals (contrast, cvm, dcshift, overdrive, vad, phaser, flanger, biquad), datasets (GTZAN, CMU), and a new optional sox backend with support for torchscript. +torchaudio now officially supports Windows. This release also introduces a new model module (with wav2letter included), new functionals (contrast, cvm, dcshift, overdrive, vad, phaser, flanger, biquad), datasets (GTZAN, CMU), and a new optional sox backend with support for TorchScript. 
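To make the complex tensor support described above concrete, here is a minimal sketch; operator coverage in the beta may differ slightly from what is shown, so treat the specific calls as illustrative:

```python
import torch

# complex tensors can be constructed directly from Python complex literals
z = torch.tensor([1 + 2j, 3 - 1j], dtype=torch.complex64)
w = torch.tensor([2 + 0j, 0 + 1j], dtype=torch.complex64)

print(z.dtype)        # torch.complex64
print(z + w)          # elementwise complex addition
print(z * w)          # elementwise complex multiplication
print(torch.conj(z))  # complex conjugate
print(torch.real(z))  # real part as a float tensor
print(torch.imag(z))  # imaginary part as a float tensor
```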
* Release notes ([Link](https://github.com/pytorch/audio/releases)) @@ -192,13 +192,13 @@ The [2020 CVPR Low-Power Vision Challenge (LPCV) - Online Track for UAV video](h ## Prototype Features -To reiterate, prototype features in PyTorch are early features that we are looking to gather feedback on, gauge the usefulness of and improve ahead of graduating them to beta or stable. The following features are not part of the PyTorch 1.6 release and instead are available in nightlies with separate docs/tutorials to help facilitate early usage and feedback. +To reiterate, Prototype features in PyTorch are early features that we are looking to gather feedback on, gauge the usefulness of and improve ahead of graduating them to Beta or Stable. The following features are not part of the PyTorch 1.6 release and instead are available in nightlies with separate docs/tutorials to help facilitate early usage and feedback. **Distributed RPC/Profiler** - Allow users to profile training jobs that use `torch.distributed.rpc` using the autograd profiler, and remotely invoke the profiler in order to collect profiling information across different nodes. The RFC can be found [here](https://github.com/pytorch/pytorch/issues/39675) and a short recipe on how to use this feature can be found [here](https://github.com/pytorch/tutorials/tree/master/prototype_source). **TorchScript Module Freezing:** Module Freezing is the process of inlining module parameters and attributes values into the TorchScript internal representation. Parameter and attribute values are treated as final value and they cannot be modified in the frozen module. The PR for this feature can be found [here](https://github.com/pytorch/pytorch/pull/32178) and a short tutorial on how to use this feature can be found [here](https://github.com/pytorch/tutorials/tree/master/prototype_source). -**Graph Mode Quantization:** Eager mode quantization requires users to make change to their model, including explicitly quantizing activations, module fusion, rewriting use of torch ops with Functional Modules and quantization of functionals are not supported. If we can trace or script the model, then the quantization can be done automatically with graph mode quantization without any of the complexities in eager mode, and it is configurable through a `qconfig_dict`. A tutorial on how to use this feature can be found [here](https://github.com/pytorch/tutorials/tree/master/prototype_source). +**Graph Mode Quantization:** Eager mode quantization requires users to make changes to their model, including explicitly quantizing activations, module fusion, rewriting use of torch ops with Functional Modules and quantization of functionals are not supported. If we can trace or script the model, then the quantization can be done automatically with graph mode quantization without any of the complexities in eager mode, and it is configurable through a `qconfig_dict`. A tutorial on how to use this feature can be found [here](https://github.com/pytorch/tutorials/tree/master/prototype_source). **Quantization Numerical Suite:** Quantization is good when it works, but it’s difficult to know what's wrong when it doesn't satisfy the expected accuracy. A prototype is now available for a Numerical Suite that measures comparison statistics between quantized modules and float modules. This is available to test using eager mode and on CPU only with more support coming. A tutorial on how to use this feature can be found [here](https://github.com/pytorch/tutorials/tree/master/prototype_source). 
From 49b7db2d5346641f4e3ff5241feb8f6cb53fbbd9 Mon Sep 17 00:00:00 2001 From: Bruce Lin <49162601+brucejlin1@users.noreply.github.com> Date: Mon, 27 Jul 2020 09:29:38 -0700 Subject: [PATCH 08/23] Update 2020-7-20-pytorch-1.6-released.md --- _posts/2020-7-20-pytorch-1.6-released.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2020-7-20-pytorch-1.6-released.md b/_posts/2020-7-20-pytorch-1.6-released.md index 8549b3cdbe85..2929c8144d4a 100644 --- a/_posts/2020-7-20-pytorch-1.6-released.md +++ b/_posts/2020-7-20-pytorch-1.6-released.md @@ -4,7 +4,7 @@ title: 'PyTorch 1.6 released w/ Native AMP Support, Microsoft joins as maintaine author: Team PyTorch --- -Today, we’re announcing the availability of PyTorch 1.6, along with updated domain libraries. We are also excited to announce the team at Microsoft is now maintaining Windows builds and binaries and will also be supporting the community on GitHub as well as the PyTorch Windows discussion forums. +Today, we’re announcing the availability of PyTorch 1.6, along with updated domain libraries. We are also excited to announce the team at [Microsoft is now maintaining Windows builds and binaries](https://pytorch.org/blog/microsoft-becomes-maintainer-of-the-windows-version-of-pytorch) and will also be supporting the community on GitHub as well as the PyTorch Windows discussion forums. The PyTorch 1.6 release includes a number of new APIs, tools for performance improvement and profiling, as well as major updates to both distributed data parallel (DDP) and remote procedure call (RPC) based distributed training. A few of the highlights include: From fb98ec1bf6926a806b7454ff723773b8256fc010 Mon Sep 17 00:00:00 2001 From: Bruce Lin <49162601+brucejlin1@users.noreply.github.com> Date: Mon, 27 Jul 2020 10:31:43 -0700 Subject: [PATCH 09/23] Update 2020-7-20-pytorch-1.6-released.md --- _posts/2020-7-20-pytorch-1.6-released.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2020-7-20-pytorch-1.6-released.md b/_posts/2020-7-20-pytorch-1.6-released.md index 2929c8144d4a..17783ec866dc 100644 --- a/_posts/2020-7-20-pytorch-1.6-released.md +++ b/_posts/2020-7-20-pytorch-1.6-released.md @@ -15,7 +15,7 @@ A few of the highlights include: 4. New profiling tools providing tensor-level memory consumption information; 5. Numerous improvements and new features for both distributed data parallel (DDP) training and the remote procedural call (RPC) packages. -Additionally, from this release onward, features will be classified as Stable, Beta and Prototype. Prototype features are not included as part of the binary distribution and are instead available through either building from source, using nightlies or via compiler flag. You can learn more about what this change means in the post [here](https://pytorch.org/blog/microsoft-becomes-maintainer-of-the-windows-version-of-pytorch). You can also find the full release notes [here]. +Additionally, from this release onward, features will be classified as Stable, Beta and Prototype. Prototype features are not included as part of the binary distribution and are instead available through either building from source, using nightlies or via compiler flag. You can learn more about what this change means in the post [here](https://pytorch.org/blog/pytorch-feature-classification-changes/). You can also find the full release notes [here]. 
# Performance & Profiling From a03b8c372c73d66c90cc34e29c684492b65bb4d3 Mon Sep 17 00:00:00 2001 From: Bruce Lin <49162601+brucejlin1@users.noreply.github.com> Date: Mon, 27 Jul 2020 10:33:51 -0700 Subject: [PATCH 10/23] Update 2020-7-20-pytorch-1.6-released.md --- _posts/2020-7-20-pytorch-1.6-released.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2020-7-20-pytorch-1.6-released.md b/_posts/2020-7-20-pytorch-1.6-released.md index 17783ec866dc..86ac378a9e72 100644 --- a/_posts/2020-7-20-pytorch-1.6-released.md +++ b/_posts/2020-7-20-pytorch-1.6-released.md @@ -15,7 +15,7 @@ A few of the highlights include: 4. New profiling tools providing tensor-level memory consumption information; 5. Numerous improvements and new features for both distributed data parallel (DDP) training and the remote procedural call (RPC) packages. -Additionally, from this release onward, features will be classified as Stable, Beta and Prototype. Prototype features are not included as part of the binary distribution and are instead available through either building from source, using nightlies or via compiler flag. You can learn more about what this change means in the post [here](https://pytorch.org/blog/pytorch-feature-classification-changes/). You can also find the full release notes [here]. +Additionally, from this release onward, features will be classified as Stable, Beta and Prototype. Prototype features are not included as part of the binary distribution and are instead available through either building from source, using nightlies or via compiler flag. You can learn more about what this change means in the post [here](https://pytorch.org/blog/pytorch-feature-classification-changes/). You can also find the full release notes [here](https://github.com/pytorch/pytorch/releases). # Performance & Profiling From 410e92550925f6819644876b0b694e9a6027aa3e Mon Sep 17 00:00:00 2001 From: Bruce Lin <49162601+brucejlin1@users.noreply.github.com> Date: Mon, 27 Jul 2020 10:34:42 -0700 Subject: [PATCH 11/23] Update 2020-7-20-pytorch-1.6-released.md --- _posts/2020-7-20-pytorch-1.6-released.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2020-7-20-pytorch-1.6-released.md b/_posts/2020-7-20-pytorch-1.6-released.md index 86ac378a9e72..1fbcc4af7b0e 100644 --- a/_posts/2020-7-20-pytorch-1.6-released.md +++ b/_posts/2020-7-20-pytorch-1.6-released.md @@ -53,7 +53,7 @@ print(example(torch.ones([]))) ## [Beta] Memory Profiler -The `torch.autograd` API now includes a memory profiler that lets you inspect the cost of different operators inside your CPU and GPU models. There are two modes implemented at the moment - CPU-only using '[profile](https://pytorch.org/docs/master/autograd.html#torch.autograd.profiler.profile)'. and nvprof based (registers both CPU and GPU activity) using '[emit_nvtx](https://pytorch.org/docs/master/autograd.html#torch.autograd.profiler.emit_nvtx)'. +The `torch.autograd` API now includes a memory profiler that lets you inspect the cost of different operators inside your CPU and GPU models. There are two modes implemented at the moment - CPU-only using [profile](https://pytorch.org/docs/master/autograd.html#torch.autograd.profiler.profile) and nvprof based (registers both CPU and GPU activity) using [emit_nvtx](https://pytorch.org/docs/master/autograd.html#torch.autograd.profiler.emit_nvtx). 
Here is an example usage of the API: From d2a3609902ecf2784c5eae006d7f3573311d3052 Mon Sep 17 00:00:00 2001 From: Bruce Lin <49162601+brucejlin1@users.noreply.github.com> Date: Mon, 27 Jul 2020 13:42:12 -0700 Subject: [PATCH 12/23] Update 2020-7-20-pytorch-1.6-released.md --- _posts/2020-7-20-pytorch-1.6-released.md | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/_posts/2020-7-20-pytorch-1.6-released.md b/_posts/2020-7-20-pytorch-1.6-released.md index 1fbcc4af7b0e..e7647e38078f 100644 --- a/_posts/2020-7-20-pytorch-1.6-released.md +++ b/_posts/2020-7-20-pytorch-1.6-released.md @@ -194,13 +194,17 @@ The [2020 CVPR Low-Power Vision Challenge (LPCV) - Online Track for UAV video](h To reiterate, Prototype features in PyTorch are early features that we are looking to gather feedback on, gauge the usefulness of and improve ahead of graduating them to Beta or Stable. The following features are not part of the PyTorch 1.6 release and instead are available in nightlies with separate docs/tutorials to help facilitate early usage and feedback. -**Distributed RPC/Profiler** - Allow users to profile training jobs that use `torch.distributed.rpc` using the autograd profiler, and remotely invoke the profiler in order to collect profiling information across different nodes. The RFC can be found [here](https://github.com/pytorch/pytorch/issues/39675) and a short recipe on how to use this feature can be found [here](https://github.com/pytorch/tutorials/tree/master/prototype_source). +#### Distributed RPC/Profiler +Allow users to profile training jobs that use `torch.distributed.rpc` using the autograd profiler, and remotely invoke the profiler in order to collect profiling information across different nodes. The RFC can be found [here](https://github.com/pytorch/pytorch/issues/39675) and a short recipe on how to use this feature can be found [here](https://github.com/pytorch/tutorials/tree/master/prototype_source). -**TorchScript Module Freezing:** Module Freezing is the process of inlining module parameters and attributes values into the TorchScript internal representation. Parameter and attribute values are treated as final value and they cannot be modified in the frozen module. The PR for this feature can be found [here](https://github.com/pytorch/pytorch/pull/32178) and a short tutorial on how to use this feature can be found [here](https://github.com/pytorch/tutorials/tree/master/prototype_source). +#### TorchScript Module Freezing +Module Freezing is the process of inlining module parameters and attributes values into the TorchScript internal representation. Parameter and attribute values are treated as final value and they cannot be modified in the frozen module. The PR for this feature can be found [here](https://github.com/pytorch/pytorch/pull/32178) and a short tutorial on how to use this feature can be found [here](https://github.com/pytorch/tutorials/tree/master/prototype_source). -**Graph Mode Quantization:** Eager mode quantization requires users to make changes to their model, including explicitly quantizing activations, module fusion, rewriting use of torch ops with Functional Modules and quantization of functionals are not supported. If we can trace or script the model, then the quantization can be done automatically with graph mode quantization without any of the complexities in eager mode, and it is configurable through a `qconfig_dict`. 
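As a rough sketch of what the prototype graph mode flow could look like on a scripted model, the snippet below quantizes a torchvision ResNet-18 with a `qconfig_dict` and a calibration callback; the entry point name (`torch.quantization.quantize_jit`), its signature, and the qconfig layout are assumptions taken from the prototype/nightly docs and may change:

```python
import torch
import torchvision.models as models
from torch.quantization import get_default_qconfig, quantize_jit  # prototype API, assumed name

# a float model, scripted so graph mode quantization can rewrite its graph
float_model = models.resnet18().eval()
scripted = torch.jit.script(float_model)

# an empty key applies the qconfig to the whole model
qconfig_dict = {'': get_default_qconfig('fbgemm')}

def calibrate(model, sample_batches):
    # run representative data through the model to collect observer statistics
    with torch.no_grad():
        for x in sample_batches:
            model(x)

calibration_data = [torch.randn(1, 3, 224, 224) for _ in range(4)]
quantized = quantize_jit(scripted, qconfig_dict, calibrate, [calibration_data])
```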
A tutorial on how to use this feature can be found [here](https://github.com/pytorch/tutorials/tree/master/prototype_source). +#### Graph Mode Quantization +Eager mode quantization requires users to make changes to their model, including explicitly quantizing activations, module fusion, rewriting use of torch ops with Functional Modules and quantization of functionals are not supported. If we can trace or script the model, then the quantization can be done automatically with graph mode quantization without any of the complexities in eager mode, and it is configurable through a `qconfig_dict`. A tutorial on how to use this feature can be found [here](https://github.com/pytorch/tutorials/tree/master/prototype_source). -**Quantization Numerical Suite:** Quantization is good when it works, but it’s difficult to know what's wrong when it doesn't satisfy the expected accuracy. A prototype is now available for a Numerical Suite that measures comparison statistics between quantized modules and float modules. This is available to test using eager mode and on CPU only with more support coming. A tutorial on how to use this feature can be found [here](https://github.com/pytorch/tutorials/tree/master/prototype_source). +#### Quantization Numerical Suite +Quantization is good when it works, but it’s difficult to know what's wrong when it doesn't satisfy the expected accuracy. A prototype is now available for a Numerical Suite that measures comparison statistics between quantized modules and float modules. This is available to test using eager mode and on CPU only with more support coming. A tutorial on how to use this feature can be found [here](https://github.com/pytorch/tutorials/tree/master/prototype_source). Cheers! From a35e1c019185f827f538395d3d90292f84a15452 Mon Sep 17 00:00:00 2001 From: Bruce Lin <49162601+brucejlin1@users.noreply.github.com> Date: Mon, 27 Jul 2020 14:41:59 -0700 Subject: [PATCH 13/23] Update 2020-7-20-pytorch-1.6-released.md --- _posts/2020-7-20-pytorch-1.6-released.md | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/_posts/2020-7-20-pytorch-1.6-released.md b/_posts/2020-7-20-pytorch-1.6-released.md index e7647e38078f..c1e8a778b3a4 100644 --- a/_posts/2020-7-20-pytorch-1.6-released.md +++ b/_posts/2020-7-20-pytorch-1.6-released.md @@ -68,16 +68,16 @@ print(prof.key_averages().table(sort_by="self_cpu_time_total")) ``` ```python ------------------------------------ --------------- --------------- --------------- -Name Self CPU total CPU time avg Number of Calls CPU Mem ------------------------------------ --------------- --------------- --------------- -mul 32.048ms 32.048ms 200 800 b -pow 27.041ms 27.041ms 200 800 b -PowBackward0 9.727ms 55.483ms 100 -torch::autograd::AccumulateGrad 9.148ms 9.148ms 100 -torch::autograd::GraphRoot 691.816us 691.816us 100 ------------------------------------ --------------- --------------- --------------- - ``` +---------- --------- ------------- --------------- ------- +Name Self CPU total CPU time avg Number of Calls CPU Mem +---------- --------- ------------- --------------- ------- +mul 32.048ms 32.048ms 200 800 b +pow 27.041ms 27.041ms 200 800 b +PowBackward 09.727ms 55.483ms 100 +torch::autograd::AccumulateGrad 9.148ms 9.148ms 100 +torch::autograd::GraphRoot 691.816us 691.816us 100 +---------- --------- ------------- --------------- ------- +``` * Design doc ([Link](https://github.com/pytorch/pytorch/pull/37775)) * Documentation ([Link](https://pytorch.org/docs/stable/autograd.html#profiler)) From 
8d200cc945920e88282458350a1052e4b9c174f8 Mon Sep 17 00:00:00 2001 From: Bruce Lin <49162601+brucejlin1@users.noreply.github.com> Date: Mon, 27 Jul 2020 15:02:02 -0700 Subject: [PATCH 14/23] Update 2020-7-20-pytorch-1.6-released.md --- _posts/2020-7-20-pytorch-1.6-released.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/_posts/2020-7-20-pytorch-1.6-released.md b/_posts/2020-7-20-pytorch-1.6-released.md index c1e8a778b3a4..facf09ab2562 100644 --- a/_posts/2020-7-20-pytorch-1.6-released.md +++ b/_posts/2020-7-20-pytorch-1.6-released.md @@ -68,15 +68,15 @@ print(prof.key_averages().table(sort_by="self_cpu_time_total")) ``` ```python ----------- --------- ------------- --------------- ------- +------------------------------- --------- ------------- --------------- ------- Name Self CPU total CPU time avg Number of Calls CPU Mem ----------- --------- ------------- --------------- ------- +------------------------------- --------- ------------- --------------- ------- mul 32.048ms 32.048ms 200 800 b pow 27.041ms 27.041ms 200 800 b PowBackward 09.727ms 55.483ms 100 torch::autograd::AccumulateGrad 9.148ms 9.148ms 100 torch::autograd::GraphRoot 691.816us 691.816us 100 ----------- --------- ------------- --------------- ------- +------------------------------- --------- ------------- --------------- ------- ``` * Design doc ([Link](https://github.com/pytorch/pytorch/pull/37775)) From ed7c1db876b1190b70d7b56da97d9d1a9a41855d Mon Sep 17 00:00:00 2001 From: andresruizfacebook <68402331+andresruizfacebook@users.noreply.github.com> Date: Mon, 27 Jul 2020 16:14:37 -0700 Subject: [PATCH 15/23] Create news-item-4.md --- _news/news-item-4.md | 5 +++++ 1 file changed, 5 insertions(+) create mode 100644 _news/news-item-4.md diff --git a/_news/news-item-4.md b/_news/news-item-4.md new file mode 100644 index 000000000000..119b00fcacc5 --- /dev/null +++ b/_news/news-item-4.md @@ -0,0 +1,5 @@ +--- +order: 4 +link: https://pytorch.org/blog/pytorch-feature-classification-changes/ +summary: See the new PyTorch feature classifications changes +--- From a80ad4551fad770b339641579b796605753e8b31 Mon Sep 17 00:00:00 2001 From: andresruizfacebook <68402331+andresruizfacebook@users.noreply.github.com> Date: Mon, 27 Jul 2020 16:17:59 -0700 Subject: [PATCH 16/23] Update news-item-3.md --- _news/news-item-3.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/_news/news-item-3.md b/_news/news-item-3.md index bef72cfdeea3..c0f135d6fb83 100644 --- a/_news/news-item-3.md +++ b/_news/news-item-3.md @@ -1,6 +1,6 @@ --- order: 3 -link: https://pytorch.org/blog/pytorch-library-updates-new-model-serving-library -summary: PyTorch library updates including new model serving library +link: https://pytorch.org/blog/microsoft-becomes-maintainer-of-the-windows-version-of-pytorch/ +summary: Microsoft becomes maintainer of the Windows version of Pytorch --- From dd1e7c430943f547c3e6abfd94e52e92386c233b Mon Sep 17 00:00:00 2001 From: andresruizfacebook <68402331+andresruizfacebook@users.noreply.github.com> Date: Mon, 27 Jul 2020 16:20:46 -0700 Subject: [PATCH 17/23] Update news-item-2.md --- _news/news-item-2.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/_news/news-item-2.md b/_news/news-item-2.md index 40f82ac45446..ea403928ce38 100644 --- a/_news/news-item-2.md +++ b/_news/news-item-2.md @@ -1,6 +1,6 @@ --- order: 2 -link: https://pytorch.org/blog/pytorch-1-dot-5-released-with-new-and-updated-apis -summary: PyTorch 1.5 released, new and updated APIs including C++ 
frontend API parity with Python. +link: https://pytorch.org/blog/accelerating-training-on-nvidia-gpus-with-pytorch-automatic-mixed-precision/ +summary: Accelerating Training on NVIDIA GPUs with PyTorch Automatic Mixed Precision. --- From 8b51d5c8dc2f545b23261461c9c7087e2e891ac5 Mon Sep 17 00:00:00 2001 From: andresruizfacebook <68402331+andresruizfacebook@users.noreply.github.com> Date: Mon, 27 Jul 2020 16:24:59 -0700 Subject: [PATCH 18/23] Update news-item-1.md --- _news/news-item-1.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/_news/news-item-1.md b/_news/news-item-1.md index 29db26f03cc2..2e78daa5caa0 100644 --- a/_news/news-item-1.md +++ b/_news/news-item-1.md @@ -1,7 +1,7 @@ --- order: 1 -link: https://pytorch.org/blog/updates-improvements-to-pytorch-tutorials/ -summary: Click Here to Read About Latest Updates and Improvements to PyTorch Tutorials +link: https://pytorch.org/blog/pytorch-1.6-released/ +summary: PyTorch 1.6 released w/ Native AMP Support, Microsoft joins as maintainers for Windows. --- From b095a4a6e5f13f5819245cd3673c223902826f05 Mon Sep 17 00:00:00 2001 From: andresruizfacebook <68402331+andresruizfacebook@users.noreply.github.com> Date: Mon, 27 Jul 2020 16:26:01 -0700 Subject: [PATCH 19/23] Update news-item-3.md --- _news/news-item-3.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_news/news-item-3.md b/_news/news-item-3.md index c0f135d6fb83..b7b32cde6a07 100644 --- a/_news/news-item-3.md +++ b/_news/news-item-3.md @@ -1,6 +1,6 @@ --- order: 3 link: https://pytorch.org/blog/microsoft-becomes-maintainer-of-the-windows-version-of-pytorch/ -summary: Microsoft becomes maintainer of the Windows version of Pytorch +summary: Microsoft becomes maintainer of the Windows version of PyTorch. --- From 8f0556f626f0c3b465fbbba5e7f01baa0bf35de1 Mon Sep 17 00:00:00 2001 From: Bruce Lin <49162601+brucejlin1@users.noreply.github.com> Date: Mon, 27 Jul 2020 18:49:47 -0700 Subject: [PATCH 20/23] Update news-item-4.md --- _news/news-item-4.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_news/news-item-4.md b/_news/news-item-4.md index 119b00fcacc5..817a801d062d 100644 --- a/_news/news-item-4.md +++ b/_news/news-item-4.md @@ -1,5 +1,5 @@ --- order: 4 link: https://pytorch.org/blog/pytorch-feature-classification-changes/ -summary: See the new PyTorch feature classifications changes +summary: See the new PyTorch feature classification changes --- From 647f422f6c080b32883dabaa22549d55d118b22f Mon Sep 17 00:00:00 2001 From: Bruce Lin <49162601+brucejlin1@users.noreply.github.com> Date: Tue, 28 Jul 2020 08:12:40 -0700 Subject: [PATCH 21/23] Updating Mem Profiler content --- _posts/2020-7-20-pytorch-1.6-released.md | 43 ++++++++++++------------ 1 file changed, 22 insertions(+), 21 deletions(-) diff --git a/_posts/2020-7-20-pytorch-1.6-released.md b/_posts/2020-7-20-pytorch-1.6-released.md index facf09ab2562..e649733944e9 100644 --- a/_posts/2020-7-20-pytorch-1.6-released.md +++ b/_posts/2020-7-20-pytorch-1.6-released.md @@ -53,33 +53,34 @@ print(example(torch.ones([]))) ## [Beta] Memory Profiler -The `torch.autograd` API now includes a memory profiler that lets you inspect the cost of different operators inside your CPU and GPU models. 
There are two modes implemented at the moment - CPU-only using [profile](https://pytorch.org/docs/master/autograd.html#torch.autograd.profiler.profile) and nvprof based (registers both CPU and GPU activity) using [emit_nvtx](https://pytorch.org/docs/master/autograd.html#torch.autograd.profiler.emit_nvtx). +The `torch.autograd.profiler` API now includes a memory profiler that lets you inspect the tensor memory cost of different operators inside your CPU and GPU models. Here is an example usage of the API: ```python -x = torch.randn((1, 1), requires_grad=True) -with torch.autograd.profiler.profile(profile_memory=True) as prof: - for _ in range(100): # any normal python code, really! - y = x ** 2 - y.backward() - # NOTE: some columns were removed for brevity -print(prof.key_averages().table(sort_by="self_cpu_time_total")) +import torch +import torchvision.models as models +import torch.autograd.profiler as profiler + +model = models.resnet18() +inputs = torch.randn(5, 3, 224, 224) +with profiler.profile(profile_memory=True, record_shapes=True) as prof: + model(inputs) + +# NOTE: some columns were removed for brevity +print(prof.key_averages().table(sort_by="self_cpu_memory_usage", row_limit=10)) +# --------------------------- --------------- --------------- --------------- +# Name CPU Mem Self CPU Mem Number of Calls +# --------------------------- --------------- --------------- --------------- +# empty 94.79 Mb 94.79 Mb 123 +# resize_ 11.48 Mb 11.48 Mb 2 +# addmm 19.53 Kb 19.53 Kb 1 +# empty_strided 4 b 4 b 1 +# conv2d 47.37 Mb 0 b 20 +# --------------------------- --------------- --------------- --------------- ``` -```python -------------------------------- --------- ------------- --------------- ------- -Name Self CPU total CPU time avg Number of Calls CPU Mem -------------------------------- --------- ------------- --------------- ------- -mul 32.048ms 32.048ms 200 800 b -pow 27.041ms 27.041ms 200 800 b -PowBackward 09.727ms 55.483ms 100 -torch::autograd::AccumulateGrad 9.148ms 9.148ms 100 -torch::autograd::GraphRoot 691.816us 691.816us 100 -------------------------------- --------- ------------- --------------- ------- -``` - -* Design doc ([Link](https://github.com/pytorch/pytorch/pull/37775)) +* PR ([Link](https://github.com/pytorch/pytorch/pull/37775)) * Documentation ([Link](https://pytorch.org/docs/stable/autograd.html#profiler)) # Distributed Training & RPC From 2fe1157d589060ddae570b74da2a7d173da2ac69 Mon Sep 17 00:00:00 2001 From: Bruce Lin <49162601+brucejlin1@users.noreply.github.com> Date: Tue, 28 Jul 2020 08:15:09 -0700 Subject: [PATCH 22/23] Rename 2020-7-20-pytorch-1.6-released.md to 2020-7-28-pytorch-1.6-released.md --- ...-pytorch-1.6-released.md => 2020-7-28-pytorch-1.6-released.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename _posts/{2020-7-20-pytorch-1.6-released.md => 2020-7-28-pytorch-1.6-released.md} (100%) diff --git a/_posts/2020-7-20-pytorch-1.6-released.md b/_posts/2020-7-28-pytorch-1.6-released.md similarity index 100% rename from _posts/2020-7-20-pytorch-1.6-released.md rename to _posts/2020-7-28-pytorch-1.6-released.md From 58f8db1e0695f6029427468bae79db22b923c103 Mon Sep 17 00:00:00 2001 From: Bruce Lin <49162601+brucejlin1@users.noreply.github.com> Date: Tue, 28 Jul 2020 08:30:50 -0700 Subject: [PATCH 23/23] Update 2020-7-28-pytorch-1.6-released.md --- _posts/2020-7-28-pytorch-1.6-released.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2020-7-28-pytorch-1.6-released.md b/_posts/2020-7-28-pytorch-1.6-released.md 
index e649733944e9..d1a18284dc69 100644 --- a/_posts/2020-7-28-pytorch-1.6-released.md +++ b/_posts/2020-7-28-pytorch-1.6-released.md @@ -149,7 +149,7 @@ print(ret) # prints tensor([3., 3.]) * Tutorial for performant batch RPC using Asynchronous User Functions ([Link](https://github.com/pytorch/tutorials/blob/release/1.6/intermediate_source/rpc_async_execution.rst)) * Documentation ([Link](https://pytorch.org/docs/stable/rpc.html#torch.distributed.rpc.functions.async_execution)) -* Usage examples ([Link](https://github.com/pytorch/examples/tree/stable/distributed/rpc/batch)) +* Usage examples ([Link](https://github.com/pytorch/examples/tree/master/distributed/rpc/batch)) # Frontend API Updates