
Commit 4ca2ef7

Update and rename 2022-6-22-introducing-torchx-fbgemm-and-other-domain-library-updates-in-pytorch-1-12.md to 2022-6-22-introducing-torchx-fbgemm-and-other-library-updates-in-pytorch-1-12.md
Title change. Movement of sections. Bullet point feedback
1 parent d3e289d commit 4ca2ef7


1 file changed, 47 additions, 47 deletions

@@ -1,6 +1,6 @@
 ---
 layout: blog_detail
-title: "Introducing TorchX, FBGemm and other domain library updates in PyTorch 1.12"
+title: "Introducing TorchX, FBGemm and other library updates in PyTorch 1.12"
 author: Team PyTorch
 featured-img: ''
 ---
@@ -9,9 +9,9 @@ We are introducing the beta release of TorchRec and a number of improvements to
 
 Summary:
 - **TorchVision** - Added 4 new model families and 14 new classification datasets such as CLEVR, GTSRB, FER2013. See the release notes [here](https://github.com/pytorch/vision/releases).
-- **TorchRec** a PyTorch domain library for Recommendation Systems, is available in beta. [View it on GitHub](https://github.com/pytorch/torchrec).
 - **TorchAudio** - Added Enformer- and RNN-T-based models and recipes to support the full development lifecycle of a streaming ASR model. See the release notes [here](https://github.com/pytorch/audio/releases).
 - **TorchText** - Added beta support for RoBERTa and XLM-R models, byte-level BPE tokenizer, and text datasets backed by TorchData. See the release notes [here](https://github.com/pytorch/text/releases).
+- **TorchRec** a PyTorch domain library for Recommendation Systems, is available in beta. [View it on GitHub](https://github.com/pytorch/torchrec).
 - **TorchX** - Launch PyTorch trainers developed on local workspaces onto five different types of schedulers. See the release notes [here](https://github.com/pytorch/torchx/blob/main/CHANGELOG.md?plain=1#L3).
 - **FBGemm** - Added and improved kernels for Recommendation Systems inference workloads, including table batched embedding bag, jagged tensor operations, and other special-case optimizations.
 
@@ -202,47 +202,6 @@ This release brings a bunch of new primitives which can be used to produce SOTA
 
 We completely revamped our models documentation to make them easier to browse, and added various key information such as supported image sizes, or image pre-processing steps of pre-trained weights. We now have a [main model page](https://pytorch.org/vision/main/models.html) with various [summary tables](https://pytorch.org/vision/main/models.html#table-of-all-available-classification-weights) of available weights, and each model has a [dedicated page](https://pytorch.org/vision/main/models/resnet.html). Each model builder is also documented in their [own page](https://pytorch.org/vision/main/models/generated/torchvision.models.resnet50.html#torchvision.models.resnet50), with more details about the available weights, including accuracy, minimal image size, link to training recipes, and other valuable info. For comparison, our previous models docs are [here](https://pytorch.org/vision/0.12/models.html). To provide feedback on the new documentation, please use the dedicated [Github issue](https://github.com/pytorch/vision/issues/5511).
 
-## TorchRec v0.1.2
-
-### EmbeddingModule + DLRM benchmarks
-
-A set of [benchmarking tests](https://github.com/pytorch/torchrec/tree/main/benchmarks), showing performance characteristics of TorchRec’s base modules and research models built out of TorchRec.
-
-### TwoTower Retrieval Example, with FAISS
-
-We provide an [example](https://github.com/pytorch/torchrec/tree/main/examples/retrieval) demonstrating training a distributed TwoTower (i.e. User-Item) Retrieval model that is sharded using TorchRec. The projected item embeddings are added to an IVFPQ FAISS index for candidate generation. The retrieval model and KNN lookup are bundled in a Pytorch model for efficient end-to-end retrieval.
-
-### Integrations
-
-We demonstrate that TorchRec works out of the box with many components commonly used alongside PyTorch models in production like systems, such as
-- [Training](https://github.com/pytorch/torchrec/tree/main/examples/ray) a TorchRec model on Ray Clusters utilizing the Torchx Ray scheduler
-- [Preprocessing](https://github.com/pytorch/torchrec/tree/main/torchrec/datasets/scripts/nvt) and DataLoading with NVTabular on DLRM
-- [Training](https://github.com/pytorch/torchrec/tree/main/examples/torcharrow) a TorchRec model with on-the-fly preprocessing with TorchArrow showcasing RecSys domain UDFs.
-
-### Sequential Embeddings Example: Bert4Rec
-
-We provide an [example](https://github.com/pytorch/torchrec/tree/main/examples/bert4rec), using TorchRec, that reimplements the [BERT4REC](https://arxiv.org/abs/1904.06690) paper, showcasing EmbeddingCollection for non-pooled embeddings. Using DistributedModelParallel we see a 35% QPS gain over conventional data parallelism.
-
-### (Beta) Planner
-
-The TorchRec library includes a built-in [planner](https://pytorch.org/torchrec/torchrec.distributed.planner.html) that selects near optimal sharding plan for a given model. The planner attempts to identify the best sharding plan by evaluating a series of proposals which are statically analyzed and fed into an integer partitioner. The planner is able to automatically adjust plans for a wide range of hardware setups, allowing users to scale performance seamlessly from local development environment to large scale production hardware. See this [notebook](https://github.com/pytorch/torchrec/blob/main/torchrec/distributed/planner/Planner_Introduction.ipynb) for a more detailed tutorial.
-
-### (Beta) Inference
-
-[TorchRec Inference](https://github.com/pytorch/torchrec/tree/main/torchrec/inference) is a C++ library that supports multi-gpu inference. The TorchRec library is used to shard models written and packaged in Python via torch.package (an alternative to TorchScript). The torch.deploy library is used to serve inference from C++ by launching multiple Python interpreters carrying the packaged model, thus subverting the GIL. Two models are provided as examples: [DLRM multi-GPU](https://github.com/pytorch/torchrec/blob/main/examples/inference/dlrm_predict.py) (sharded via TorchRec) and [DLRM single-GPU](https://github.com/pytorch/torchrec/blob/main/examples/inference/dlrm_predict_single_gpu.py).
-
-### (Beta) RecMetrics
-
-RecMetrics is a [metrics](https://github.com/pytorch/torchrec/tree/main/torchrec/metrics) library that collects common utilities and optimizations for Recommendation models. It extends [torchmetrics](https://torchmetrics.readthedocs.io/en/stable/).
-- A centralized metrics module that allows users to add new metrics
-- Commonly used metrics, including AUC, Calibration, CTR, MSE/RMSE, NE & Throughput
-- Optimization for metrics related operations to reduce the overhead of metric computation
-- Checkpointing
-
-### (Prototype) Single process Batched + Fused Embeddings
-
-Previously TorchRec’s abstractions (EmbeddingBagCollection/EmbeddingCollection) over FBGEMM kernels, which provide benefits such as table batching, optimizer fusion, and UVM placement, could only be used in conjunction with DistributedModelParallel. We’ve decoupled these notions from sharding, and introduced the [FusedEmbeddingBagCollection](https://github.com/pytorch/torchrec/blob/eb1247d8a2d16edc4952e5c2617e69acfe5477a5/torchrec/modules/fused_embedding_modules.py#L271), which can be used as a standalone module, with all of the above features, and can also be sharded.
-
 ## TorchAudio v0.12
 
 ### (BETA) Streaming API
@@ -259,7 +218,7 @@ StreamReader is TorchAudio’s new I/O API. It is backed by FFmpeg†, and provi
 - Apply various audio and video filters, such as low-pass filter and image scaling.
 - Decode video with NVidia's hardware-based decoder (NVDEC).
 
-For the detail of the usage, please checkout the [documentation](https://pytorch.org/audio/0.12.0/io.html#streamreader) and the tutorials
+For the detail of the usage, please checkout the [documentation](https://pytorch.org/audio/0.12.0/io.html#streamreader) and the tutorials:
 - [Media Stream API - Pt.1](https://pytorch.org/audio/0.12.0/tutorials/streaming_api_tutorial.html)
 - [Media Stream API - Pt.2](https://pytorch.org/audio/0.12.0/tutorials/streaming_api2_tutorial.html)
 - [Online ASR with Emformer RNN-T](https://pytorch.org/audio/0.12.0/tutorials/online_asr_tutorial.html)
@@ -279,8 +238,8 @@ For details of usage, please check out the [documentation](https://pytorch.org/a
 ### (BETA) New Beamforming Modules and Methods
 
 TorchAudio adds two new beamforming modules under torchaudio.transforms: [SoudenMVDR](https://pytorch.org/audio/0.12.0/transforms.html#soudenmvdr) and [RTFMVDR](https://pytorch.org/audio/0.12.0/transforms.html#rtfmvdr), to increase the flexibility of module usage. The main differences from the [torchaudio.transforms.MVDR](https://pytorch.org/audio/0.11.0/transforms.html#mvdr) module are:
-- Use power spectral density (PSD) matrices or the relative transfer function (RTF) matrix as inputs instead of time-frequency masks. The module can be integrated with neural networks that directly predict complex-valued STFT coefficients of speech and noise.
-- Add reference_channel to the input arguments in the forward method, to let users select the reference microphone during model training.
+- Use power spectral density (PSD) matrices or the relative transfer function (RTF) matrix as inputs instead of time-frequency masks. The module can be integrated with neural networks that directly predict complex-valued STFT coefficients of speech and noise
+- Add reference_channel to the input arguments in the forward method, to let users select the reference microphone during model training
 
 Besides the two modules, new function-level beamforming methods are added under torchaudio.functional. These include:
 - [psd](https://pytorch.org/audio/0.12.0/functional.html#psd)
@@ -316,6 +275,47 @@ TorchScriptabilty support would allow users to embed the BERT text-preprocessing
 
 For usage details, please refer to the corresponding [documentation](https://pytorch.org/text/main/transforms.html#torchtext.transforms.BERTTokenizer).
 
+## TorchRec v0.2.0
+
+### EmbeddingModule + DLRM benchmarks
+
+A set of [benchmarking tests](https://github.com/pytorch/torchrec/tree/main/benchmarks), showing performance characteristics of TorchRec’s base modules and research models built out of TorchRec.
+
+### TwoTower Retrieval Example, with FAISS
+
+We provide an [example](https://github.com/pytorch/torchrec/tree/main/examples/retrieval) demonstrating training a distributed TwoTower (i.e. User-Item) Retrieval model that is sharded using TorchRec. The projected item embeddings are added to an IVFPQ FAISS index for candidate generation. The retrieval model and KNN lookup are bundled in a Pytorch model for efficient end-to-end retrieval.
+
+### Integrations
+
+We demonstrate that TorchRec works out of the box with many components commonly used alongside PyTorch models in production like systems, such as
+- [Training](https://github.com/pytorch/torchrec/tree/main/examples/ray) a TorchRec model on Ray Clusters utilizing the Torchx Ray scheduler
+- [Preprocessing](https://github.com/pytorch/torchrec/tree/main/torchrec/datasets/scripts/nvt) and DataLoading with NVTabular on DLRM
+- [Training](https://github.com/pytorch/torchrec/tree/main/examples/torcharrow) a TorchRec model with on-the-fly preprocessing with TorchArrow showcasing RecSys domain UDFs
+
+### Sequential Embeddings Example: Bert4Rec
+
+We provide an [example](https://github.com/pytorch/torchrec/tree/main/examples/bert4rec), using TorchRec, that reimplements the [BERT4REC](https://arxiv.org/abs/1904.06690) paper, showcasing EmbeddingCollection for non-pooled embeddings. Using DistributedModelParallel we see a 35% QPS gain over conventional data parallelism.
+
+### (Beta) Planner
+
+The TorchRec library includes a built-in [planner](https://pytorch.org/torchrec/torchrec.distributed.planner.html) that selects near optimal sharding plan for a given model. The planner attempts to identify the best sharding plan by evaluating a series of proposals which are statically analyzed and fed into an integer partitioner. The planner is able to automatically adjust plans for a wide range of hardware setups, allowing users to scale performance seamlessly from local development environment to large scale production hardware. See this [notebook](https://github.com/pytorch/torchrec/blob/main/torchrec/distributed/planner/Planner_Introduction.ipynb) for a more detailed tutorial.
+
+### (Beta) Inference
+
+[TorchRec Inference](https://github.com/pytorch/torchrec/tree/main/torchrec/inference) is a C++ library that supports multi-gpu inference. The TorchRec library is used to shard models written and packaged in Python via torch.package (an alternative to TorchScript). The torch.deploy library is used to serve inference from C++ by launching multiple Python interpreters carrying the packaged model, thus subverting the GIL. Two models are provided as examples: [DLRM multi-GPU](https://github.com/pytorch/torchrec/blob/main/examples/inference/dlrm_predict.py) (sharded via TorchRec) and [DLRM single-GPU](https://github.com/pytorch/torchrec/blob/main/examples/inference/dlrm_predict_single_gpu.py).
+
+### (Beta) RecMetrics
+
+RecMetrics is a [metrics](https://github.com/pytorch/torchrec/tree/main/torchrec/metrics) library that collects common utilities and optimizations for Recommendation models. It extends [torchmetrics](https://torchmetrics.readthedocs.io/en/stable/).
+- A centralized metrics module that allows users to add new metrics
+- Commonly used metrics, including AUC, Calibration, CTR, MSE/RMSE, NE & Throughput
+- Optimization for metrics related operations to reduce the overhead of metric computation
+- Checkpointing
+
+### (Prototype) Single process Batched + Fused Embeddings
+
+Previously TorchRec’s abstractions (EmbeddingBagCollection/EmbeddingCollection) over FBGEMM kernels, which provide benefits such as table batching, optimizer fusion, and UVM placement, could only be used in conjunction with DistributedModelParallel. We’ve decoupled these notions from sharding, and introduced the [FusedEmbeddingBagCollection](https://github.com/pytorch/torchrec/blob/eb1247d8a2d16edc4952e5c2617e69acfe5477a5/torchrec/modules/fused_embedding_modules.py#L271), which can be used as a standalone module, with all of the above features, and can also be sharded.
+
 ## TorchX v0.2.0
 
 TorchX is a job launcher that makes it easier to run PyTorch in distributed training clusters with many scheduler integrations including Kubernetes and Slurm. We're excited to release TorchX 0.2.0 with a number of improvements. TorchX is currently being used in production in both on-premise and cloud environments.
@@ -346,7 +346,7 @@ TorchX [integrates with Ax](https://ax.dev/versions/latest/api/runners.html#modu
 
 TorchX now supports [remote filesystem mounts and custom devices](https://pytorch.org/torchx/main/specs.html#mounts). This enables your PyTorch jobs to efficiently access cloud storage such as NFS or Lustre. The device mounts enables usage of network accelerators like Infiniband and custom inference/training accelerators.
 
-## FBGemm v0.1.2
+## FBGemm v0.2.0
 
 The FBGEMM library contains optimized kernels meant to improve the performance of PyTorch workloads. We’ve added a number of new features and optimizations over the last few months that we are excited to report.

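The RecMetrics section moved by this commit lists NE, Calibration and CTR among its built-in metrics. As a rough illustration of what those three numbers measure, here is a plain-Python sketch using their commonly cited definitions with unit sample weights. It is not the TorchRec RecMetrics API. The function names `ctr`, `calibration` and `normalized_entropy` are hypothetical, and TorchRec's own implementation (weighting, log base, numerical guards) may differ.

```python
import math
from typing import Sequence

# Hypothetical helpers for illustration only; not part of torchrec.metrics.


def ctr(labels: Sequence[float]) -> float:
    """Click-through rate: fraction of samples with a positive label."""
    return sum(labels) / len(labels)


def calibration(predictions: Sequence[float], labels: Sequence[float]) -> float:
    """Predicted positives divided by observed positives (1.0 means well calibrated)."""
    return sum(predictions) / sum(labels)


def normalized_entropy(
    predictions: Sequence[float], labels: Sequence[float], eps: float = 1e-12
) -> float:
    """Mean log loss of the model divided by the entropy of the background CTR.

    A model that always predicts the average CTR scores about 1.0; lower is better.
    """
    n = len(labels)
    # Binary cross-entropy, with probabilities clamped away from 0 to avoid log(0).
    log_loss = -sum(
        y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps))
        for p, y in zip(predictions, labels)
    ) / n
    p_bg = ctr(labels)
    if p_bg <= 0.0 or p_bg >= 1.0:
        raise ValueError("NE is undefined when all labels are identical")
    # Log loss of a constant predictor that always outputs the background CTR.
    background = -(p_bg * math.log(p_bg) + (1 - p_bg) * math.log(1 - p_bg))
    return log_loss / background


labels = [1, 0, 0, 1]
print(ctr(labels))
print(calibration([0.5, 0.5, 0.5, 0.5], labels))
print(normalized_entropy([0.5, 0.5, 0.5, 0.5], labels))
print(normalized_entropy([0.9, 0.1, 0.1, 0.9], labels))
```

With balanced labels, a constant prediction of 0.5 gives a CTR of 0.5, a calibration of 1.0 and an NE of about 1.0. The sharper predictions in the last line bring NE well below 1.0. Normalizing by the background entropy is what lets NE be compared across datasets with different base click rates.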