
Commit c9b745f

Merge pull request #1423 from LF-Engineering/8-6
Added members and blog post
2 parents 39f16c0 + 93a41f5 commit c9b745f

21 files changed: +133 −10 lines

_board_info/advanced-micro-devices.md

Lines changed: 1 addition & 1 deletion

@@ -2,7 +2,7 @@
 title: AMD
 summary: ''
 link: https://amd.com
-image: /assets/images/announcement-logo-amd.jpg
+image: /assets/images/members/amd-logo.svg
 class: pytorch-resource
 order: 1
 featured-home: true

_board_info/amazon-web-services.md

Lines changed: 1 addition & 1 deletion

@@ -2,7 +2,7 @@
 title: Amazon
 summary: ''
 link: https://aws.amazon.com
-image: /assets/images/announcement-logo-aws.jpg
+image: /assets/images/members/aws-logo.svg
 class: pytorch-resource
 order: 2
 featured-home: true

_board_info/google-cloud.md

Lines changed: 1 addition & 1 deletion

@@ -2,7 +2,7 @@
 title: Google Cloud
 summary: ''
 link: https://cloud.google.com/gcp
-image: /assets/images/announcement-logo-google.png
+image: /assets/images/members/google-cloud-logo.svg
 class: pytorch-resource
 order: 3
 featured-home: true

_board_info/hugging-face.md

Lines changed: 9 additions & 0 deletions

@@ -0,0 +1,9 @@
+---
+title: Hugging Face
+summary: ''
+link: https://huggingface.co/
+image: /assets/images/members/hf-logo.svg
+class: pytorch-resource
+order: 4
+featured-home: true
+---

_board_info/ibm.md

Lines changed: 9 additions & 0 deletions

@@ -0,0 +1,9 @@
+---
+title: IBM
+summary: ''
+link: https://www.ibm.com/
+image: /assets/images/members/ibm-logo.svg
+class: pytorch-resource
+order: 5
+featured-home: true
+---

_board_info/meta.md

Lines changed: 2 additions & 2 deletions

@@ -2,8 +2,8 @@
 title: Meta
 summary: ''
 link: https://meta.com
-image: /assets/images/announcement-logo-meta.jpg
+image: /assets/images/members/meta-logo.svg
 class: pytorch-resource
-order: 4
+order: 6
 featured-home: true
 ---

_board_info/microsoft-corporation.md

Lines changed: 2 additions & 2 deletions

@@ -2,8 +2,8 @@
 title: Microsoft
 summary: ''
 link: https://azure.microsoft.com
-image: /assets/images/announcement-logo-microsoft.jpg
+image: /assets/images/members/microsoft-azure-logo.svg
 class: pytorch-resource
-order: 5
+order: 7
 featured-home: true
 ---

_board_info/nvidia-corporation.md

Lines changed: 2 additions & 2 deletions

@@ -2,8 +2,8 @@
 title: Nvidia
 summary: ''
 link: https://www.nvidia.com/en-us/ai-data-science/
-image: /assets/images/announcement-logo-nvidia.jpg
+image: /assets/images/members/nvidia-logo.svg
 class: pytorch-resource
-order: 5
+order: 8
 featured-home: true
 ---
Lines changed: 94 additions & 0 deletions

---
layout: blog_detail
title: "INT8 Quantization for x86 CPU in PyTorch"
author: Intel
---

## Overview

INT8 quantization is a powerful technique for speeding up deep learning inference on x86 CPU platforms. By reducing the precision of the model's weights and activations from 32-bit floating point (FP32) to 8-bit integer (INT8), INT8 quantization can significantly improve inference speed and reduce memory requirements without sacrificing accuracy.
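To make the precision reduction concrete, here is a minimal sketch of the affine (scale/zero-point) quantization scheme that INT8 quantization builds on; the tensor and variable names are illustrative only, not part of the workflow described below.

```
import torch

x = torch.randn(4, 4)  # an FP32 tensor to quantize

# Derive scale and zero point from the observed value range (asymmetric scheme)
qmin, qmax = 0, 255  # representable range of quint8
scale = float(x.max() - x.min()) / (qmax - qmin)
zero_point = int(qmin - round(float(x.min()) / scale))

# Quantize to 8-bit, then dequantize to inspect the rounding error
xq = torch.quantize_per_tensor(x, scale=scale, zero_point=zero_point, dtype=torch.quint8)
print(xq.int_repr())                      # underlying 8-bit integer values
print((x - xq.dequantize()).abs().max())  # worst-case quantization error
```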
In this blog, we will discuss recent progress on INT8 quantization for x86 CPUs in PyTorch, focusing on the new x86 quantization backend. We will also briefly look at the new quantization path with PyTorch 2.0 Export (PT2E) and TorchInductor.


## X86 Quantization Backend

The current recommended way of quantization in PyTorch is [FX](http://pytorch.org/tutorials/prototype/fx_graph_mode_quant_guide.html?highlight=fx). Before PyTorch 2.0, the default quantization backend (a.k.a. QEngine) on x86 CPUs was FBGEMM, which leveraged the FBGEMM performance library to achieve the speedup. In the PyTorch 2.0 release, a new quantization backend called X86 was introduced to replace FBGEMM. The x86 quantization backend offers improved INT8 inference performance compared to the original FBGEMM backend by leveraging the strengths of both the FBGEMM and [Intel® oneAPI Deep Neural Network Library (oneDNN)](https://www.intel.com/content/www/us/en/developer/tools/oneapi/onednn.html) kernel libraries.
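For reference, the active QEngine can be inspected and switched at runtime. A quick sketch; the exact list of engines depends on how your PyTorch build was compiled:

```
import torch

print(torch.backends.quantized.supported_engines)  # e.g. ['x86', 'fbgemm', 'onednn', 'none']
print(torch.backends.quantized.engine)             # the default engine on this platform

torch.backends.quantized.engine = 'fbgemm'  # switch back to the legacy backend if desired
```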
## Performance Benefit from X86 Backend

To measure the performance benefit of the new X86 backend, we ran INT8 inference on 69 popular deep learning models (shown in **Figures 1-3** below) using [4th Gen Intel® Xeon® Scalable processors](https://www.intel.com/content/www/us/en/developer/topic-technology/artificial-intelligence/platform.html). The results showed a 2.97X geomean performance speedup over FP32 inference, whereas the speedup was 1.43X with the FBGEMM backend. The charts below show the per-model speedup of the x86 backend compared with the FBGEMM backend.
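As a reminder, a geomean (geometric mean) speedup is the n-th root of the product of the per-model speedups, which keeps a few outlier models from dominating the average. A small sketch with made-up numbers:

```
import math

speedups = [1.2, 2.5, 4.1]  # hypothetical per-model speedups over FP32
geomean = math.prod(speedups) ** (1 / len(speedups))
print(f"{geomean:.2f}X geomean speedup")  # 2.31X for these made-up numbers
```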
![Figure 1: Models with less than 2x performance boost with x86 backend](/assets/images/int8/pytorch_quant_x86_1.jpg){:style="width:100%;"}

<small style="line-height: 1.1"><em>**Figure 1**: Models with less than 2x performance boost with x86 backend<sup>1</sup></em></small>


![Figure 2: Models with 2x-4x performance boost with x86 backend](/assets/images/int8/pytorch_quant_x86_2.jpg){:style="width:100%; margin-top: 4em;"}

<small style="line-height: 1.1"><em>**Figure 2**: Models with 2x-4x performance boost with x86 backend<sup>1</sup></em></small>


![Figure 3: Models with larger than 4x performance boost with x86 backend](/assets/images/int8/pytorch_quant_x86_3.jpg){:style="width:100%; margin-top: 4em;"}

<small style="line-height: 1.1"><em>**Figure 3**: Models with larger than 4x performance boost with x86 backend<sup>1</sup></em></small>


## Usage of x86 Backend

By default in PyTorch 2.0, users on x86 platforms use the x86 quantization backend, and their PyTorch programs remain unchanged when using the default backend. Alternatively, users can specify x86 as the quantization backend explicitly. Below is an example code snippet of PyTorch static post-training quantization with the x86 quantization backend.
```
import torch
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

qconfig_mapping = get_default_qconfig_mapping()
# Or explicitly specify the qengine
# qengine = 'x86'
# torch.backends.quantized.engine = qengine
# qconfig_mapping = get_default_qconfig_mapping(qengine)

model_fp32 = MyModel().eval()  # MyModel stands in for the user's FP32 model
x = torch.randn((1, 3, 224, 224), dtype=torch.float)
x = x.to(memory_format=torch.channels_last)

# Insert observers according to qconfig and backend config
prepared_model = prepare_fx(model_fp32, qconfig_mapping, example_inputs=x)

# Calibration code not shown

# Convert to quantized model
quantized_model = convert_fx(prepared_model)
```
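The calibration step elided above just runs representative data through the prepared model so the inserted observers can record activation ranges. A minimal sketch, where `calibration_data` is a hypothetical iterable of representative input batches:

```
# Calibration: feed representative inputs so observers record activation ranges
with torch.no_grad():
    for batch in calibration_data:  # hypothetical iterable of input tensors
        prepared_model(batch.to(memory_format=torch.channels_last))
```

After `convert_fx`, the quantized model is called like any other module, e.g. `quantized_model(x)`.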
## Technical Details of x86 Backend

We devised heuristic dispatching rules, based on the performance numbers from the models we benchmarked, to decide whether to invoke the oneDNN or the FBGEMM performance library for a given convolution or matrix multiplication operation. The rules combine operation kind, tensor shapes, CPU architecture information, and so on. The detailed logic is available [here](http://github.com/pytorch/pytorch/blob/93ff71ec37e3c946603600a46edef70b42f81213/aten/src/ATen/native/quantized/cpu/OnednnUtils.h#L396). For more design and technical discussion, please refer to the [Request for Comments](http://github.com/pytorch/pytorch/issues/83888).
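To give a flavor of what such a rule might look like, here is a purely hypothetical sketch; the actual dispatch logic lives in the OnednnUtils.h file linked above and weighs more signals than shown here:

```
# Hypothetical illustration of a heuristic dispatch rule; not the real logic
def choose_kernel_library(op_kind: str, batch_size: int, cpu_has_vnni: bool) -> str:
    if not cpu_has_vnni:
        return "fbgemm"  # assume older CPUs without VNNI favor FBGEMM
    if op_kind == "conv":
        return "onednn"  # assume convolutions generally favor oneDNN
    if op_kind == "linear" and batch_size == 1:
        return "fbgemm"  # assume tiny GEMMs can favor FBGEMM
    return "onednn"
```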
## Next Steps With a New Quantization Path: PyTorch 2.0 Export

Although still far from finalized, a new quantization path, PyTorch 2.0 Export (PT2E), is in the early design and PoC stage. The new approach is slated to replace the FX quantization path in the future. It is built upon the capabilities of TorchDynamo Export, a feature introduced in the PyTorch 2.0 release for FX graph capturing. The captured graph is then quantized and lowered to different backends. TorchInductor, the new deep learning compiler of PyTorch, has shown promising results in terms of FP32 inference speedup on x86 CPU, and we are working actively to enable it as one of the quantization backends of PT2E. We believe the new path will lead to further improvements in INT8 inference performance, thanks to greater flexibility of fusion at different levels.


## Conclusion

The x86 quantization backend introduced in the PyTorch 2.0 release delivers a remarkable improvement in INT8 inference speed on x86 CPU platforms: a 2.97X geomean speedup over FP32 inference in our benchmarks, versus 1.43X with the original FBGEMM backend, while maintaining backward compatibility. End users can benefit from this enhancement with minimal or no modification to their programs. Furthermore, a new quantization path, PT2E, is currently in development and is expected to provide even more possibilities in the future.


## Acknowledgement

Special thanks to Nikita Shulga, Vasiliy Kuznetsov, Supriya Rao, and Jongsoo Park. Together, we made one more step forward on the path of improving the PyTorch CPU ecosystem.


## Configuration

<sup>1</sup> AWS EC2 r7iz.metal-16xl instance (Intel(R) Xeon(R) Gold 6455B, 32-core/64-thread, Turbo Boost On, Hyper-Threading On, Memory: 8x64GB, Storage: 192GB); OS: Ubuntu 22.04.1 LTS; Kernel: 5.15.0-1028-aws; Batch Size: 1; Cores per Instance: 4; PyTorch 2.0 RC3; TorchVision 0.15.0+cpu; tested by Intel on 3/77/2023. May not reflect all publicly available security updates.

_sass/announcement.scss

Lines changed: 4 additions & 1 deletion

@@ -88,7 +88,10 @@
       width: 100%;
       height: 207px;
       object-fit: contain;
-      padding: 20px 0;
+      padding: 20px;
+      @media screen and (min-width: 1000px) {
+        padding: 30px;
+      }
     }
   }
 }
Three binary image files changed: 136 KB, 125 KB, and 158 KB.

assets/images/members/amd-logo.svg

Lines changed: 1 addition & 0 deletions

assets/images/members/aws-logo.svg

Lines changed: 1 addition & 0 deletions

Lines changed: 1 addition & 0 deletions

assets/images/members/hf-logo.svg

Lines changed: 1 addition & 0 deletions

assets/images/members/ibm-logo.svg

Lines changed: 1 addition & 0 deletions
