
Commit 8dcec6e

Andrew (#1748)

* space addition
* add image 3, code color test
1 parent 096ab4a commit 8dcec6e


_posts/2024-09-26-pytorch-native-architecture-optimization.md

Lines changed: 3 additions & 4 deletions
@@ -44,6 +44,7 @@ model = torchao.autoquant(torch.compile(model, mode='max-autotune'))
 
 The quantize_ API has a few different options depending on whether your model is compute bound or memory bound.
 
+```py
 from torchao.quantization import (
     # Memory bound models
     int4_weight_only,
@@ -57,7 +58,7 @@ from torchao.quantization import (
     float8_weight_only,
     float8_dynamic_activation_float8_weight,
 )
-
+```
 
 We also have extensive benchmarks on diffusion models in collaboration with the HuggingFace diffusers team in [diffusers-torchao](https://github.com/sayakpaul/diffusers-torchao), where we demonstrated a 53.88% speedup on Flux.1-Dev and a 27.33% speedup on CogVideoX-5b.
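To make the options in the fenced block concrete, here is a minimal sketch of how these imports are typically used with the quantize_ API; the toy model, its shapes, and the bfloat16/CUDA setup are illustrative assumptions, not part of this commit:

```py
import torch
from torchao.quantization import (
    quantize_,
    int4_weight_only,
    float8_dynamic_activation_float8_weight,
)

# Illustrative model; any torch.nn.Module containing Linear layers works.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024),
    torch.nn.ReLU(),
    torch.nn.Linear(1024, 1024),
).to(torch.bfloat16).cuda()

# Memory bound (e.g. small-batch decoding): int4 weight-only quantization,
# applied in place to every supported layer.
quantize_(model, int4_weight_only())

# Compute bound alternative: float8 dynamic activations + float8 weights.
# quantize_(model, float8_dynamic_activation_float8_weight())
```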

@@ -73,7 +74,7 @@ But also can do things like quantize weights to int4 and the kv cache to int8 to
 
 Post training quantization, especially at less than 4 bits, can suffer from serious accuracy degradations. Using [Quantization Aware Training](https://pytorch.org/blog/quantization-aware-training/) (QAT) we’ve managed to recover up to 96% of the accuracy degradation on hellaswag. We’ve integrated this as an end-to-end recipe in torchtune with a minimal [tutorial](https://github.com/pytorch/ao/tree/main/torchao/quantization/prototype/qat).
 
-![](/assets/images/Figure_3.png){:style="width:100%"}
+![](/assets/images/Figure_3.jpg){:style="width:100%"}
 
 # Training
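As an annotation on the QAT paragraph in this hunk: a minimal sketch of the prepare/convert flow described in the linked QAT post, assuming torchao’s prototype QAT quantizer; the placeholder model and elided training loop are illustrative only:

```py
import torch
from torchao.quantization.prototype.qat import Int8DynActInt4WeightQATQuantizer

# Placeholder model; the torchtune recipe applies this to Llama models.
model = torch.nn.Sequential(torch.nn.Linear(4096, 4096))

qat_quantizer = Int8DynActInt4WeightQATQuantizer()

# Insert "fake quantize" ops so fine-tuning sees int8/int4 rounding error
# while the weights themselves stay in high precision.
model = qat_quantizer.prepare(model)

# ... run the usual fine-tuning loop here ...

# Swap the fake-quantized ops for actual int8-activation / int4-weight ops.
model = qat_quantizer.convert(model)
```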

@@ -116,8 +117,6 @@ We’ve been actively working on making sure torchao works well in some of the m
 5. In [torchchat](https://github.com/pytorch/torchchat) for post training quantization
 6. In SGLang for [int4 and int8 post training quantization](https://github.com/sgl-project/sglang/pull/1341)
 
-#
-
 ## Conclusion
 
 If you’re interested in making your models faster and smaller for training or inference, we hope you’ll find torchao useful and easy to integrate.
