Commit 6a4f748

description, code change

1 parent 79d56de
3 files changed: +14 -18 lines
index.rst (+7)

@@ -679,6 +679,13 @@ Welcome to PyTorch Tutorials
    :link: intermediate/transformer_building_blocks.html
    :tags: Transformer
 
+.. customcarditem::
+   :header: PyTorch 2 Export Quantization with Intel GPU Backend through Inductor
+   :card_description: Learn how to export quantized models on Intel GPU backend
+   :image: _static/img/thumbnails/cropped/pytorch-logo.png
+   :link: intermediate/pt2e_quant_xpu_inductor.html
+   :tags: Quantization,Model-Optimization
+
 .. Parallel-and-Distributed-Training
 
 

prototype_source/pt2e_quant_xpu_inductor.rst → intermediate_source/pt2e_quant_xpu_inductor.rst (renamed, +7 -11)

@@ -16,19 +16,15 @@ This tutorial introduces XPUInductorQuantizer aiming for serving the quantized m
 utilizes PyTorch 2 Export Quantization flow and lowers the quantized model into the inductor.
 
 The pytorch 2 export quantization flow uses the torch.export to capture the model into a graph and perform quantization transformations on top of the ATen graph.
-This approach is expected to have significantly higher model coverage, better programmability, and a simplified UX.
+This approach is expected to have significantly higher model coverage with better programmability and a simplified user experience.
 TorchInductor is the compiler backend that compiles the FX Graphs generated by TorchDynamo into optimized C++/Triton kernels.
 
 The quantization flow mainly includes three steps:
 
 - Step 1: Capture the FX Graph from the eager Model based on the `torch export mechanism <https://pytorch.org/docs/main/export.html>`_.
-- Step 2: Apply the Quantization flow based on the captured FX Graph, including defining the backend-specific quantizer, generating the prepared model with observers,
+- Step 2: Apply the quantization flow based on the captured FX Graph, including defining the backend-specific quantizer, generating the prepared model with observers,
   performing the prepared model's calibration, and converting the prepared model into the quantized model.
-- Step 3: Lower the quantized model into inductor with the API ``torch.compile``.
-
-During Step 3, the inductor would decide which kernels are dispatched into. There are two kinds of kernels the Intel GPU would obtain benefits, oneDNN kernels and triton kernels. `Intel oneAPI Deep Neural Network Library (oneDNN) <https://github.com/uxlfoundation/oneDNN>`_ contains
-highly-optimized quantized Conv/GEMM kernels for both CPU and GPU. Furthermore, oneDNN supports extra operator fusion on these operators, like quantized linear with eltwise activation function(ReLU) and binary operation(add, inplace sum).
-Besides oneDNN kernels, triton would be responsible for generating kernels on our GPUs, like operators `quantize` and `dequantize`. The triton kernels are optimized by `Intel XPU Backend for Triton <https://github.com/intel/intel-xpu-backend-for-triton>`_
+- Step 3: Lower the quantized model into inductor with the API ``torch.compile``, which would call triton kernels or oneDNN GEMM/Conv kernels.
 
 
 The high-level architecture of this flow could look like this:
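
To make the three steps in the rewritten passage concrete, a minimal end-to-end sketch in Python might look like the following. It assumes an XPU-enabled PyTorch build; the XPUInductorQuantizer import path and the get_default_xpu_inductor_quantization_config helper are modeled on the renamed tutorial's flow rather than shown in this diff, so treat them as assumptions.

    # A minimal sketch of the three-step PT2E flow on Intel GPU,
    # assuming an XPU-enabled PyTorch build; the quantizer imports
    # follow the renamed tutorial and are not part of this diff.
    import torch
    import torchvision.models as models
    from torch.export import export_for_training
    from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
    from torch.ao.quantization.quantizer.xpu_inductor_quantizer import (
        XPUInductorQuantizer,
        get_default_xpu_inductor_quantization_config,
    )

    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval().to("xpu")
    example_inputs = (torch.randn(1, 3, 224, 224, device="xpu"),)

    # Step 1: capture the FX Graph with torch.export
    exported_model = export_for_training(model, example_inputs).module()

    # Step 2: prepare with the backend-specific quantizer, calibrate, convert
    quantizer = XPUInductorQuantizer()
    quantizer.set_global(get_default_xpu_inductor_quantization_config())
    prepared_model = prepare_pt2e(exported_model, quantizer)
    prepared_model(*example_inputs)  # calibration with representative data
    quantized_model = convert_pt2e(prepared_model)

    # Step 3: lower into Inductor; kernels dispatch to Triton or oneDNN
    with torch.no_grad():
        optimized_model = torch.compile(quantized_model)
        optimized_model(*example_inputs)
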
@@ -39,13 +35,13 @@ The high-level architecture of this flow could look like this:
 Post Training Quantization
 ----------------------------
 
-Static quantization is the only method we support currently. QAT and dynamic quantization will be available in later versions.
+Static quantization is the only method we support currently.
 
 The dependencies packages are recommended to be installed through Intel GPU channel as follows
 
 ::
 
-   pip install torchvision pytorch-triton-xpu --index-url https://download.pytorch.org/whl/nightly/xpu
+   pip3 install torch torchvision torchaudio pytorch-triton-xpu --index-url https://download.pytorch.org/whl/xpu
 
 1. Capture FX Graph
 ^^^^^^^^^^^^^^^^^^^^^
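
As a quick check that the wheels from the XPU index were picked up correctly, something like the following should report an available device (the torch.xpu module ships with recent XPU-enabled PyTorch builds):

    # Sanity check after installing from the XPU wheel index
    import torch

    print(torch.__version__)
    print(torch.xpu.is_available())  # True on a working Intel GPU setup
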
@@ -63,7 +59,7 @@ We will start by performing the necessary imports, capturing the FX Graph from t
 
     # Create the Eager Model
     model_name = "resnet18"
-    model = models.__dict__[model_name](pretrained=True)
+    model = models.__dict__[model_name](weights=models.ResNet18_Weights.DEFAULT)
 
     # Set the model to eval mode
     model = model.eval().to("xpu")
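
For context on the hunk above: torchvision 0.13 deprecated the boolean pretrained flag in favor of weights enums, so the replacement line is the current idiomatic form.

    # Deprecated since torchvision 0.13:
    # model = models.resnet18(pretrained=True)
    # Current API, with the same default pretrained weights:
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
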
@@ -75,7 +71,7 @@ We will start by performing the necessary imports, capturing the FX Graph from t
 
     # Capture the FX Graph to be quantized
    with torch.no_grad():
-        export_model = export_for_training(
+        exported_model = export_for_training(
             model,
             example_inputs,
         ).module()
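
The renamed variable holds the unwrapped graph module: export_for_training returns an ExportedProgram, and .module() yields the torch.fx.GraphModule that the PT2E quantization APIs consume. A small inspection sketch, assuming model and example_inputs from the surrounding tutorial code:

    # exported_model is a torch.fx.GraphModule after .module()
    exported_model = export_for_training(model, example_inputs).module()
    print(type(exported_model))
    exported_model.graph.print_tabular()  # view the captured ATen-level ops
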

prototype_source/prototype_index.rst (-7)

@@ -96,13 +96,6 @@ Prototype features are not available as part of binary distributions like PyPI o
    :link: ../prototype/pt2e_quant_x86_inductor.html
    :tags: Quantization
 
-.. customcarditem::
-   :header: PyTorch 2 Export Quantization with Intel GPU Backend through Inductor
-   :card_description: Learn how to use PT2 Export Quantization with Intel GPU Backend through Inductor.
-   :image: ../_static/img/thumbnails/cropped/generic-pytorch-logo.png
-   :link: ../prototype/pt2e_quant_xpu_inductor.html
-   :tags: Quantization
-
 .. Sparsity
 
 .. customcarditem::
