prototype_source/openvino_quantizer.rst
+4 -4
@@ -15,18 +15,18 @@ Introduction
 
 This is an experimental feature, the quantization API is subject to change.
 
-This tutorial demonstrates how to use `OpenVINOQuantizer` from `Neural Network Compression Framework (NNCF) <https://github.com/openvinotoolkit/nncf/tree/develop>`_ in PyTorch 2 Export Quantization flow to generate a quantized model customized for the `OpenVINO torch.compile backend <https://docs.openvino.ai/2024/openvino-workflow/torch-compile.html>`_ and explains how to lower the quantized model into the `OpenVINO <https://docs.openvino.ai/2024/index.html>`_ representation.
-`OpenVINOQuantizer` unlocks the full potential of low-precision OpenVINO kernels due to the placement of quantizers designed specifically for the OpenVINO.
+This tutorial demonstrates how to use ``OpenVINOQuantizer`` from `Neural Network Compression Framework (NNCF) <https://github.com/openvinotoolkit/nncf/tree/develop>`_ in the PyTorch 2 Export Quantization flow to generate a quantized model customized for the `OpenVINO torch.compile backend <https://docs.openvino.ai/2024/openvino-workflow/torch-compile.html>`_ and explains how to lower the quantized model into the `OpenVINO <https://docs.openvino.ai/2024/index.html>`_ representation.
+``OpenVINOQuantizer`` unlocks the full potential of low-precision OpenVINO kernels due to the placement of quantizers designed specifically for OpenVINO.
 
-The PyTorch 2 export quantization flow uses the torch.export to capture the model into a graph and performs quantization transformations on top of the ATen graph.
+The PyTorch 2 export quantization flow uses ``torch.export`` to capture the model into a graph and performs quantization transformations on top of the ATen graph.
 This approach is expected to have significantly higher model coverage, improved flexibility, and a simplified UX.
 OpenVINO backend compiles the FX Graph generated by TorchDynamo into an optimized OpenVINO model.
 
 The quantization flow mainly includes four steps:
 
 - Step 1: Capture the FX Graph from the eager Model based on the `torch export mechanism <https://pytorch.org/docs/main/export.html>`_.
 - Step 2: Apply the PyTorch 2 Export Quantization flow with OpenVINOQuantizer based on the captured FX Graph.
-- Step 3: Lower the quantized model into OpenVINO representation with the API `torch.compile <https://docs.openvino.ai/2024/openvino-workflow/torch-compile.html>`_.
+- Step 3: Lower the quantized model into the OpenVINO representation with the `torch.compile <https://docs.openvino.ai/2024/openvino-workflow/torch-compile.html>`_ API.
 - Optional step 4: Improve quantized model metrics via the `quantize_pt2e <https://openvinotoolkit.github.io/nncf/autoapi/nncf/experimental/torch/fx/index.html#nncf.experimental.torch.fx.quantize_pt2e>`_ method.
 
 The high-level architecture of this flow could look like this:
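For context, a minimal sketch of the four-step flow described in this hunk might look like the code below. The model, example inputs, and calibration pass are illustrative; ``OpenVINOQuantizer`` is assumed to be importable from ``nncf.experimental.torch.fx`` (per the ``quantize_pt2e`` API reference linked above), ``torch.export.export_for_training`` requires a recent PyTorch release, and the ``openvino`` ``torch.compile`` backend is available only when the ``openvino`` package is installed.

.. code-block:: python

    import torch
    import torchvision.models as models

    # Assumed import location, based on the NNCF API reference linked above.
    from nncf.experimental.torch.fx import OpenVINOQuantizer

    from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e

    # Illustrative model and example inputs.
    model = models.resnet18(weights=None).eval()
    example_inputs = (torch.randn(1, 3, 224, 224),)

    # Step 1: capture the FX Graph via the torch export mechanism.
    captured = torch.export.export_for_training(model, example_inputs).module()

    # Step 2: apply the PyTorch 2 Export Quantization flow with OpenVINOQuantizer.
    quantizer = OpenVINOQuantizer()
    prepared = prepare_pt2e(captured, quantizer)
    prepared(*example_inputs)  # calibration pass(es) over representative data
    quantized = convert_pt2e(prepared)

    # Step 3: lower the quantized model to the OpenVINO representation via torch.compile.
    compiled = torch.compile(quantized, backend="openvino")
    output = compiled(*example_inputs)

    # Optional Step 4: NNCF's quantize_pt2e can be used in place of the
    # prepare_pt2e/convert_pt2e pair to improve the quantized model's metrics.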