Model Inference with Intel® Extension for PyTorch* Optimizations

This folder provides examples of how to use Intel® Extension for PyTorch* to accelerate model inference. The ipex.optimize function of Intel® Extension for PyTorch* applies optimizations to the model, bringing additional performance boosts. For both computer vision and NLP workloads, we recommend applying ipex.optimize to the model object, as sketched below.
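A minimal sketch of that call for Float32 inference, using a torchvision ResNet50 as a stand-in model (the model and input shape are illustrative, not part of the example scripts):

import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex

# Load a pretrained model and switch to inference mode
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.eval()

# Apply Intel® Extension for PyTorch* optimizations to the model object
model = ipex.optimize(model)

# Run inference without gradient tracking
with torch.no_grad():
    data = torch.rand(1, 3, 224, 224)
    output = model(data)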

Environment Setup

We need to install PyTorch* (along with related packages such as torchvision, torchaudio and transformers) and Intel® Extension for PyTorch* in a Python 3 environment.

python -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
python -m pip install intel-extension-for-pytorch
# For BERT examples, transformers package is required
python -m pip install transformers

For more details, please check the installation guide.
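To sanity-check the installation (assuming a CPU-only setup), you can import both packages and print their versions:

python -c "import torch; import intel_extension_for_pytorch as ipex; print(torch.__version__); print(ipex.__version__)"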

Running Example Scripts

We provide inference examples for eager mode as well as graph mode, in which the computational graph is generated via TorchScript or TorchDynamo. Eager mode is the default execution mode in PyTorch*: code is executed in a "define-by-run" paradigm, which makes it flexible, interactive and easy to debug. In graph mode, by contrast, code is executed in a "define-and-run" paradigm, meaning the entire computation graph must be built before the function runs. During graph compilation, optimizations such as layer fusion and folding are applied, and the compiled graph is more amenable to backend optimizations, leading to faster execution. TorchScript and TorchDynamo are the two graph-compilation tools that PyTorch* provides.
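As a rough illustration of the TorchScript path (the model and sample input are placeholders), an eager-mode model can be traced into a static graph and frozen before inference:

import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
model = ipex.optimize(model)
data = torch.rand(1, 3, 224, 224)

with torch.no_grad():
    # Trace the eager-mode model into a TorchScript graph, then freeze it
    traced_model = torch.jit.trace(model, data)
    traced_model = torch.jit.freeze(traced_model)
    output = traced_model(data)

The TorchDynamo path, which compiles the model through torch.compile with the ipex backend, is sketched in the BFloat16 note further below.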

From a numerical precision perspective, we provide inference examples for BFloat16 and INT8 quantization in addition to the default Float32 precision. Low-precision approaches, including Automatic Mixed Precision (AMP) and quantization, are commonly used in PyTorch* to improve performance. In addition, BFloat16 and INT8 computations can be further accelerated by Intel® Advanced Matrix Extensions (Intel® AMX) instructions. Please refer to the best practices for the x86 CPU backend and Leveraging Intel® AMX for detailed explanations.
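For instance, a BFloat16 run typically wraps inference in CPU autocast. A minimal sketch (the model and input are illustrative; the example scripts below show the full flow):

import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
# Ask ipex.optimize to prepare the model for BFloat16 execution
model = ipex.optimize(model, dtype=torch.bfloat16)

data = torch.rand(1, 3, 224, 224)
with torch.no_grad(), torch.cpu.amp.autocast(dtype=torch.bfloat16):
    output = model(data)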

# Clone the repository and enter the inference examples folder
git clone https://github.com/intel/intel-extension-for-pytorch.git
cd intel-extension-for-pytorch/examples/cpu/inference/python

Float32

Running ResNet50 inference in eager mode:

python resnet50_eager_mode_inference_fp32.py

Running ResNet50 inference in TorchScript mode:

python resnet50_torchscript_mode_inference_fp32.py

Running ResNet50 inference in TorchDynamo mode:

python resnet50_torchdynamo_mode_inference_fp32.py

Running BERT inference in eager mode:

python bert_eager_mode_inference_fp32.py

Running BERT inference in TorchScript mode:

python bert_torchscript_mode_inference_fp32.py

Running BERT inference in TorchDynamo mode:

python bert_torchdynamo_mode_inference_fp32.py

BFloat16

Running ResNet50 inference in eager mode:

python resnet50_eager_mode_inference_bf16.py

Running ResNet50 inference in TorchScript mode:

python resnet50_torchscript_mode_inference_bf16.py

Running ResNet50 inference in TorchDynamo mode:

python resnet50_torchdynamo_mode_inference_bf16.py

Running BERT inference in eager mode:

python bert_eager_mode_inference_bf16.py

Running BERT inference in TorchScript mode:

python bert_torchscript_mode_inference_bf16.py

Running BERT inference in TorchDynamo mode:

python bert_torchdynamo_mode_inference_bf16.py

Note: In TorchDynamo mode, since native PyTorch* operators such as aten::convolution and aten::linear are already well supported and optimized by the ipex backend, weights prepacking needs to be disabled by setting weights_prepack=False when calling the ipex.optimize function.
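A minimal sketch of that combination (ResNet50 is just an example model; the example scripts cover BERT as well):

import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
# Disable weights prepacking so the ipex TorchDynamo backend can
# optimize the native aten::convolution / aten::linear operators itself
model = ipex.optimize(model, weights_prepack=False)

# Compile the model with TorchDynamo using the ipex backend
compiled_model = torch.compile(model, backend="ipex")

with torch.no_grad():
    output = compiled_model(torch.rand(1, 3, 224, 224))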

INT8

We support both static and dynamic INT8 quantization.

  • Static quantization: both weights and activations are quantized, and a calibration step with representative data is required.
  • Dynamic quantization: weights are quantized ahead of time, while activations are kept in floating point and quantized on the fly during compute.

Please read the PyTorch* quantization introduction for a more comprehensive overview. A sketch of the static quantization flow follows.
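The overall static flow is prepare, calibrate, convert, then trace for deployment. A hedged sketch (API names follow recent Intel® Extension for PyTorch* releases and may differ across versions; the model and calibration data are illustrative):

import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex
from intel_extension_for_pytorch.quantization import prepare, convert

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
data = torch.rand(1, 3, 224, 224)

# Prepare the model with a default static quantization recipe
qconfig_mapping = ipex.quantization.default_static_qconfig_mapping
prepared_model = prepare(model, qconfig_mapping, example_inputs=data, inplace=False)

# Calibrate with representative inputs (a single random batch here, for brevity)
with torch.no_grad():
    prepared_model(data)

# Convert to an INT8 model and trace it for deployment
converted_model = convert(prepared_model)
with torch.no_grad():
    traced_model = torch.jit.trace(converted_model, data)
    traced_model = torch.jit.freeze(traced_model)
    output = traced_model(data)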

Running static quantization on a pretrained ResNet50 model:

python int8_quantization_static.py

Loading the statically quantized model and running inference:

python int8_deployment.py

Running dynamic quantization on a pretrained BERT model:

python int8_quantization_dynamic.py

Please check the model inference examples in the Intel® Extension for PyTorch* online documentation for more information.