AMD Vitis™ AI Flow Overview


[Figure: Reference Design Architecture]

The Vitis AI flow supports models saved in the ONNX format and uses ONNX Runtime or Vitis AI Runtime (VART) as the mechanism to load, compile, and execute models.

Note

Models with ONNX opset 20 are recommended. If your model uses a different opset version, consider converting it using the ONNX Version Converter.
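The conversion mentioned in the note can be sketched with the `onnx` Python package. This is a minimal sketch, not part of the Vitis AI tooling: the helper names `default_domain_opset` and `convert_to_recommended_opset` are hypothetical, and it assumes the `onnx` package is installed. The version converter does not support every operator, so a conversion can fail for some models.

```python
RECOMMENDED_OPSET = 20  # opset version recommended in the note above


def default_domain_opset(opset_imports):
    """Return the opset version of the default ONNX domain.

    `opset_imports` is an iterable of (domain, version) pairs. The default
    domain is spelled either "" or "ai.onnx".
    """
    for domain, version in opset_imports:
        if domain in ("", "ai.onnx"):
            return version
    raise ValueError("model does not import the default ONNX domain")


def convert_to_recommended_opset(src, dst, target=RECOMMENDED_OPSET):
    """Convert the model at `src` to the `target` opset and save it to `dst`.

    Returns the opset version the model had before conversion.
    """
    # Imported lazily so the pure-Python helper above works without onnx.
    import onnx
    from onnx import version_converter

    model = onnx.load(str(src))
    current = default_domain_opset(
        (entry.domain, entry.version) for entry in model.opset_import
    )
    if current != target:
        model = version_converter.convert_version(model, target)
        onnx.checker.check_model(model)  # validate the converted graph
    onnx.save(model, str(dst))
    return current
```

For example, `convert_to_recommended_opset("model.onnx", "model_opset20.onnx")` would write a converted copy and return the original opset version; both file names are placeholders.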

As shown in the diagram, the Vitis AI flow consists of three phases:

  • Model Quantization

  • Model Compilation

  • Model Execution

Model Quantization and Compilation

You perform model quantization and compilation on a Linux workstation using the tools included in the Vitis AI Docker image. The input is a pre-trained ONNX model. You can optionally quantize this model to INT8 using AMD Quark, a cross-platform toolkit for quantizing deep learning models.

The model is then compiled by initializing an ONNX Runtime inference session with the Vitis AI Execution Provider, which analyzes the model, determines the subgraphs to run on the NPU, compiles them, and generates the necessary binary files.
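The compilation step described above can be sketched with the ONNX Runtime Python API. This is a minimal sketch under stated assumptions: `select_providers` and `compile_model` are hypothetical helper names, and the provider options are passed through unchanged because their keys depend on the Vitis AI release and are not listed here. `VitisAIExecutionProvider` is the provider name ONNX Runtime uses for the Vitis AI Execution Provider.

```python
VITIS_AI_EP = "VitisAIExecutionProvider"
CPU_EP = "CPUExecutionProvider"


def select_providers(available):
    """Return the provider list for compilation, Vitis AI EP first.

    Raises RuntimeError when the Vitis AI EP is missing, because a CPU-only
    session would load the model without compiling anything for the NPU.
    """
    if VITIS_AI_EP not in available:
        raise RuntimeError(f"{VITIS_AI_EP} is not available in this ONNX Runtime build")
    # The CPU provider is listed second for subgraphs not assigned to the NPU.
    return [VITIS_AI_EP, CPU_EP]


def compile_model(model_path, vitis_ai_options=None):
    """Create an inference session, which triggers compilation of the model.

    `vitis_ai_options` is a dict of provider options (assumption: see the
    documentation of your Vitis AI release for the supported keys).
    """
    # Imported lazily so select_providers can be used without onnxruntime.
    import onnxruntime as ort

    providers = select_providers(ort.get_available_providers())
    # One options dict per provider, in the same order as `providers`.
    provider_options = [dict(vitis_ai_options or {}), {}]
    return ort.InferenceSession(
        model_path, providers=providers, provider_options=provider_options
    )
```

Failing early when the execution provider is absent is a design choice in this sketch: it avoids silently producing a CPU-only session on a workstation where the Docker image is not set up correctly.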

Note

During model compilation, FP32 models are automatically converted to BF16. If you want to use INT8 instead, you must first quantize the model with Quark and then compile it.
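To give a sense of what the FP32 to BF16 conversion means numerically, the following sketch rounds a single FP32 value to BF16, which keeps the sign bit, the 8 exponent bits, and the top 7 mantissa bits. It is an illustration only, not the compiler's implementation: the function names are hypothetical, round-to-nearest-even is an assumption about the rounding mode, and NaN handling is omitted.

```python
import struct


def fp32_to_bf16_bits(value):
    """Return the 16-bit BF16 pattern for `value` (round to nearest even)."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    # Add 0x7FFF plus the lowest kept bit so that ties round to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_fp32(bits16):
    """Expand a 16-bit BF16 pattern back to a Python float."""
    return struct.unpack(">f", struct.pack(">I", bits16 << 16))[0]


def round_trip(value):
    """Return `value` as it would be represented after conversion to BF16."""
    return bf16_bits_to_fp32(fp32_to_bf16_bits(value))
```

With these helpers, `round_trip(1.0)` returns `1.0`, while `round_trip(3.14159)` returns `3.140625`. BF16 keeps the dynamic range of FP32 but only about two to three significant decimal digits.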

Model Execution Options

For model execution, Vitis AI supports multiple runtime options:

  • ONNX Runtime Python APIs: Ideal for prototyping, rapid development, and Python-based applications

  • ONNX Runtime C++ APIs: For performance-critical applications requiring C++ integration

  • Vitis AI Runtime (VART) C++ APIs: Low-level runtime providing maximum control and optimized performance

Model execution is performed on the target hardware board. You copy the compiled model to the board’s Linux filesystem and deploy it using your chosen runtime. This guide demonstrates the complete workflow using ONNX Runtime Python as the primary example, with detailed information about C++ and VART alternatives provided in their respective sections.
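As a rough illustration of the ONNX Runtime Python path, the sketch below runs a single inference with random data on an existing `onnxruntime.InferenceSession`. The helper names `concrete_shape` and `run_once` are hypothetical, the sketch assumes every model input is a float32 tensor, and it assumes NumPy is installed on the board. How the session is created on the target depends on the runtime setup for your release.

```python
def concrete_shape(shape, default_dim=1):
    """Replace symbolic or unknown dimensions with `default_dim`.

    ONNX Runtime reports dynamic dimensions as strings or None, for example
    ["batch", 3, 224, 224].
    """
    return [dim if isinstance(dim, int) else default_dim for dim in shape]


def run_once(session):
    """Run one inference with random float32 inputs and return the outputs."""
    # Imported lazily so concrete_shape can be used without NumPy.
    import numpy as np

    feeds = {}
    for model_input in session.get_inputs():
        shape = concrete_shape(model_input.shape)
        # Assumption: the input is float32; check model_input.type otherwise.
        feeds[model_input.name] = np.random.rand(*shape).astype(np.float32)
    # Passing None as the first argument requests all model outputs.
    return session.run(None, feeds)
```

Random inputs are useful only as a smoke test that the compiled model loads and runs on the board. Real applications feed preprocessed data with the same names and shapes.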

The following sections of this guide describe each of these phases in detail.