Vitis AI 5.1 Developer Guide#

Note

Vitis AI 5.1 is the first public release of Vitis AI with support for the Neural Processing Unit (NPU), replacing the Deep Learning Processing Unit (DPU) architecture. This release is available as a Beta version, targeting Versal AI Edge Series Adaptive SoCs. The production release is scheduled for Q1 2026. For support, please contact your local AMD sales representative or post your question to the Vitis AI and AI Community Forums.


Vitis AI (Product) Overview#

AMD Vitis™ AI is an IDE (Integrated Development Environment) that you can leverage to accelerate AI Inference on AMD’s Adaptive SoCs and FPGAs. The IDE provides optimized IP (Intellectual Property), supporting tools, libraries, models, reference designs, and tutorials that aid you throughout the development process. It is designed with high efficiency and ease of use in mind, unleashing the full potential of AI acceleration.

Vitis AI Integrated Development Environment Block Diagram

Key Components of Vitis AI#

The Vitis AI solution consists of three primary components:

  1. Neural Processing Unit (NPU) IP: A purpose-built AI Inference IP that leverages a combination of Programmable Logic and the AI Engine Array to accelerate the deployment of neural networks.

  2. Model Compilation Tools: A set of tools to quantize, compile, and optimize ML models for NPU IP.

  3. Model Deployment APIs: A collection of setup scripts, examples, and reference designs to integrate and execute ML inference models on the NPU IP from a software application.

NPU IP#

AMD uses the term NPU IP to identify the “soft” accelerators that facilitate deep-learning inference. The NPU IP uses a combination of AI Engines (AIE) and Programmable Logic (PL) to implement the inference accelerator.

Vitis AI provides the NPU IP and supporting tools to deploy both standard and custom neural networks on AMD’s adaptable targets.

The Vitis AI NPU IP operates as a general-purpose AI inference accelerator. Multiple NN models can be loaded and run concurrently on a single NPU. Multiple NPU IP instances can also be instantiated per device. The NPU IP can be scaled in size to accommodate your requirements.

The Vitis AI NPU IP architecture is called a “Matrix of (Heterogeneous) Processing Engines.” Although it might bear some visual resemblance to a systolic array at first glance, the similarity is only superficial. The NPU IP operates as a micro-coded processor with its own Instruction Set Architecture (ISA), and each NPU IP architecture has its own instruction set.

The Vitis AI Compiler, in collaboration with the NPU IP software stack, generates snapshots tailored to the deployment of each network. A snapshot contains the quantized model and the instructions for its execution by the NPU IP on the target platform.

Note: One advantage of this architecture is that changing the neural network requires neither loading a new bitstream nor building a new hardware platform. This is an important differentiator from conventional dataflow accelerator architectures, which are purpose-built for a single network.

Model Compilation Toolset#
  • Vitis AI Quantizer

The Vitis AI Quantizer, integrated as a component of either TensorFlow or PyTorch, converts 32-bit floating-point weights and activations to narrower data types such as INT8, reducing computational complexity with minimal loss of accuracy (typically around 1%). Executing this fixed-point model requires less memory bandwidth and therefore provides higher throughput and better power efficiency than the 32-bit floating-point model.
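For illustration, the following is a minimal sketch of post-training quantization using the PyTorch quantizer API (`pytorch_nndct.apis.torch_quantizer`) from earlier Vitis AI releases; the exact entry points in the 5.1 NPU toolchain may differ, and `MyModel` and `calib_loader` are hypothetical placeholders.

```python
# Post-training quantization sketch using the PyTorch quantizer API from
# earlier Vitis AI releases (pytorch_nndct); entry points in the 5.1 NPU
# toolchain may differ. MyModel and calib_loader are placeholders.
import torch
from pytorch_nndct.apis import torch_quantizer

float_model = MyModel().eval()                 # placeholder float model
dummy_input = torch.randn(1, 3, 224, 224)      # shape must match the model

# Calibration pass: collect activation statistics on representative data.
quantizer = torch_quantizer("calib", float_model, (dummy_input,))
quant_model = quantizer.quant_model
with torch.no_grad():
    for images, _ in calib_loader:             # placeholder DataLoader
        quant_model(images)
quantizer.export_quant_config()

# Test pass: evaluate the quantized model's accuracy before compilation.
quantizer = torch_quantizer("test", float_model, (dummy_input,))
quant_model = quantizer.quant_model
# ... run validation on quant_model; retrain/fine-tune if accuracy drops ...
```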

  • Vitis AI Compiler

The Vitis AI Compiler maps the quantized model to a highly efficient instruction set and dataflow model. The compiler performs multiple optimizations; for example, batch normalization operations are fused with the preceding convolution when the convolution operator precedes the batch normalization operator. Because the NPU IP supports multiple dimensions of parallelism, efficient instruction scheduling is the key to exploiting the inherent parallelism and data-reuse potential in the graph.
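To make the batch-normalization fusion concrete, the following NumPy sketch shows the general folding identity that such compilers exploit; it illustrates the technique itself, not the Vitis AI Compiler’s internal implementation.

```python
import numpy as np

def fold_bn_into_conv(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold y = gamma * (conv(x) - mean) / sqrt(var + eps) + beta
    into new conv weights/bias so BN disappears at inference time.

    w: conv weights, shape (out_ch, in_ch, kh, kw)
    b: conv bias, shape (out_ch,)
    gamma, beta, mean, var: per-channel BN parameters, shape (out_ch,)
    """
    scale = gamma / np.sqrt(var + eps)            # per-output-channel scale
    w_folded = w * scale[:, None, None, None]     # scale each output filter
    b_folded = (b - mean) * scale + beta          # fold BN shift into bias
    return w_folded, b_folded
```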

Model Deployment APIs#

Vitis AI Runtime (VART) is a set of API functions that support the integration of the NPU IP into software applications. VART is built on top of the legacy Xilinx Runtime (XRT) and provides a unified high-level runtime for embedded targets. Key features include the following (a usage sketch follows the list):

  • Asynchronous submission of jobs to the NPU IP.

  • Asynchronous collection of jobs from the NPU IP.

  • C++ and Python API implementations.

  • Support for multi-threaded execution.
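As a rough illustration of these features, here is a minimal Python sketch using the VART entry points (`xir.Graph.deserialize`, `vart.Runner.create_runner`, `execute_async`, `wait`) from earlier Vitis AI releases; the snapshot-based flow in Vitis AI 5.1 may expose different entry points, and `model.xmodel` is a placeholder file name.

```python
# Async job submission/collection sketch using the VART Python API from
# earlier Vitis AI releases; the Vitis AI 5.1 snapshot flow may differ.
import numpy as np
import vart
import xir

graph = xir.Graph.deserialize("model.xmodel")   # placeholder file name
# Pick the first subgraph mapped to the accelerator (legacy attribute
# name is "DPU"; the NPU flow may label subgraphs differently).
subgraphs = [
    s for s in graph.get_root_subgraph().toposort_child_subgraph()
    if s.has_attr("device") and s.get_attr("device") == "DPU"
]
runner = vart.Runner.create_runner(subgraphs[0], "run")

# Allocate host buffers matching the runner's tensor shapes; the dtype
# depends on the compiled model (int8 assumed here).
in_t = runner.get_input_tensors()[0]
out_t = runner.get_output_tensors()[0]
in_buf = np.zeros(tuple(in_t.dims), dtype=np.int8)
out_buf = np.zeros(tuple(out_t.dims), dtype=np.int8)

# Submit the job asynchronously, then collect the result.
job_id = runner.execute_async([in_buf], [out_buf])
runner.wait(job_id)
```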

Salient Features of Vitis AI#

  • AIE/PL Programmability

  • Low Latency / Real-time AI Inference

  • Low Power Consumption

  • Deep Learning Frameworks: PyTorch, TensorFlow

  • Broad CNN Model Coverage

  • Data Types: INT8, BF16

  • C++ and Python APIs for easier integration

Workflow and Components#

This section provides an overview of how developers can deploy models on AMD embedded platforms.

Development flow with Vitis AI: 100 Ft View

The figure outlines the process for deploying a machine learning (ML) model with Vitis AI on embedded platforms. The first step is setting up the development environment with the necessary AMD hardware and software. After training the ML model, verify its performance on CPU/GPU platforms by running inference on an x86 host and confirming accuracy. If the initial accuracy is not satisfactory, the model might need to be retrained or fine-tuned with the Vitis AI tools. After the accuracy is validated, proceed with deployment by choosing embedded execution for integrated systems.
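As a simple illustration of this x86 validation step, the sketch below computes the top-1 accuracy of a trained PyTorch model on a validation set; `model` and `val_loader` are hypothetical placeholders.

```python
import torch

def top1_accuracy(model, val_loader, device="cpu"):
    """Measure top-1 accuracy of the float model on an x86 host
    before moving on to quantization and compilation."""
    model.eval().to(device)
    correct = total = 0
    with torch.no_grad():
        for images, labels in val_loader:
            preds = model(images.to(device)).argmax(dim=1)
            correct += (preds == labels.to(device)).sum().item()
            total += labels.size(0)
    return correct / total
```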

The embedded execution process involves three steps:

  1. Model compilation

  2. Design

  3. Embedded execution

Embedded Execution Workflow: High-level

The process starts with model compilation. In this step, your trained model is compiled on an x86 host machine using the NPU compiler software. This step also includes assessing the accuracy of the model. If the post-quantization accuracy is not satisfactory, the model is fine-tuned with the software APIs. Once the accuracy is satisfactory, the compiler software compiles the model, generating a snapshot file and CPU sub-graphs. These sub-graphs are not accelerated on the FPGA by default. The snapshot file packages the compiled model and instructions for the runtime software into a single file. Refer to the following figure for the model compilation flow.

Model Compilation – Model Compilation Flow
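For a feel of the compilation step, here is a sketch that invokes the legacy `vai_c_xir` compiler driver from earlier (DPU-based) Vitis AI releases via Python; the 5.1 NPU compiler produces snapshots and its command line may differ, and all file paths below are placeholders.

```python
# Sketch of invoking the legacy Vitis AI compiler driver (vai_c_xir) from
# earlier DPU-based releases; the 5.1 NPU compiler that emits snapshots
# may use a different command line. All paths below are placeholders.
import subprocess

subprocess.run(
    [
        "vai_c_xir",
        "-x", "quantized_model.xmodel",   # quantizer output (placeholder)
        "-a", "arch.json",                # target architecture description
        "-o", "compiled/",                # output directory
        "-n", "my_network",               # name for the compiled model
    ],
    check=True,
)
```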

The second step is design, where the NPU IP is integrated into a full-chip FPGA design along with other IPs/kernels, such as pre-processing and post-processing. Tools like Vitis or Vivado are used for compilation and binary generation, and the resulting binary is flashed onto an SD card.

There are two options for design:

  • Vitis IDE

  • Vivado IDE

You can build a full-chip FPGA design using Vitis IDE, integrating it with VSS (Vitis Sub-Systems) and Vitis kernels for custom IPs. If you have RTL IP, it can be kernelized in Vitis and integrated into the design. Once the integrated design is ready, you can work on compilation, linking, packaging, and generating the binary using the Vitis IDE. Refer to the following figure for the Vitis IDE design flow.

Design – Vitis IDE Flow

In the Vivado IDE flow, Vitis is used to integrate the VSS and your Vitis kernel IPs and link them together. The output of linking is then exported to Vivado, where you can integrate your custom RTL IPs, build the complete design, and generate the binary. Refer to the following figure for the Vivado IDE design flow.

Design – Vivado IDE Flow

Note

  1. The VSS (Vitis Sub-System) is a combination of the AIE configuration for ML inference and a kernelized netlist of the PL logic used for ML inference.

  2. The model preparation and design steps are independent, and you can work on them in parallel. Both steps must be complete before proceeding to the final step, embedded execution.

The final step is embedded execution. This includes preparing the board, copying the input videos/images, snapshot file, and sub-graphs (generated in the first step) to the SD card, and using application software with the Vitis AI runtime APIs to execute the model/snapshot and generate inference results on the target. The following figure consolidates the steps of the embedded execution workflow.

Embedded Execution Workflow: Consolidated
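To round out the workflow, here is a hedged sketch of a target-side application loop that preprocesses an image and submits it to a runner created as in the VART sketch earlier; the preprocessing parameters, input layout, and int8 cast are placeholders that depend on the compiled model.

```python
# Target-side inference loop sketch. The preprocessing, input layout, and
# int8 cast are placeholders; a real application must apply the compiled
# model's expected resolution, layout, and quantization scaling.
import cv2
import numpy as np

def run_image(runner, image_path, size=(224, 224)):
    img = cv2.imread(image_path)               # HWC, BGR, uint8
    img = cv2.resize(img, size)
    in_buf = np.expand_dims(img, axis=0).astype(np.int8)  # placeholder cast

    out_t = runner.get_output_tensors()[0]
    out_buf = np.zeros(tuple(out_t.dims), dtype=np.int8)

    job_id = runner.execute_async([in_buf], [out_buf])    # submit job
    runner.wait(job_id)                                   # collect result
    return out_buf
```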