Vitis AI 6.1 - Release#
Note
Vitis AI 6.1 is a General Availability (GA) release for Versal™ AI Edge Series devices. The release supports the Neural Processing Unit (NPU) and replaces the deprecated Deep Learning Processor Unit (DPU) architecture.
For support, please contact your AMD sales representative or post your question to the Vitis AI and AI Community Forums.
Vitis AI 6.1 Developer Guide#
AMD Vitis™ AI is an Integrated Development Environment (IDE) that you can use to accelerate AI inference on AMD Adaptive SoCs and FPGAs. The IDE provides optimized IP (Intellectual Property), supporting tools, libraries, models, reference designs, and tutorials that help you throughout the development process. It is designed for high efficiency and ease of use, enabling AI acceleration on AMD platforms.
Vitis AI Integrated Development Environment Block Diagram
Key Components of Vitis AI#
The Vitis AI solution consists of three primary components:
Neural Processing Unit (NPU) IP: A purpose-built AI inference IP that uses a combination of Programmable Logic and the AI Engine Array to accelerate deployment of neural networks.
Model Compilation Tools: A set of tools to quantize, compile, and optimize machine learning (ML) models for NPU IP.
Model Deployment APIs: A collection of setup scripts, examples, and reference designs to integrate and execute ML inference models from a software application.
NPU IP#
AMD uses the acronym NPU IP to identify the “soft” accelerators that support deep-learning inference. The NPU IP uses a combination of AI Engines (AIE) and Programmable Logic (PL) to implement the inference accelerator.
Vitis AI provides NPU IP and supporting tools to deploy both standard and custom neural networks on AMD adaptive targets.
The Vitis AI NPU IP operates as a general-purpose AI inference accelerator. Multiple neural network (NN) models can be loaded and run concurrently on a single NPU. You can instantiate multiple NPU IP instances per device and scale the NPU IP size to meet application requirements.
The Vitis AI NPU IP architecture is called a “Matrix of (Heterogeneous) Processing Engines.” Although the architecture resembles a systolic array at first glance, the similarity is only visual. The NPU IP operates as a micro-coded processor with its own instruction set architecture. Each NPU IP architecture uses its own instruction set.
The Vitis AI Compiler works with the NPU IP software stack to generate snapshots for each network deployment. The snapshot contains a quantized model and execution instructions for the NPU IP on the target platform.
Note
One advantage of this architecture is that you do not need to load a new bitstream or build a new hardware platform to change the neural network. This approach differentiates the NPU IP from dataflow accelerator architectures that target a single network.
Model Compilation Toolset#
Vitis AI Quantizer
The Vitis AI Quantizer integrates with TensorFlow or PyTorch and converts 32-bit floating-point weights and activations to narrower data types such as INT8. This conversion reduces computational complexity with minimal loss of accuracy (about 1%). Running the fixed-point model requires less memory bandwidth and provides higher throughput and better power efficiency than running the 32-bit floating-point model.
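As an illustration, the following is a minimal post-training quantization sketch that assumes the vai_q_pytorch interface (torch_quantizer) from earlier Vitis AI releases; the model, calibration loader, and output paths are placeholders, and the Vitis AI 6.1 toolchain entry points may differ.

```python
# Sketch of PyTorch post-training quantization with the Vitis AI quantizer
# (vai_q_pytorch interface from earlier Vitis AI releases; names and paths are placeholders).
import torch
from pytorch_nndct.apis import torch_quantizer

model = MyModel().eval()                      # hypothetical trained float model
dummy_input = torch.randn(1, 3, 224, 224)     # shape must match the model input

# Calibration pass: collect activation statistics on a small calibration set.
quantizer = torch_quantizer("calib", model, (dummy_input,),
                            output_dir="quantize_result",
                            device=torch.device("cpu"))
quant_model = quantizer.quant_model
for images, _ in calib_loader:                # hypothetical calibration DataLoader
    quant_model(images)
quantizer.export_quant_config()

# Test pass: run the quantized model once and export the deployable artifacts.
quantizer = torch_quantizer("test", model, (dummy_input,),
                            output_dir="quantize_result",
                            device=torch.device("cpu"))
quant_model = quantizer.quant_model
quant_model(dummy_input)
quantizer.export_xmodel(output_dir="quantize_result", deploy_check=False)
```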
Vitis AI Compiler
The Vitis AI Compiler maps the quantized model to an efficient instruction set and dataflow model. The compiler performs multiple optimizations; for example, it fuses batch normalization operations with convolution when the convolution operator precedes the batch normalization operator. The NPU IP supports multiple dimensions of parallelism, and efficient instruction scheduling exploits this parallelism and improves data reuse in the graph.
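For reference, the sketch below invokes the model compiler from a Python script, assuming the vai_c_xir command line from earlier Vitis AI releases; the 6.1 NPU compiler that produces snapshots may expose a different command and options, and all paths and names are placeholders.

```python
# Sketch of invoking the Vitis AI compiler on a quantized model.
# Uses the vai_c_xir command from earlier Vitis AI releases; the 6.1 NPU compiler
# that emits snapshots may use a different command line. Paths are placeholders.
import subprocess

subprocess.run(
    [
        "vai_c_xir",
        "--xmodel", "quantize_result/MyModel_int.xmodel",  # quantizer output (placeholder name)
        "--arch", "/path/to/arch.json",                     # describes the target accelerator configuration
        "--output_dir", "compile_result",
        "--net_name", "my_model",
    ],
    check=True,
)
```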
Model Deployment APIs#
Vitis AI Runtime (VART) provides API functions to integrate the NPU IP into software applications. VART builds on the legacy Xilinx Runtime (XRT) and provides a unified high-level runtime for embedded targets. Key features include the following (a minimal usage sketch follows this list):
Asynchronous submission of jobs to the NPU IP
Asynchronous collection of jobs from the NPU IP
C++ and Python API implementations
Support for multithreaded execution
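The following is a minimal dispatch sketch, assuming the VART and XIR Python APIs as they appear in earlier Vitis AI releases (where accelerator subgraphs carry the device attribute DPU); the snapshot-based entry points in Vitis AI 6.1 may differ, and the compiled model path is a placeholder.

```python
# Minimal asynchronous inference sketch with the VART Python API from earlier
# Vitis AI releases; the 6.1 snapshot-based flow may use different entry points.
import numpy as np
import vart
import xir

graph = xir.Graph.deserialize("compile_result/my_model.xmodel")  # placeholder path
subgraphs = graph.get_root_subgraph().toposort_child_subgraph()
# Pick the first subgraph mapped to the accelerator.
accel_sg = next(s for s in subgraphs
                if s.has_attr("device") and s.get_attr("device").upper() == "DPU")

runner = vart.Runner.create_runner(accel_sg, "run")

# Allocate host buffers matching the runner's tensor shapes; the element type
# depends on how the model was quantized.
in_buf = [np.zeros(tuple(t.dims), dtype=np.int8) for t in runner.get_input_tensors()]
out_buf = [np.zeros(tuple(t.dims), dtype=np.int8) for t in runner.get_output_tensors()]

job_id = runner.execute_async(in_buf, out_buf)   # asynchronous job submission
runner.wait(job_id)                              # collect the completed job
```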
Salient Features of Vitis AI#
AIE and PL programmability
Low-latency, real-time AI inference
Low power consumption
Deep learning frameworks: PyTorch, TensorFlow
Broad CNN model coverage
Data types: INT8, BF16
C++ and Python APIs for easier integration
This section provides an overview of how developers deploy models on AMD embedded platforms.
The figure outlines the process for deploying a machine learning model with Vitis AI for embedded execution across different hardware environments. The process starts with setting up the development environment with the required AMD hardware and software. After training the ML model, the next step verifies performance by running inference on an x86 host (CPU or GPU) and checking accuracy. If accuracy does not meet requirements, the workflow tunes or retrains the model with the Vitis AI tools. After validation, the workflow proceeds to embedded execution on the integrated system.
Embedded Execution Workflow#
The embedded execution process includes three steps:
Model compilation
Design
Embedded execution
Model Compilation#
Model compilation starts the workflow. In this step, you compile the trained model on an x86 host machine by using the NPU compiler software and check model accuracy. If post-quantization accuracy does not meet requirements, tune the model with the software APIs. After validation, the compiler generates a snapshot file and CPU subgraphs; by default, the CPU subgraphs are not accelerated on the FPGA. The snapshot file packages the compiled model and runtime instructions into a single file.
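As a simple illustration of the accuracy check, the sketch below compares top-1 accuracy of the float and quantized models on the x86 host using plain PyTorch; the validation loader, model objects, and the 1% threshold are placeholders rather than part of the Vitis AI tooling.

```python
# Host-side accuracy check sketch (plain PyTorch; dataset, loader, and
# threshold are placeholders).
import torch

def top1_accuracy(model, loader):
    correct = total = 0
    model.eval()
    with torch.no_grad():
        for images, labels in loader:
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / total

float_acc = top1_accuracy(model, val_loader)        # baseline float model
quant_acc = top1_accuracy(quant_model, val_loader)  # quantized model from the previous step
if float_acc - quant_acc > 0.01:                    # example threshold: >1% drop triggers tuning
    print("Accuracy drop exceeds target; consider fast finetuning or retraining.")
```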
Design#
This step integrates the NPU IP into a full-chip FPGA design with other IPs or kernels, such as preprocessing and postprocessing. The flow uses AMD Vitis and Vivado to compile the design and generate binaries for flashing to an SD card.
The design step supports two options:
Vitis IDE
Vivado IDE
Design – Vitis IDE Flow#
You can build a full-chip FPGA design by using Vitis IDE and integrate Vitis Sub-Systems (VSS) and Vitis kernels for custom IPs. When you use RTL IP, you can kernelize the IP in Vitis and integrate it into the design. After integration, the flow compiles, links, packages, and generates binaries in Vitis IDE.
Design – Vivado IDE Flow#
In the Vivado IDE flow, Vitis integrates VSS and Vitis kernel IPs and links them together. The flow exports the linking output to Vivado. You can then integrate custom RTL IPs, build the complete design, and generate binaries by using Vivado IDE.
Note
VSS (Vitis Sub-System) combines AI Engine configuration and kernelized PL logic for ML inference.
Model preparation and design steps remain independent, and you can work on both steps in parallel. Complete both steps before you proceed to embedded execution.
Embedded Execution#
Embedded execution completes the workflow. In this step, you prepare the board; copy the input videos or images, the snapshot, and the CPU subgraphs to the SD card; and run the application, which uses the Vitis AI runtime APIs to execute the snapshot and generate inference results on the target.
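A minimal multithreaded sketch of the on-target application follows, reusing the VART runner pattern from the earlier example; the thread count, frame source, and buffer types are placeholders, and the snapshot-based runtime in Vitis AI 6.1 may expose different entry points.

```python
# Sketch of multithreaded dispatch on the target: each worker thread owns its
# own runner created from the same accelerator subgraph (continuing the earlier
# VART sketch; entry points in the 6.1 snapshot-based runtime may differ).
import threading
import numpy as np
import vart

def worker(subgraph, frames):
    runner = vart.Runner.create_runner(subgraph, "run")
    out_shape = tuple(runner.get_output_tensors()[0].dims)
    for frame in frames:
        out = [np.zeros(out_shape, dtype=np.int8)]
        job_id = runner.execute_async([frame], out)
        runner.wait(job_id)
        # ... postprocess `out` here ...

# accel_sg comes from the earlier sketch; frames_for() is a hypothetical helper
# that yields preprocessed frames read from the SD card.
threads = [threading.Thread(target=worker, args=(accel_sg, frames_for(i)))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```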