Release Notes#

Version 6.2#

This is the first public release of the AMD Vitis™ AI 6.2 User Guide for Versal AI Edge Series Gen2.

Key Features#

Core Platform Support#

Target Hardware: VEK385 RevB and RevA (pre-production silicon) evaluation boards with Versal AI Edge Series Gen2 Adaptive SoCs
Supported Devices: XC2VE3858, XC2VE3504, XC2VE3558, XC2VE3804, XC2VE3804_SE, and XC2VE3858_SE
Docker-Based Development: Pre-built Docker image with all necessary tools for model quantization and compilation on Linux host
Tool Versions: Vivado 2025.2, Vitis 2025.2

Model Quantization and Compilation#

ONNX Format Support: Full support for ONNX models with opset 11-20 (opset 20 recommended for optimal performance), partial support for opset 21-22
Supported Types: FP32, BF16, FP16, and INT8 models. FP32, FP16, and BF16 are unquantized floating-point types.
Quantization Workflows:
- INT8 Explicit Quantization: AMD Quark toolkit for INT8 quantization with calibration and optional fast fine-tuning
- BF16 Implicit Conversion: Automatic FP32-to-BF16 conversion during Vitis AI compilation without calibration
- FP16 Model Conversion: AMD Quark toolkit for converting models to FP16 format
Mixed Precision Compilation: Automatic BF16 and FP16 conversion of FP32 operations to eliminate CPU fallback and improve performance
Operator Support: Comprehensive support for 2D CNN and Vision Transformer models
Data Parallelism: Support for batching multiple inputs, with each input feeding into an independent copy of the model pipeline for efficient inference
Tensor Parallelism: Ability to partition models across multiple NPU columns for improved throughput and parallelization of large models
Dynamic Batch Support: Compile models once and run with different batch sizes at inference time
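
As background for the BF16 implicit conversion item above, the following is a minimal sketch of what an FP32-to-BF16 conversion does at the bit level. It is not the Vitis AI compiler's implementation; the function names are hypothetical and the rounding mode (round to nearest, ties to even) is an assumption. BF16 keeps the sign bit and the 8-bit exponent of FP32 and shortens the mantissa to 7 bits, so the dynamic range is preserved and no calibration data is needed, unlike INT8 quantization.

```python
import struct


def fp32_to_bf16_bits(value: float) -> int:
    """Round an FP32 value to a 16-bit BF16 pattern (nearest, ties to even)."""
    # Reinterpret the value as its 32-bit IEEE 754 single-precision pattern.
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        # NaN: keep sign and exponent, force a non-zero (quiet) mantissa bit.
        return (bits >> 16) | 0x0040
    # Add just under half of the discarded range, plus one more when the
    # lowest kept bit is odd, so exact ties round to the even neighbour.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_fp32(bits16: int) -> float:
    """Widen a BF16 pattern back to FP32 by zero-filling the low 16 bits."""
    (value,) = struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))
    return value
```

Round-tripping a value through both functions shows the precision that is lost: only about two to three significant decimal digits survive, while the exponent range is unchanged.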

Deployment Runtimes#

ONNX Runtime with AMD Vitis™ AI Execution Provider: Streamlined, framework-agnostic deployment with automatic CPU/NPU partitioning for heterogeneous execution
VART-ML Runtime: High-performance runtime optimized for fully NPU-offloaded models with zero-copy execution support. Runtime is C++ based and does not include Python APIs. Enhancements include mixed tensor support, batch processing, asynchronous inference APIs, and AIE column control with spatial and temporal sharing.
VART-X APIs: Specialized APIs for video analytics with hardware-accelerated preprocessing (resize, color conversion, normalization) and integrated postprocessing/overlay functions
VART-X and VART-ML API Documentation: VART-X and VART-ML API guidance is documented in the open-source Vitis AI documentation portal at VART-ML API
CPU Partition Compilation: Support for CPU partition compilation enables heterogeneous execution of ONNX models across NPU and CPU hardware using Vitis AI compiler-supported CPU operators with the VART-ML API
Spatial and Temporal Execution: Ability to run models spatially across multiple NPU columns and temporally by pipelining execution across time for optimized resource use
Multiple API Support: Python and C++ APIs for ONNX Runtime and C++ APIs for VART-ML based Runtime
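
A minimal Python sketch of the ONNX Runtime deployment path described above. The helper choose_providers is hypothetical; the provider names are the standard ONNX Runtime identifiers. Any Vitis AI 6.2 specific provider options are not covered by these notes and are deliberately left out.

```python
def choose_providers(available):
    """Prefer the Vitis AI execution provider and keep CPU as the fallback."""
    preferred = ["VitisAIExecutionProvider", "CPUExecutionProvider"]
    # Keep the preference order, but only list providers this build offers.
    return [name for name in preferred if name in available]


# Usage on a system with ONNX Runtime installed (not executed here):
#   import onnxruntime as ort
#   providers = choose_providers(ort.get_available_providers())
#   session = ort.InferenceSession("model.onnx", providers=providers)
#   input_name = session.get_inputs()[0].name
#   outputs = session.run(None, {input_name: batch})  # batch: a NumPy array
```

Listing the CPU provider last lets ONNX Runtime place any operators that the NPU partition does not take on the CPU.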

Development and Analysis Tools#

AI Analyzer: Comprehensive tool for model compilation visualization and inference profiling with the following key sections:
- Partitioning Analysis: Visual breakdown of CPU/NPU operator assignments and GOP offloading statistics
- NPU Insights: Detailed view of NPU optimization including operator fusion and memory partitioning
- Performance Profiling: Inference execution analysis with latency and throughput metrics
- DDR Throughput Profiling: Measure and analyze DDR memory throughput between the NoC (Network-on-Chip) and the AI Engine array, and visualize the results in AI Analyzer to identify memory access bottlenecks affecting model performance.

Integrated System Reference Design#

End-to-End Reference Design: Complete source code with hardware-accelerated preprocessing via Image Processing PL kernel, NPU inference, and CPU-based postprocessing
Hardware Preprocessing: Image Processing PL HLS kernel supporting resize, color space conversion, normalization, and cropping, with additional FP16 output formats
NPU Execution: Full model offload to Neural Processing Unit (NPU) for supported operators
Scalable CPU Post-processing: Scalable CPU post-processing functions for the ONNX and VART runtimes, covering image classification, object detection, and segmentation
Multi-Model Support: Concurrent execution of different compiled models on spatially partitioned NPU resources, or sequential execution on a shared partition using temporal (time-multiplexed) scheduling
Zero-Copy Inference: Enabled using device-backed tensor buffers (for example, XRT buffer objects, DMA-BUF file descriptors, or CMA-backed pointers)

Boot and Deployment Options#

Multiple Boot Flows: Support for OSPI and SD card boot flows. Alternatively, an OSPI and Universal Flash Storage (UFS) boot flow is also available.
Pre-built Boot Images: Ready-to-test images for both RevA and RevB boards
Board Boot Scripts: Helper scripts provided to simplify and automate the board booting process
Compiled Model Formats:
- Directory structure format for flexible deployment
- Flat-buffer format (.rai files) for memory-mapped efficient inference
Cross-Compilation SDK: Complete sysroot environment for building target applications on host machine

Example Applications and Models#

Quick Start ResNet50 Demo: Ready-to-run ResNet50 example for immediate evaluation on the target board
End-to-End Tutorials: Comprehensive tutorials exercising the complete flow from model quantization through compilation to inference execution on the board
Pre-built C++ Applications: Pre-built C++ applications using ONNX and VART-ML Runtime for functional and performance evaluation
ML Acceleration Examples: Multiple C++ examples demonstrating ML acceleration using ONNX and VART runtime APIs
Pre-built Models: Collection of example ONNX models for quick tool evaluation (ResNet-50 and YOLOX)
Open-Source Availability: Gen2 examples, tutorials, and reference design are open source in the Vitis AI GitHub repo under versal_2ve/

Limitations#

Model and Operator Constraints#

ONNX Opset: Operators introduced after ONNX opset 22 are not supported
Operator Constraints: Certain ONNX operators are not supported and cause models to fall back to CPU execution; refer to the ONNX Operators section in the user guide for detailed operator compatibility
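
The opset ranges stated in these notes (full support for opset 11-20, partial support for 21-22, nothing introduced after 22) can be sketched as a simple pre-compilation check. The function name and the returned labels are hypothetical. Treating any opset above 22 as unsupported is a simplification, and the notes do not say how opsets below 11 are handled, so those are reported as unspecified.

```python
def opset_support(opset: int) -> str:
    """Classify an ONNX opset version against the ranges in these notes."""
    if 11 <= opset <= 20:
        return "full"
    if 21 <= opset <= 22:
        return "partial"
    if opset > 22:
        # Operators introduced after opset 22 are not supported.
        return "unsupported"
    # Opsets below 11 are not mentioned in the release notes.
    return "unspecified"


# Reading the opset from a model file with the onnx package (not executed here):
#   import onnx
#   model = onnx.load("model.onnx")
#   opset = next(o.version for o in model.opset_import
#                if o.domain in ("", "ai.onnx"))
#   print(opset_support(opset))
```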

Quantization Constraints#

FP16-to-INT8 quantization: The version of Quark bundled with the Vitis AI 6.2 Docker image does not support the FP16-to-INT8 quantization workflow. Do not attempt this workflow with the bundled version. Support for this workflow is planned for the upcoming Quark 0.12 release.

Model Compilation Constraints#

Large ONNX Models: When a model exceeds 2 GB, it is stored using ONNX external data format (model.onnx + model.onnx.data). The full file path to model.onnx must be specified at runtime to ensure the companion .data file is correctly resolved and loaded.
Write Permissions: The cache directory must have write permissions enabled during compilation. This allows the compiler to store generated artifacts necessary for the build process.
Docker Stack Size Configuration: When launching Docker containers, use the --ulimit stack=-1:-1 option to allocate unlimited stack memory. This configuration is essential for compiling large models.
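
The three constraints above can be checked before a long compilation starts. The following is a minimal sketch under stated assumptions: the function names, the external_data flag, and the image name passed to docker_run_args are hypothetical, and the real Docker launch command for the Vitis AI image takes further options that are not shown here.

```python
import os
from pathlib import Path


def preflight(model_path, cache_dir, external_data=False):
    """Return a list of problems found before compiling or loading a model."""
    problems = []
    model = Path(model_path)
    if not model.is_absolute():
        problems.append(f"model path is not a full path: {model}")
    if not model.is_file():
        problems.append(f"model file not found: {model}")
    if external_data:
        # Models over 2 GB keep their weights in model.onnx.data, which must
        # sit next to model.onnx so it can be resolved from the model path.
        companion = model.with_name(model.name + ".data")
        if not companion.is_file():
            problems.append(f"external data file not found: {companion}")
    cache = Path(cache_dir)
    if not cache.is_dir() or not os.access(cache, os.W_OK):
        problems.append(f"cache directory is not writable: {cache}")
    return problems


def docker_run_args(image):
    """Build a docker run command line that lifts the stack size limit."""
    # "stack=-1:-1" sets both the soft and the hard stack limit to unlimited.
    return ["docker", "run", "--rm", "-it", "--ulimit", "stack=-1:-1", image]
```

An empty list from preflight means none of the checked conditions failed; it does not guarantee that compilation succeeds.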

Known System Issues#

Permission Requirements: Run sudo -i on the target board to avoid permission issues when creating a hardware context
AI Analyzer DDR Throughput Profiling Analysis Not Displayed in GUI: Enhanced profiling JSON files are generated successfully but do not appear in the AI Analyzer GUI.

Workaround: Copy record_timer*json and onnxruntime_profile_*json from analyzed_data/mlprofiler_ddr_merge/ to analyzed_data/ and relaunch AI Analyzer.
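
The workaround can be scripted. This is a minimal sketch: the function name is hypothetical, and the directory names and file patterns are taken verbatim from the workaround text.

```python
import shutil
from pathlib import Path


def surface_ddr_profiles(analyzed_data):
    """Copy the DDR profiling JSON files up into the analyzed_data directory."""
    root = Path(analyzed_data)
    merged = root / "mlprofiler_ddr_merge"
    copied = []
    for pattern in ("record_timer*json", "onnxruntime_profile_*json"):
        for src in sorted(merged.glob(pattern)):
            # copy2 keeps timestamps; existing files of the same name are
            # overwritten.
            shutil.copy2(src, root / src.name)
            copied.append(src.name)
    return copied
```

Call surface_ddr_profiles("analyzed_data") from the directory that contains analyzed_data/, then relaunch AI Analyzer.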
