Release Notes#

Version 6.2#

This is the first public release of the AMD Vitis™ AI 6.2 User Guide for Versal AI Edge Series Gen2.

Key Features#

Core Platform Support#

  • Target Hardware: VEK385 RevB and RevA (pre-production silicon) evaluation boards with Versal AI Edge Series Gen2 Adaptive SoCs

  • Supported Devices: XC2VE3858, XC2VE3504, XC2VE3558, XC2VE3804, XC2VE3804_SE, and XC2VE3858_SE

  • Docker-Based Development: Pre-built Docker image with all necessary tools for model quantization and compilation on a Linux host

  • Tool Versions: Vivado 2025.2, Vitis 2025.2

Model Quantization and Compilation#

  • ONNX Format Support: Full support for ONNX models with opset 11-20 (opset 20 recommended for optimal performance), partial support for opset 21-22

  • Supported Types: FP32, BF16, FP16, and INT8 models. FP32, FP16, and BF16 are unquantized floating-point types.

  • Quantization Workflows:

    • INT8 Explicit Quantization: AMD Quark toolkit for INT8 quantization with calibration and optional fast fine-tuning

    • BF16 Implicit Conversion: Automatic FP32-to-BF16 conversion during Vitis AI compilation without calibration

    • FP16 Model Conversion: AMD Quark toolkit for converting models to FP16 format

  • Mixed Precision Compilation: Automatic BF16 and FP16 conversion of FP32 operations to eliminate CPU fallback and improve performance

  • Operator Support: Comprehensive support for 2D CNN and Vision Transformer models

  • Data Parallelism: Support for batching multiple inputs, with each input feeding into an independent copy of the model pipeline for efficient inference

  • Tensor Parallelism: Ability to partition models across multiple NPU columns for improved throughput and parallelization of large models

  • Dynamic Batch Support: Compile models once and run with different batch sizes at inference time
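
The opset ranges listed above can be checked before a model is handed to the compiler. The following is a minimal sketch, not part of the Vitis AI tooling: the names `OpsetSupport` and `classify_opset` are hypothetical, and opset versions outside the ranges stated in these notes are reported as "not listed" rather than guessed at.

```python
from enum import Enum


class OpsetSupport(Enum):
    FULL = "full"              # opset 11-20 per these release notes
    PARTIAL = "partial"        # opset 21-22 per these release notes
    NOT_LISTED = "not listed"  # anything these notes do not cover

# Ranges copied from the release notes; range() excludes its upper bound.
FULL_RANGE = range(11, 21)
PARTIAL_RANGE = range(21, 23)
RECOMMENDED_OPSET = 20


def classify_opset(version: int) -> OpsetSupport:
    """Map an ONNX opset version to the support level stated in the notes."""
    if version in FULL_RANGE:
        return OpsetSupport.FULL
    if version in PARTIAL_RANGE:
        return OpsetSupport.PARTIAL
    return OpsetSupport.NOT_LISTED


# With the `onnx` package installed, the default-domain opset of a model can
# be read from `onnx.load(path).opset_import` (entries whose `domain` is ""
# or "ai.onnx" carry the `version` to pass to classify_opset).
for v in (RECOMMENDED_OPSET, 22, 23):
    print(v, classify_opset(v).value)
```

A model that reports "partial" or "not listed" may still compile, but operators outside the fully supported range are the first place to look if parts of it fall back to the CPU.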

Deployment Runtimes#

  • ONNX Runtime with AMD Vitis™ AI Execution Provider: Streamlined, framework-agnostic deployment with automatic CPU/NPU partitioning for heterogeneous execution

  • VART-ML Runtime: High-performance runtime optimized for fully NPU-offloaded models with zero-copy execution support. The runtime is C++ based and does not include Python APIs.

  • VART-X APIs: Specialized APIs for video analytics with hardware-accelerated preprocessing (resize, color conversion, normalization) and integrated postprocessing/overlay functions

  • Spatial and Temporal Execution: Ability to run models spatially across multiple NPU columns and temporally by pipelining execution across time for optimized resource use

  • Multiple API Support: Python and C++ APIs for ONNX Runtime, and C++ APIs for the VART-ML runtime
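
For the ONNX Runtime path, an application typically asks for the Vitis AI Execution Provider first and keeps the CPU provider as a fallback. The sketch below shows only that selection logic. The helper `select_providers` is hypothetical, and any provider-specific options (configuration file, cache location, and so on) are deliberately omitted because these notes do not specify them.

```python
from typing import Iterable, List

# Provider names as registered in ONNX Runtime.
VITIS_AI_EP = "VitisAIExecutionProvider"
CPU_EP = "CPUExecutionProvider"


def select_providers(available: Iterable[str]) -> List[str]:
    """Prefer the Vitis AI EP when present and keep CPU as the fallback."""
    available = list(available)
    chosen = [VITIS_AI_EP] if VITIS_AI_EP in available else []
    if CPU_EP in available:
        chosen.append(CPU_EP)
    return chosen


# Typical use on a system with onnxruntime installed (not executed here):
#   import onnxruntime as ort
#   providers = select_providers(ort.get_available_providers())
#   session = ort.InferenceSession("model.onnx", providers=providers)
print(select_providers([CPU_EP, VITIS_AI_EP]))
```

Listing the CPU provider last lets ONNX Runtime place any operators the NPU cannot take on the CPU, which matches the automatic CPU/NPU partitioning described above.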

Development and Analysis Tools#

  • AI Analyzer: Comprehensive tool for model compilation visualization and inference profiling with four key sections:

    • Partitioning Analysis: Visual breakdown of CPU/NPU operator assignments and GOP offloading statistics

    • NPU Insights: Detailed view of NPU optimization including operator fusion and memory partitioning

    • Performance Profiling: Inference execution analysis with latency and throughput metrics

    • DDR Throughput Profiling: Measure and analyze DDR memory throughput between the NoC (Network-on-Chip) and the AI Engine array, and visualize the results in AI Analyzer to identify memory access bottlenecks affecting model performance.

Integrated System Reference Design#

  • End-to-End Reference Design: Complete source code with hardware-accelerated preprocessing via Image Processing PL kernel, NPU inference, and CPU-based postprocessing

  • Hardware Preprocessing: Image Processing PL HLS kernel supporting resize, color space conversion, normalization, and cropping

  • NPU Execution: Full model offload to Neural Processing Unit (NPU) for supported operators

  • Multi-Model Support: Concurrent execution of different compiled models on spatially partitioned NPU resources, or sequential execution on a shared partition using temporal (time-multiplexed) scheduling

  • Zero-Copy Inference: Enabled using device-backed tensor buffers (for example, XRT buffer objects, DMA-BUF file descriptors, or CMA-backed pointers)

Boot and Deployment Options#

  • Multiple Boot Flows: Support for OSPI and SD card boot flows. Alternatively, an OSPI and Universal Flash Storage (UFS) boot flow is also available.

  • Pre-built Boot Images: Ready-to-test images for both RevA and RevB boards

  • Board Boot Scripts: Helper scripts provided to simplify and automate the board booting process

  • Compiled Model Formats:

    • Directory structure format for flexible deployment

    • Flat-buffer format (.rai files) for efficient memory-mapped inference

  • Cross-Compilation SDK: Complete sysroot environment for building target applications on the host machine

Example Applications and Models#

  • Quick Start ResNet50 Demo: Ready-to-run ResNet50 example for immediate evaluation on the target board

  • End-to-End Tutorials: Comprehensive tutorials exercising the complete flow from model quantization through compilation to inference execution on the board

  • Pre-built C++ Applications: Pre-built C++ applications using ONNX Runtime and VART-ML Runtime for functional and performance evaluation

  • Pre-built Models: Collection of example ONNX models for quick tool evaluation (ResNet-50 and YOLOx)

Limitations#

Model and Operator Constraints#

  • ONNX Opset: Operators introduced after ONNX opset 22 are not supported

  • Operator Constraints: Certain ONNX operators are not supported and cause models to fall back to CPU execution; refer to the ONNX Operators section in the user guide for detailed operator compatibility

Quantization Constraints#

  • FP16-to-INT8 quantization: The version of Quark bundled with the Vitis AI 6.2 Docker image does not support the FP16-to-INT8 quantization workflow. Do not attempt this workflow with the bundled version. Support for this workflow is planned for the upcoming Quark 0.12 release.

Model Compilation Constraints#

  • Large ONNX Models: When a model exceeds 2 GB, it is stored using ONNX external data format (model.onnx + model.onnx.data). The full file path to model.onnx must be specified at runtime to ensure the companion .data file is correctly resolved and loaded.

  • Write Permissions: The cache directory must have write permissions enabled during compilation. This allows the compiler to store generated artifacts necessary for the build process.

  • Docker Stack Size Configuration: When launching Docker containers, use the --ulimit stack=-1:-1 option to allocate unlimited stack memory. This configuration is essential for compiling large models.
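
The three constraints above lend themselves to a short pre-flight check before compilation. The following sketch uses only the Python standard library. All function names are hypothetical, the image name is supplied by the caller rather than assumed, and the only Docker flag taken from these notes is `--ulimit stack=-1:-1` (the `--rm` and `-it` flags are ordinary `docker run` options added for convenience).

```python
import os
from pathlib import Path
from typing import List, Optional


def resolve_model_path(model: str) -> Path:
    """Return the full, absolute path to the .onnx file."""
    path = Path(model).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(path)
    return path


def external_data_file(model_path: Path) -> Optional[Path]:
    """Return the companion <name>.onnx.data file if it sits next to the model."""
    companion = model_path.with_name(model_path.name + ".data")
    return companion if companion.is_file() else None


def cache_dir_is_writable(cache_dir: str) -> bool:
    """Check that the cache directory exists and the current user can write to it."""
    path = Path(cache_dir)
    return path.is_dir() and os.access(path, os.W_OK)


def docker_run_args(image: str) -> List[str]:
    """Build a `docker run` command line with an unlimited stack size."""
    return ["docker", "run", "--rm", "-it", "--ulimit", "stack=-1:-1", image]
```

Passing the resolved path instead of a bare file name keeps the companion .data file discoverable regardless of the working directory the application is started from.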

Known System Issues#

  • Permission Requirements: Run sudo -i on the target board before launching applications to avoid permission errors when the hardware context is created

  • AI Analyzer DDR Throughput Profiling Analysis Not Displayed in GUI: Enhanced profiling JSON files are generated successfully but do not appear in the AI Analyzer GUI.

    Workaround: Copy record_timer*json and onnxruntime_profile_*json from analyzed_data/mlprofiler_ddr_merge/ to analyzed_data/ and relaunch AI Analyzer.
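
The workaround above can be scripted. This is a minimal sketch using only the Python standard library. The function name `promote_ddr_profiles` is hypothetical, while the directory names and file patterns are the ones given in the workaround.

```python
import shutil
from pathlib import Path
from typing import List

# File patterns named in the workaround.
PATTERNS = ("record_timer*json", "onnxruntime_profile_*json")


def promote_ddr_profiles(analyzed_data: Path) -> List[Path]:
    """Copy the merged DDR profiling JSON files up into analyzed_data/."""
    source = analyzed_data / "mlprofiler_ddr_merge"
    copied = []
    for pattern in PATTERNS:
        for src_file in sorted(source.glob(pattern)):
            dest = analyzed_data / src_file.name
            shutil.copy2(src_file, dest)  # overwrites an older copy if present
            copied.append(dest)
    return copied


# Example (path is illustrative): promote_ddr_profiles(Path("analyzed_data"))
```

After the files are copied, relaunch AI Analyzer as described in the workaround.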