Release Notes

Current Release

Version 5.1

Release Notes

  • Feature Updates

    • Utilizes Vitis and Vivado v2025.1.1, and XRT and PetaLinux v2025.1.

    • The reference design supports integration of two NPU IPs (the multiple-NPU-instances feature).

    • Support for an additional 38-column NPU IP, which can be used in designs with the NPU connected to either one or two DDRs (in interleaving mode).

    • Supports the Mixed Precision feature

    • YOLO tail acceleration on AIE

    • New Layer Support

      • DepthToSpace, PixelShuffle, Deconvolution, Up-conv, Transpose-conv (2D version)

        • Only square sizes (n, n) are supported, with n a multiple of 2

        • Input channels must be a multiple of 16 × n × n (see the validation sketch after the note below)

    • Depthwise convolution support upgraded on VE2802 to provide better performance

    • Applications renamed

      • ‘npu_runner_demo’ to ‘vart_ml_demo’

      • ‘npu_runner.py’ to ‘vart_ml_runner.py’

    • The End-to-End (X+ML) application is updated to support parallel ML pipelines using multi-threading (see the multi-threading sketch after the note below).

  • Known Issues/Limitations

    • Batch size is limited (only batchSize=1 has been validated) when the tail is accelerated on AIE.

    • Some models may not work on smaller IPs (for instance, the one used by the multi-IP design) or in MIXED or BF16 precision because the layers do not fit in the memory tiles.

    • The End-to-End (X+ML) application supports only INT8 snapshots (YOLO models using Mixed Precision do not support pre-processing in Programmable Logic).

    • Running multiple processes of the X+ML application is not yet supported.

Note

In X+ML, X refers to the hardware-accelerated pre-processing task, while ML refers to the inference task running on the Neural Processing Unit (NPU).
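
The constraints on the new 2-D upsampling layers are easy to get wrong, so the sketch below encodes them as a simple check. It is illustrative only: the helper name and signature are hypothetical, not part of the release.

```python
# Hypothetical helper (not shipped with the release) that validates the
# documented constraints for the new 2-D upsampling layers (DepthToSpace,
# PixelShuffle, Deconvolution, Up-conv, Transpose-conv).
def check_upsample_constraints(in_channels: int, size: tuple) -> None:
    n_h, n_w = size
    # Only square sizes (n, n) are supported.
    if n_h != n_w:
        raise ValueError(f"only square sizes (n, n) are supported, got {size}")
    n = n_h
    # n must be a multiple of 2.
    if n % 2 != 0:
        raise ValueError(f"n must be a multiple of 2, got n={n}")
    # Input channels must be a multiple of 16 * n * n.
    if in_channels % (16 * n * n) != 0:
        raise ValueError(
            f"input channels ({in_channels}) must be a multiple of "
            f"16 * {n} * {n} = {16 * n * n}"
        )

# 128 input channels with a (2, 2) size pass: 16 * 2 * 2 = 64 divides 128.
check_upsample_constraints(128, (2, 2))
```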
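
Likewise, here is a minimal sketch of the parallel ML pipelines now supported by the X+ML application, assuming a thread-safe runtime. `run_pipeline` is a hypothetical stand-in for one pre-processing plus inference pipeline, not an actual VART ML API.

```python
from concurrent.futures import ThreadPoolExecutor

def run_pipeline(stream_id: int) -> str:
    # Hypothetical stand-in: pre-process one input (X) and run
    # NPU inference (ML) on it.
    return f"stream {stream_id} done"

# Run several independent pipelines concurrently from a single process
# (multiple processes are not yet supported, per the known issues above).
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_pipeline, range(4)))
print(results)
```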

Previous Releases

For documentation on the previous releases, visit the Vitis AI Lounge.

Version 2025.1

Release Notes

  • New Features

    • Utilizes Vitis, Vivado, PetaLinux, and XRT Tools v2025.1

    • Supports VE2202 and VE2602 Performant IPs, along with VE2802 and VE2302

    • Removes support for functional IP and related content

    • Fixes bugs in corner cases

    • Supports all batch sizes and various input sizes

    • No new layer support

    • BF16 support for YOLOv5, YOLOv7, YOLOv8, YOLOX. See Supported Models for other CNNs compatible with BF16

    • Improved performance/efficiency for SSD-ResNet34, YOLOv5, and YOLOX

    • Supports native NHWC format for input and output (see the layout sketch after this list)

    • Introduces Apptainer as an alternative to Docker

      • Non-service container; no root required (better security/usability)

      • Addresses Docker-related issues; future releases may phase out Docker

    • Accelerates input pre-processing within AIE-accelerated graphs

    • Supports acceleration of YOLO tail graph on AIE

    • Redesigned performance summary table to provide more comprehensive details
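
As a quick illustration of the native NHWC support noted above, the NumPy snippet below shows the layout conversion an application no longer needs when its data is already NHWC; the shapes are arbitrary examples.

```python
import numpy as np

# A framework-style NCHW tensor: batch, channels, height, width.
nchw = np.random.rand(1, 3, 224, 224).astype(np.float32)

# The NHWC layout the NPU accepts natively: batch, height, width, channels.
nhwc = nchw.transpose(0, 2, 3, 1)
assert nhwc.shape == (1, 224, 224, 3)
```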

Version 2024.2

Release Notes

  • New Features

    • Supports PyTorch v1.12, TensorFlow v2.9.0, ONNX Runtime v1.20.1

    • Enables zero-copy with C++ and Python APIs; supported in the End-to-End (X+ML) app as well (see the sketch at the end of these notes)

    • NPU IP (VE2802 – performance): Three variants (38/24/16 columns, INT8) and one variant (38 columns, BF16)

    • Officially supports and verifies Aurora’s VoVNet and ResNet18 models

    • NPU IP (VE2802 – functional): One 7-column NPU IP

    • NPU IP (VE2302 – performance): Full-column and half-column variants

    • Updated NPU IP names

    • Utilizes Vitis, Vivado, PetaLinux, and XRT Tools v2024.2

    • Tool for ONNX-to-PL acceleration for YOLO tail graphs; quick-start updated for YOLOX tail

    • Broader model coverage. See Supported Models
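
To illustrate the zero-copy idea from the 2024.2 notes: the application writes directly into a buffer owned by the runtime instead of copying its input into one. The `Runner` class below is a hypothetical stand-in, not the actual VART ML C++/Python API.

```python
import numpy as np

class Runner:
    """Hypothetical stand-in for a runtime-provided runner object."""
    def __init__(self):
        # The real runtime would return a view over device-visible memory.
        self._input = np.empty((1, 224, 224, 3), dtype=np.int8)
    def get_input_buffer(self) -> np.ndarray:
        return self._input
    def execute(self) -> None:
        pass  # would launch inference on the NPU

runner = Runner()
buf = runner.get_input_buffer()
buf[...] = 0   # fill the device-visible buffer in place: no extra host copy
runner.execute()
```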