Release Notes

Current Release

Version 5.1

Release Notes

  • Feature Updates

    • Utilizes Vitis and Vivado v2025.1.1, and XRT and PetaLinux v2025.1.

    • The reference design supports integration of two NPU IPs (the multiple-NPU-instances feature).

    • Support for an additional 38-column NPU IP, which can be used in designs with the NPU connected to either one or two DDRs (in interleaving mode).

    • Supports the Mixed Precision feature

    • YOLO tail acceleration on AIE

    • New Layer Support

      • DepthToSpace, PixelShuffle, Deconvolution, Up-conv, Transpose-conv (2D version)

        • Only square sizes (n, n) are supported, with n a multiple of 2

        • Input channels must be a multiple of 16 × n × n (see the validation sketch after the note below)

    • Depthwise convolution support upgraded on VE2802 to provide better performance

    • Applications renamed

      • ‘npu_runner_demo’ to ‘vart_ml_demo’

      • ‘npu_runner.py’ to ‘vart_ml_runner.py’

    • The End-to-End (X+ML) application is updated to support parallel ML pipelines using multi-threading (see the multi-threading sketch after the note below).

  • Known Issues/Limitations

    • Batch size is limited (only batchSize=1 has been validated) when the tail is accelerated on AIE.

    • Some models may not work on smaller IPs (for instance, the one used by the multi-IP design) or in MIXED or BF16 precision because the layers do not fit in the memory tiles.

    • The End-to-End (X+ML) application supports only INT8 snapshots (YOLO models using Mixed Precision do not support pre-processing in Programmable Logic).

    • Running multiple processes of the X+ML application is not yet supported.

Note

In X+ML, X refers to the hardware-accelerated pre-processing task, while ML refers to the inference task running on the Neural Processing Unit (NPU).
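
The constraints on the new 2-D upsampling layers are easy to get wrong, so the sketch below encodes them as a simple check. It is illustrative only: the helper name and signature are hypothetical, not part of the release.

```python
# Hypothetical helper (not shipped with the release) that validates the
# documented constraints for the new 2-D upsampling layers (DepthToSpace,
# PixelShuffle, Deconvolution, Up-conv, Transpose-conv).
def check_upsample_constraints(in_channels: int, size: tuple) -> None:
    n_h, n_w = size
    # Only square sizes (n, n) are supported.
    if n_h != n_w:
        raise ValueError(f"only square sizes (n, n) are supported, got {size}")
    n = n_h
    # n must be a multiple of 2.
    if n % 2 != 0:
        raise ValueError(f"n must be a multiple of 2, got n={n}")
    # Input channels must be a multiple of 16 * n * n.
    if in_channels % (16 * n * n) != 0:
        raise ValueError(
            f"input channels ({in_channels}) must be a multiple of "
            f"16 * {n} * {n} = {16 * n * n}"
        )

# 128 input channels with a (2, 2) size pass: 16 * 2 * 2 = 64 divides 128.
check_upsample_constraints(128, (2, 2))
```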
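
Likewise, here is a minimal sketch of the parallel ML pipelines now supported by the X+ML application, assuming a thread-safe runtime. `run_pipeline` is a hypothetical stand-in for one pre-processing plus inference pipeline, not an actual VART ML API.

```python
from concurrent.futures import ThreadPoolExecutor

def run_pipeline(stream_id: int) -> str:
    # Hypothetical stand-in: pre-process one input (X) and run
    # NPU inference (ML) on it.
    return f"stream {stream_id} done"

# Run several independent pipelines concurrently from a single process
# (multiple processes are not yet supported, per the known issues above).
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_pipeline, range(4)))
print(results)
```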

Previous Releases

For documentation on the previous releases, visit the Vitis AI Lounge.

Version 2025.1

Release Notes

  • New Features

    • Utilizes Vitis, Vivado, PetaLinux, and XRT Tools v2025.1

    • Supports VE2202 and VE2602 Performant IPs, along with VE2802 and VE2302

    • Removes support for functional IP and related content

    • Fixes bugs in corner cases

    • Supports all batch sizes and various input sizes

    • No new layer support

    • BF16 support for YOLOv5, YOLOv7, YOLOv8, YOLOX. See Supported Models for other CNNs compatible with BF16

    • Improved performance/efficiency for SSD-ResNet34, YOLOv5, and YOLOX

    • Supports native NHWC format for input and output (see the layout sketch after this list)

    • Introduces Apptainer as an alternative to Docker

      • Non-service container; no root required (better security/usability)

      • Addresses Docker-related issues; future releases may phase out Docker

    • Accelerates input pre-processing within AIE-accelerated graphs

    • Supports acceleration of YOLO tail graph on AIE

    • Redesigned performance summary table to provide more comprehensive details
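
As a quick illustration of the native NHWC support noted above, the NumPy snippet below shows the layout conversion an application no longer needs when its data is already NHWC; the shapes are arbitrary examples.

```python
import numpy as np

# A framework-style NCHW tensor: batch, channels, height, width.
nchw = np.random.rand(1, 3, 224, 224).astype(np.float32)

# The NHWC layout the NPU accepts natively: batch, height, width, channels.
nhwc = nchw.transpose(0, 2, 3, 1)
assert nhwc.shape == (1, 224, 224, 3)
```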

Version 2024.2

Release Notes

  • New Features

    • Supports PyTorch v1.12, TensorFlow v2.9.0, ONNX Runtime v1.20.1

    • Enables zero-copy with C++ and Python APIs; supported in the End-to-End (X+ML) app as well (see the sketch at the end of these notes)

    • NPU IP (VE2802 – performance): Three variants (38/24/16 columns, INT8) and one variant (38 columns, BF16)

    • Officially supports and verifies Aurora’s VoVNet and ResNet18 models

    • NPU IP (VE2802 – functional): One 7-column NPU IP

    • NPU IP (VE2302 – performance): Full-column and half-column variants

    • Updated NPU IP names

    • Utilizes Vitis, Vivado, PetaLinux, and XRT Tools v2024.2

    • Tool for ONNX-to-PL acceleration for YOLO tail graphs; quick-start updated for YOLOX tail

    • Broader model coverage. See Supported Models
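
To illustrate the zero-copy idea from the 2024.2 notes: the application writes directly into a buffer owned by the runtime instead of copying its input into one. The `Runner` class below is a hypothetical stand-in, not the actual VART ML C++/Python API.

```python
import numpy as np

class Runner:
    """Hypothetical stand-in for a runtime-provided runner object."""
    def __init__(self):
        # The real runtime would return a view over device-visible memory.
        self._input = np.empty((1, 224, 224, 3), dtype=np.int8)
    def get_input_buffer(self) -> np.ndarray:
        return self._input
    def execute(self) -> None:
        pass  # would launch inference on the NPU

runner = Runner()
buf = runner.get_input_buffer()
buf[...] = 0   # fill the device-visible buffer in place: no extra host copy
runner.execute()
```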