Introduction: VART X and VART ML#

Note

For the recommended chapter reading order (overview → this page → architecture → guides), start with docs/appendix/vart_overview.

This document introduces VART ML and VART X. It explains what each component does, why to use it, and when to choose each runtime option. For implementation guidance, see docs/appendix/vart_app_guide.

Vitis AI Runtime (VART) is the runtime software that executes and accelerates AI applications on AMD hardware. VART includes two key components: VART ML and VART X. The two components serve distinct but complementary purposes.

VART ML#

VART ML handles inference. It uses AMD accelerators to execute machine learning models with high efficiency and low latency. VART ML uses the VART Runner to run compiled models on AMD hardware, including models that are fully offloaded to the NPU and models compiled with the CPU partition feature for heterogeneous NPU/CPU execution. The VART Runner provides a consistent interface that hides low-level hardware details while optimizing hardware use.

VART X#

VART X focuses on tasks around inference. It provides APIs for preprocessing, postprocessing, and visualization. It handles video format conversion, video frame and memory management, and overlay rendering for inference results on video streams.

Hardware acceleration in VART X is module-specific (see docs/appendix/x-architecture-overview):

  • PreProcess — can use programmable logic (PL) or other integrated platform IP for resize, color-space conversion, and normalization when the design enables it; software fallbacks exist when hardware paths are not used.

  • VideoFrame and Memory — integrate with XRT for device-backed buffers and efficient data movement toward inference.

  • PL Kernel — runs custom or platform PL stages explicitly in the FPGA fabric.

  • PostProcess, MetaConvert, and Overlay — typically run on the CPU (for example OpenCV-based drawing), unless you plug in a custom implementation that uses another engine.
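To make the PreProcess steps concrete, the following is a minimal CPU-only sketch of two of them, color-space conversion and normalization, on an interleaved 8-bit image. It is not the VART X API: `NormParams` and `bgr_to_normalized_rgb` are hypothetical names, and the BGR input layout and the `(value - mean) * scale` formula are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical parameter bundle for illustration; not a VART X type.
struct NormParams {
    float mean[3];   // per-channel mean, in output (R, G, B) order
    float scale[3];  // per-channel multiplier applied after mean subtraction
};

// Converts an interleaved 8-bit BGR image to interleaved float RGB and
// normalizes every channel as (value - mean) * scale.
std::vector<float> bgr_to_normalized_rgb(const std::vector<std::uint8_t>& bgr,
                                         std::size_t width, std::size_t height,
                                         const NormParams& p) {
    assert(bgr.size() == width * height * 3);
    std::vector<float> out(bgr.size());
    for (std::size_t i = 0; i < width * height; ++i) {
        const std::size_t base = i * 3;
        for (std::size_t c = 0; c < 3; ++c) {
            // Output channel c (R, G, B) reads input channel 2 - c (B, G, R).
            const float v = static_cast<float>(bgr[base + (2 - c)]);
            out[base + c] = (v - p.mean[c]) * p.scale[c];
        }
    }
    return out;
}
```

A hardware path would perform the same arithmetic in PL or platform IP; a loop like this one only illustrates the kind of work a software fallback has to do on the CPU.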

Together, VART ML and VART X provide a framework to deploy and manage AI workloads efficiently on AMD hardware and deliver high-performance AI solutions.

Why VART ML?#

The Vitis AI Compiler transforms machine learning models, for example models in ONNX format, into optimized representations for AMD edge hardware. This optimization enables efficient, low-latency execution on the AI Engine for real-time and performance-sensitive applications.

AMD provides the following runtime options to execute optimized models:

  • ONNX Runtime with the Vitis AI Execution Provider: Runs models through the ONNX Runtime API. Models that are not fully offloaded to the accelerator still execute efficiently because work is distributed between the CPU and the accelerator.

  • VART Runner (VART ML): The VART Runner delivers high-throughput execution for compiled models on AMD accelerators. It supports fully offloaded models (entire graph on the NPU) and, when compiled with the CPU partition feature, models that distribute work between the NPU and CPU using compiler-supported operators. It optimizes hardware utilization, reduces latency, and supports zero-copy inference to maximize application performance.

Choosing between the Vitis AI Execution Provider (ONNX Runtime) and the VART Runner (VART ML)#

Select a runtime based on model support and performance requirements. The two primary options are the Vitis AI Execution Provider with ONNX Runtime APIs and the VART Runner with VART ML APIs. The following table summarizes when to use each option:

| Feature or Scenario | Vitis AI Execution Provider (ONNX Runtime) | VART Runner (VART ML) |
|---|---|---|
| Interface | ONNX Runtime API | VART ML API |
| Model Offloading | Supports partial offload: some layers run on the accelerator (Vitis AI Execution Provider) and other layers run on the CPU Execution Provider. | Compile-time partitioning: supports fully offloaded models on the NPU and heterogeneous NPU/CPU execution for models compiled with the CPU partition feature. |
| Flexibility | High: broad ONNX operator coverage with runtime CPU fallback. | Best for production deployments with fully offloaded or CPU-partition compiled models. |
| Performance | Good, but might be limited by CPU-executed layers. | Maximizes hardware use, minimizes latency, and supports zero-copy inference. |
| Use Case | Use this option when the model contains operators outside compiler or CPU partition support, or when the workflow requires ONNX Runtime. | Use this option when the model is fully offloaded to the NPU or compiled with CPU partition for VART ML, and the workflow requires maximum throughput and efficiency. |
| Zero-Copy Inference | Not guaranteed. | Supported. |
| Best For | Heterogeneous or partially supported models and ONNX ecosystem integration. | High-throughput, low-latency inference on AMD accelerators. |
| Deployment | Functional validation and initial bring-up. | Production deployment with optimal performance for fully offloaded or CPU-partition compiled models. |

Use the Vitis AI Execution Provider (ONNX Runtime) for flexibility and ONNX compatibility, especially when the model requires runtime CPU fallback or operators outside compiler and CPU partition support. This option fits initial bring-up and functional validation.

Use the VART Runner (VART ML) for maximum performance when your model is fully offloaded to the NPU or compiled with the CPU partition feature for VART ML. This option fits production deployment.
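The selection guidance in this section can be written down as a small decision helper. The sketch below is illustrative only: `ModelProfile`, `Runtime`, and `choose_runtime` are hypothetical names and not part of any AMD API, and the rule ordering (ONNX Runtime requirements and unsupported operators take precedence) is one reasonable reading of the guidance.

```cpp
// Hypothetical runtime identifiers for illustration only.
enum class Runtime { VitisAiExecutionProvider, VartRunner };

// Hypothetical description of a model and its workflow.
struct ModelProfile {
    bool fully_offloaded_to_npu;       // entire graph runs on the NPU
    bool compiled_with_cpu_partition;  // compiled with the CPU partition feature
    bool has_unsupported_operators;    // operators outside compiler and CPU partition support
    bool requires_onnx_runtime;        // workflow depends on the ONNX Runtime API
};

// Encodes the guidance from this section as a rule chain.
Runtime choose_runtime(const ModelProfile& m) {
    // Operators that need runtime CPU fallback, or a workflow tied to
    // ONNX Runtime, point to the Vitis AI Execution Provider.
    if (m.requires_onnx_runtime || m.has_unsupported_operators) {
        return Runtime::VitisAiExecutionProvider;
    }
    // Fully offloaded or CPU-partition compiled models fit the VART Runner.
    if (m.fully_offloaded_to_npu || m.compiled_with_cpu_partition) {
        return Runtime::VartRunner;
    }
    // Otherwise, default to the more flexible option for bring-up.
    return Runtime::VitisAiExecutionProvider;
}
```

For example, a profile with only `fully_offloaded_to_npu` set selects `Runtime::VartRunner`, while any profile with `has_unsupported_operators` set selects `Runtime::VitisAiExecutionProvider`.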

Why VART X?#

VART X targets video processing applications that require efficient video data handling and effective use of AMD hardware. The following sections summarize key reasons to use VART X.

Comprehensive Video Pipeline Management#

  • VART X manages complex video pipelines, including device management, video format conversion, and frame overlay tasks.

  • VART X provides C++ APIs for preprocessing, postprocessing, and result visualization.

Efficient Preprocessing and Postprocessing#

  • VART X supports hardware-accelerated preprocessing. When the PL path is used, input data reaches inference in the required format with low latency.

  • Postprocessing modules include model-oriented presets (for example, ResNet50, YOLOv2, and SSD-ResNet34) plus generic postprocessing functions. VART X also supports custom postprocessing implementations. For the complete supported list, see VART X APIs.
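As an illustration of what a custom postprocessing step for a classification model might compute, the sketch below applies a softmax to raw output scores and selects the top-k classes. It uses only the C++ standard library. The function names `softmax` and `top_k` are hypothetical and do not reflect the VART X postprocessing interface, which is documented in VART X APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Numerically stable softmax: subtracts the maximum logit before exponentiating.
std::vector<float> softmax(const std::vector<float>& logits) {
    assert(!logits.empty());
    const float max_logit = *std::max_element(logits.begin(), logits.end());
    std::vector<float> probs(logits.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp(logits[i] - max_logit);
        sum += probs[i];
    }
    for (float& p : probs) {
        p /= sum;
    }
    return probs;
}

// Returns up to k (class index, probability) pairs, highest probability first.
std::vector<std::pair<std::size_t, float>> top_k(const std::vector<float>& probs,
                                                 std::size_t k) {
    std::vector<std::size_t> idx(probs.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    k = std::min(k, idx.size());
    std::partial_sort(idx.begin(), idx.begin() + static_cast<std::ptrdiff_t>(k), idx.end(),
                      [&probs](std::size_t a, std::size_t b) { return probs[a] > probs[b]; });
    std::vector<std::pair<std::size_t, float>> result;
    result.reserve(k);
    for (std::size_t i = 0; i < k; ++i) {
        result.emplace_back(idx[i], probs[idx[i]]);
    }
    return result;
}
```

Detection models need different logic, such as box decoding and non-maximum suppression, so this sketch covers only the classification case.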

Scalability and Flexibility#

  • VART X uses a modular design to integrate with existing applications. Modules ship as shared objects (.so files) to simplify updates and maintenance.

  • VART X supports customization of pipeline components to match different use cases.

Visualization#

  • VART X supports overlay of inference results on video streams by using libraries such as OpenCV.
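The idea behind overlay rendering can be sketched without any graphics library: the code below draws a one-pixel box outline into an interleaved 8-bit RGB frame. It is a dependency-free stand-in and not the VART X Overlay module or an OpenCV call. `Box` and `draw_box` are hypothetical names, and the frame layout is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical detection box: top-left corner plus size, in pixels.
struct Box {
    int x, y, w, h;
};

// Draws a one-pixel outline of `b` into an interleaved RGB frame.
// Pixels that fall outside the frame are skipped.
void draw_box(std::vector<std::uint8_t>& rgb, int width, int height,
              const Box& b, const std::uint8_t (&color)[3]) {
    assert(width > 0 && height > 0);
    assert(rgb.size() == static_cast<std::size_t>(width) *
                             static_cast<std::size_t>(height) * 3);
    auto put = [&](int x, int y) {
        if (x < 0 || y < 0 || x >= width || y >= height) {
            return;
        }
        const std::size_t base = (static_cast<std::size_t>(y) *
                                      static_cast<std::size_t>(width) +
                                  static_cast<std::size_t>(x)) * 3;
        rgb[base + 0] = color[0];
        rgb[base + 1] = color[1];
        rgb[base + 2] = color[2];
    };
    for (int x = b.x; x < b.x + b.w; ++x) {  // top and bottom edges
        put(x, b.y);
        put(x, b.y + b.h - 1);
    }
    for (int y = b.y; y < b.y + b.h; ++y) {  // left and right edges
        put(b.x, y);
        put(b.x + b.w - 1, y);
    }
}
```

With OpenCV, a single rectangle-drawing call on the frame would typically replace this loop. The manual version only shows that an overlay is a write into the frame buffer after postprocessing has produced box coordinates.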

Use VART X when the application requires video pipeline control alongside machine learning integration. VART X fits complex video pipelines, efficient preprocessing and postprocessing, and hardware-accelerated workflows. VART X also fits applications that require result visualization and control of data movement, latency, and throughput.

For details about VART ML architecture, see docs/appendix/ml-architecture-overview.

For details about VART X architecture, see docs/appendix/x-architecture-overview.