Namespace vart#
-
namespace vart#
Enums
-
enum class DataType#
Enumerates the supported data types for tensors in the VART API.
This enum defines the various data types that can be used to represent tensor elements. It includes integer and floating-point formats, as well as specialized types such as BF16.
UNKNOWN: Unknown data type.
BOOLEAN: Boolean type.
INT8: 8-bit signed integer.
UINT8: 8-bit unsigned integer.
INT16: 16-bit signed integer.
UINT16: 16-bit unsigned integer.
BF16: 16-bit Brain Floating Point format.
FP16: 16-bit floating point.
INT32: 32-bit signed integer.
UINT32: 32-bit unsigned integer.
FLOAT32: 32-bit floating point.
INT64: 64-bit signed integer.
Values:
-
enumerator UNKNOWN#
-
enumerator BOOLEAN#
-
enumerator INT8#
-
enumerator UINT8#
-
enumerator INT16#
-
enumerator UINT16#
-
enumerator BF16#
-
enumerator FP16#
-
enumerator INT32#
-
enumerator UINT32#
-
enumerator FLOAT32#
-
enumerator INT64#
-
enum class MemoryLayout#
Enumerates the supported memory layouts for tensors in the VART API.
This enum defines the various memory layouts that can be used to represent tensor data. It includes formats such as NHWC, NCHW, and others that specify how tensor dimensions are organized in memory.
UNKNOWN: Unknown memory layout.
NC: Model batch, Channels (packed format).
NCH: Model batch, Channels (packed format), Height.
NHC: Model batch, Height, Channels (packed format).
NHW: Model batch, Height, Width.
NHWC: Model batch, Height, Width, Channels (packed format).
NCHW: Model batch, Channels, Height, Width (planar format).
NHWC4: Model batch, Height, Width, Channel groups of 4 (e.g., RGBA).
NHWC8: Model batch, Height, Width, Channel groups of 8.
NC4HW4: Model batch, Channels / 4, Height, Width, Channel groups of 4.
NC8HW8: Model batch, Channels / 8, Height, Width, Channel groups of 8.
HCWNC4: Height, Channels / 4, Width, N = 1, Channel groups of 4.
HCWNC8: Height, Channels / 8, Width, N = 1, Channel groups of 8.
NHW16C4WC: Model batch, Height, Width / 16, Channels / 4, Width groups of 16, Channel groups of 4.
GENERIC: Generic layout. See NpuTensorInfo::memory_layout_order for more info.
Values:
-
enumerator UNKNOWN#
-
enumerator NC#
-
enumerator NCH#
-
enumerator NHC#
-
enumerator NHW#
-
enumerator NHWC#
-
enumerator NCHW#
-
enumerator NHWC4#
-
enumerator NHWC8#
-
enumerator NC4HW4#
-
enumerator NC8HW8#
-
enumerator HCWNC4#
-
enumerator HCWNC8#
-
enumerator NHW16C4WC#
-
enumerator GENERIC#
-
enum class MemoryType#
Enumerates the various memory types utilized for tensors in the VART API.
This enumeration specifies the locations where tensor data is stored:
UNKNOWN: Memory type is not specified or recognized.
XRT_BO: Represents a buffer object associated with the XRT (Xilinx Runtime).
DMA_FD: Corresponds to a file descriptor used for Direct Memory Access (DMA).
USER_POINTER_CMA: Indicates a user-provided virtual pointer that points to a contiguous physical block of memory.
USER_POINTER_NON_CMA: Indicates a user-provided virtual pointer that does not guarantee contiguous memory allocation. This memory can be allocated using standard methods such as new, malloc, or calloc.
Values:
-
enumerator UNKNOWN#
-
enumerator XRT_BO#
-
enumerator DMA_FD#
-
enumerator USER_POINTER_CMA#
-
enumerator USER_POINTER_NON_CMA#
-
enum class TensorDirection#
Enumerates the supported tensor directions in the VART API.
This enum defines the various directions that tensors can have in the context of model inference. It includes input and output directions.
INPUT: Input tensor direction.
OUTPUT: Output tensor direction.
Values:
-
enumerator INPUT#
-
enumerator OUTPUT#
-
enum class TensorType#
Specifies the tensor types supported in the VART API.
Enumerates the available tensor types:
CPU: Represents tensor metadata from the ONNX model, as defined for standard CPU execution.
HW: Corresponds to AMD hardware-specific tensor metadata, formatted for direct execution on AMD AI engines.
Note
AMD optimizes its AI engines with unique data formats and memory layouts. As a result, the HW tensor layout and format will typically differ from the CPU tensor representation defined by the ONNX model.
Values:
-
enumerator CPU#
-
enumerator HW#
-
enum class RunnerType#
Enumerates the types of runner implementations supported.
This enumeration specifies the different runner types available for model inference. RunnerType identifies the backend or solution used to execute the model.
VAIML: VAIML-based runner implementation.
Values:
-
enumerator VAIML#
-
enum class RoundingMode#
Enumerates the rounding modes used in quantization.
This enum defines the different rounding modes that can be applied during quantization, such as rounding to nearest even or truncating towards zero.
UNKNOWN: Unknown rounding mode.
ROUND_TO_NEAREST_EVEN: Round to nearest even value.
ROUND_TOWARD_ZERO: Truncate towards zero (no rounding).
Values:
-
enumerator UNKNOWN#
-
enumerator ROUND_TO_NEAREST_EVEN#
-
enumerator ROUND_TOWARD_ZERO#
-
enum class StatusCode#
Enumerates the status codes used in the VART API.
This enum defines the various status codes that can be returned by VART functions, indicating the success or failure of an operation.
SUCCESS: Operation completed successfully.
FAILURE: Operation failed.
INVALID_INPUT: Invalid input parameters.
INVALID_OUTPUT: Invalid output parameters.
OUT_OF_MEMORY: Memory allocation failed.
RUNTIME_ERROR: Runtime error occurred.
JOB_PENDING: Job is still pending.
INVALID_JOB_ID: Provided job ID is invalid.
RESOURCE_UNAVAILABLE: Required resource is unavailable.
Values:
-
enumerator SUCCESS#
-
enumerator FAILURE#
-
enumerator INVALID_INPUT#
-
enumerator INVALID_OUTPUT#
-
enumerator OUT_OF_MEMORY#
-
enumerator RUNTIME_ERROR#
-
enumerator JOB_PENDING#
-
enumerator INVALID_JOB_ID#
-
enumerator RESOURCE_UNAVAILABLE#
-
struct NpuTensorInfo#
- #include <vart_npu_tensor.hpp>
Metadata structure describing a tensor used in VART.
Contains various attributes used to define and manage a tensor.
Public Functions
-
inline NpuTensorInfo()#
Default constructor initializing members to default values.
-
void print() const#
Prints tensor metadata to standard output.
Public Members
-
TensorDirection direction#
Direction of the tensor (input or output).
-
TensorType tensor_type#
Type of the tensor (CPU or HW).
-
MemoryLayout memory_layout#
Memory layout type of the tensor.
-
std::vector<uint32_t> memory_layout_order#
(Optional) Only relevant when memory_layout is GENERIC. Specifies the dimension permutation order for buffer data. This vector defines how dimensions are arranged compared to the reference TensorType::CPU tensor format. For example, if the TensorType::CPU format is “ABCD”, memory_layout_order is {0, 1, 2, 3}; if the TensorType::HW format is “ADBC”, memory_layout_order is {0, 3, 1, 2}.
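As an illustrative sketch, assuming the interpretation in the example above (entry i of memory_layout_order names the CPU dimension stored at HW position i; the helper name is hypothetical), a CPU-order shape can be permuted into the HW order like this:

#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: rearrange a CPU-order shape into the HW order
// described by memory_layout_order.
std::vector<size_t> to_hw_shape(const std::vector<size_t>& cpu_shape,
                                const std::vector<uint32_t>& order) {
    std::vector<size_t> hw_shape(order.size());
    for (size_t i = 0; i < order.size(); ++i)
        hw_shape[i] = cpu_shape[order[i]];
    return hw_shape;
}
// With CPU shape {A, B, C, D} and order {0, 3, 1, 2}, the result
// is {A, D, B, C}, matching the "ADBC" example.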
-
size_t size#
Number of elements in the tensor.
-
size_t size_in_bytes#
Size of the tensor data in bytes.
-
class NpuTensor
- #include <vart_npu_tensor.hpp>
This class represents a tensor in the VART API.
This class encapsulates tensor metadata and offers access to the tensor’s data buffers. It acts as a lightweight wrapper around buffers supplied by the user.
Note
This class does not take ownership of the buffer memory. The user is responsible for managing the buffer’s lifecycle.
Public Functions
-
NpuTensor(const NpuTensorInfo &info, void *buffer, const MemoryType &mem_type)
Construct a NpuTensor from a user-supplied buffer.
Initializes the tensor using the specified metadata and buffer.
Note
The NpuTensor does not take ownership of the buffer. The caller is responsible for ensuring the buffer remains valid for the lifetime of this object.
- Parameters:
info – Tensor metadata (NpuTensorInfo).
buffer – Pointer to the user buffer containing the tensor data. The buffer must remain valid for the lifetime of the NpuTensor object.
mem_type – Specifies the memory type of the buffer.
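A minimal construction sketch, assuming info was obtained from the runner and using USER_POINTER_NON_CMA to match memory allocated with standard methods such as new or malloc:

#include <cstdint>
#include <vector>
#include <vart_npu_tensor.hpp>

// The vector owns the memory; it must outlive the NpuTensor it backs.
void wrap_user_buffer(const vart::NpuTensorInfo& info) {
    std::vector<uint8_t> buf(info.size_in_bytes);
    vart::NpuTensor tensor(info, buf.data(),
                           vart::MemoryType::USER_POINTER_NON_CMA);
    // ... fill buf, then pass `tensor` to Runner::execute() ...
}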
-
NpuTensor(const NpuTensorInfo &info, const void *buffer, const MemoryType &mem_type)
Construct a NpuTensor from a user-supplied constant buffer.
Initializes the tensor using the specified metadata and constant buffer.
Note
The NpuTensor does not take ownership of the buffer. The caller is responsible for ensuring the buffer remains valid for the lifetime of this object.
- Parameters:
info – Tensor metadata (NpuTensorInfo).
buffer – Pointer to the user constant buffer containing the tensor data.
mem_type – Specifies the memory type of the buffer.
-
~NpuTensor()
Destructor.
Properly destroys the NpuTensor and its implementation.
-
NpuTensor &operator=(NpuTensor &&other) noexcept
Move assignment operator.
Transfers ownership of the tensor metadata and buffer pointer from another NpuTensor.
-
NpuTensor(NpuTensor &&other) noexcept
Move constructor.
Transfers ownership of the tensor metadata and buffer pointer from another NpuTensor.
- Parameters:
other – The NpuTensor instance to move from.
-
void *get_buffer()
Retrieves a pointer to the tensor’s buffer.
This function provides access to the buffer that was provided during tensor construction.
- Returns:
void* Pointer to the buffer, or nullptr if no buffer is available. If the memory type is MemoryType::XRT_BO, it returns a pointer to the corresponding xrt::bo. If the memory type is MemoryType::DMA_FD, it returns a pointer to the file descriptor. For MemoryType::USER_POINTER_CMA and MemoryType::USER_POINTER_NON_CMA, it returns the virtual pointer.
-
const void *get_buffer() const
Retrieves a pointer to the tensor’s buffer.
This function is the overloaded version for immutable (const) access.
- Returns:
const void* Pointer to the buffer, or nullptr if no buffer is available.
-
void *get_virtual_address()
Returns the virtual address of the tensor buffer.
Note
Virtual address retrieval is not supported for MemoryType::DMA_FD
- Returns:
void* Pointer to the virtual address of the buffer, or nullptr if no buffer is available.
-
const void *get_virtual_address() const
Returns the virtual address of the tensor buffer.
This function is the overloaded version for immutable (const) access.
Note
Virtual address retrieval is not supported for MemoryType::DMA_FD
- Returns:
const void* Pointer to the virtual address of the buffer, or nullptr if no buffer is available.
-
uint64_t get_physical_address() const
Returns the physical address of the tensor buffer.
Note
Physical address retrieval is only supported for MemoryType::XRT_BO
- Returns:
uint64_t Physical address of the buffer if applicable, 0 otherwise.
-
const NpuTensorInfo &get_info() const
Returns the NpuTensorInfo metadata of the tensor.
This method returns the NpuTensorInfo object that contains metadata about the tensor, such as its name, shape, strides, data type, and memory layout.
- Returns:
A constant reference to the NpuTensorInfo object.
-
MemoryType get_memory_type() const
Get the memory type of the tensor.
- Returns:
MemoryType The memory type of the tensor.
-
void sync_buffer() const
Synchronizes the tensor buffer between CPU and AIE.
Ensures data consistency between CPU and AIE by performing cache operations based on the tensor’s direction:
For TensorDirection::INPUT, flushes cache to DDR for reading by AIE.
For TensorDirection::OUTPUT, invalidates cache for reading by CPU.
Note
Supported only for NpuTensors allocated using vart::Runner::allocate_npu_tensor.
- Returns:
void
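A short usage sketch, assuming both tensors were allocated via vart::Runner::allocate_npu_tensor:

#include <vart_npu_tensor.hpp>

void sync_around_inference(const vart::NpuTensor& input_tensor,
                           const vart::NpuTensor& output_tensor) {
    input_tensor.sync_buffer();   // INPUT: flush CPU cache to DDR for the AIE
    // ... run inference here ...
    output_tensor.sync_buffer();  // OUTPUT: invalidate cache before CPU reads
}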
-
void print_info() const
Prints the metadata of the tensor.
This method prints the NpuTensorInfo metadata, including name, shape, strides, data type, memory layout, and size. It is useful for debugging and understanding the tensor’s properties.
-
struct QuantParameters#
- #include <vart_runner_factory.hpp>
Struct representing quantization parameters for a tensor.
This struct holds quantization parameters, such as the scale factor, zero point, and rounding mode, used for quantizing tensors in the VART API.
Public Members
-
double scale#
Scale factor for quantization.
-
int32_t zero_point#
Zero point for asymmetric quantization. Optional.
-
RoundingMode rounding_mode#
Rounding mode used during quantization. Optional.
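For illustration, a sketch of the usual affine mapping these fields suggest; the exact formula and rounding behavior used by VART are assumptions here (std::lround rounds half away from zero rather than to nearest even):

#include <cmath>
#include <cstdint>
#include <vart_runner_factory.hpp>

// Assumed convention: q = round(x / scale) + zero_point,
// x ≈ scale * (q - zero_point). Not spelled out by the header itself.
int32_t quantize(double x, const vart::QuantParameters& p) {
    return static_cast<int32_t>(std::lround(x / p.scale)) + p.zero_point;
}
double dequantize(int32_t q, const vart::QuantParameters& p) {
    return p.scale * (q - p.zero_point);
}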
-
struct JobHandle#
- #include <vart_runner_factory.hpp>
Struct representing a job handle for asynchronous execution.
This struct holds the status of a submitted job, including whether it was successfully submitted and its unique identifier.
Public Members
-
uint32_t job_id#
Unique identifier for the job.
-
class Runner
- #include <vart_runner_factory.hpp>
Abstract base class for executing model inference operations.
The Runner class defines a unified interface for running synchronous and asynchronous inference tasks on machine learning models. It provides methods for retrieving tensor metadata, executing computations, and managing asynchronous job execution.
Key Features:
Query input and output tensor information, including support for zero-copy operations.
Perform synchronous inference with input and output tensors.
Submit asynchronous inference jobs and manage their lifecycle via job handles or callbacks.
Support for both polling/waiting and callback-based asynchronous execution models.
Public Types
-
using ExecuteAsyncCallback = std::function<void(const JobHandle&, void*)>
Type alias for the callback function used in asynchronous execution operations.
This callback function is invoked when an asynchronous operation completes. The callback receives the job handle containing the completion status and a user-provided data pointer.
Note
The callback may be invoked from an internal worker thread, so users must ensure thread safety when accessing shared resources.
- Param job_handle:
The handle of the completed asynchronous job, containing the final status and job identifier.
- Param user_data:
A pointer to user-defined data that was provided when initiating the asynchronous operation.
Public Functions
-
virtual ~Runner() = default
Destroys the Runner object.
-
virtual const std::string &get_model_name(void) const = 0
Return the model name.
- Returns:
Model name.
-
virtual const std::vector<NpuTensorInfo> &get_tensors_info(const TensorDirection &direction, const TensorType &type) const = 0
Unified API to retrieve tensor information based on direction and tensor type (CPU/HW).
This method retrieves tensor information based on the specified direction (input/output) and tensor type (CPU/HW).
- Parameters:
direction – Specifies whether to retrieve input or output tensor information.
type – Specifies whether to retrieve CPU or HW tensor information.
- Returns:
A constant reference to a vector containing NpuTensorInfo objects, each describing a tensor matching the specified criteria.
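For example, listing the hardware-side input tensors (a sketch assuming a valid runner):

#include <iostream>
#include <memory>
#include <vart_runner_factory.hpp>

void list_hw_inputs(const std::shared_ptr<vart::Runner>& runner) {
    const auto& infos = runner->get_tensors_info(
        vart::TensorDirection::INPUT, vart::TensorType::HW);
    for (const auto& ti : infos) {
        std::cout << ti.size << " elements, " << ti.size_in_bytes << " bytes\n";
        ti.print();  // dumps the full metadata
    }
}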
-
virtual const NpuTensorInfo &get_tensor_info_by_name(const std::string &tensor_name, const TensorType &type) const = 0
Unified API to retrieve tensor information by name and tensor type (CPU/HW).
This method retrieves tensor information for a specific tensor identified by name, with the ability to specify whether to retrieve CPU or HW tensor information.
- Parameters:
tensor_name – The name of the tensor for which to retrieve information.
type – Specifies whether to retrieve CPU or HW tensor information.
- Returns:
A constant reference to the NpuTensorInfo object describing the specified tensor.
-
virtual const QuantParameters &get_quant_parameters(const std::string &tensor_name) const = 0
Retrieves the quantization parameters for a specific tensor.
This method retrieves the quantization parameters for a tensor identified by its name.
- Parameters:
tensor_name – The name of the tensor for which to retrieve quantization parameters.
- Returns:
A constant reference to the QuantParameters object containing the scale factor, optional zero point, and rounding mode.
-
virtual size_t get_num_input_tensors() const = 0
Returns the number of input tensors.
This method retrieves the number of input tensors required by the model or operation.
- Returns:
The number of input tensors.
-
virtual size_t get_num_output_tensors() const = 0
Returns the number of output tensors.
This method retrieves the number of output tensors produced by the model or operation.
- Returns:
The number of output tensors.
-
virtual size_t get_batch_size() const = 0
Returns the device batch size.
This method retrieves the device batch size for the model or operation.
- Returns:
The device batch size.
-
virtual StatusCode execute(const std::vector<std::vector<NpuTensor>> &inputs, std::vector<std::vector<NpuTensor>> &outputs) = 0
Executes the main computation using the provided input tensors and produces output tensors.
This method is responsible for performing the actual inference or computation using the specified input tensors and generating the corresponding output tensors.
Note
Users should provide tensors in the same order as returned by get_tensors_info().
- Parameters:
inputs – A constant reference to a vector of input tensors, vector dimensions: [batch][tensors].
outputs – A reference to a vector where the output tensors will be stored, vector dimensions: [batch][tensors].
- Returns:
A StatusCode indicating the success or failure of the execution.
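A synchronous call might look like this sketch; tensor preparation is elided, and the [batch][tensors] layout follows the parameter descriptions of the asynchronous overloads below:

#include <iostream>
#include <vector>
#include <vart_runner_factory.hpp>

void run_once(vart::Runner& runner,
              const std::vector<std::vector<vart::NpuTensor>>& inputs,
              std::vector<std::vector<vart::NpuTensor>>& outputs) {
    // Tensors must appear in the same order as get_tensors_info().
    vart::StatusCode sc = runner.execute(inputs, outputs);
    if (sc != vart::StatusCode::SUCCESS)
        std::cerr << "execute failed\n";
}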
-
virtual JobHandle execute_async(const std::vector<std::vector<NpuTensor>> &inputs, std::vector<std::vector<NpuTensor>> &outputs) = 0
Executes the job asynchronously with the given input tensors.
This method initiates an asynchronous operation using the provided input tensors, and stores the results in the output tensors. The function returns a handle to the asynchronous job, allowing the caller to track or manage its execution.
Note
Users should provide tensors in the same order as returned by get_tensors_info(). inputs and outputs must remain valid until the job is completed.
- Parameters:
inputs – A constant reference to a vector of input tensors required for the job, vector dimensions: [batch][tensors].
outputs – A reference to a vector where the output tensors will be stored upon completion, vector dimensions: [batch][tensors].
- Returns:
JobHandle A handle representing the asynchronous job.
-
virtual StatusCode wait(const JobHandle &job_handle, unsigned int timeout) = 0
Waits for the completion of an asynchronous job.
This method checks the status of a job submitted using execute_async and blocks until the specified job completes or the timeout expires.
- Parameters:
job_handle – A constant reference to the handle of the job to wait for.
timeout – The maximum time to wait in milliseconds. A zero timeout checks the job’s completion status and returns immediately; a positive timeout blocks until the job completes or the specified time elapses.
- Returns:
StatusCode The status of the wait operation.
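A polling sketch combining execute_async and wait; interpreting the status returned on timeout as JOB_PENDING is an assumption:

#include <vector>
#include <vart_runner_factory.hpp>

void run_async(vart::Runner& runner,
               const std::vector<std::vector<vart::NpuTensor>>& inputs,
               std::vector<std::vector<vart::NpuTensor>>& outputs) {
    // Buffers behind inputs/outputs must stay valid until completion.
    vart::JobHandle job = runner.execute_async(inputs, outputs);
    vart::StatusCode sc = runner.wait(job, 1000);  // block up to 1000 ms
    if (sc == vart::StatusCode::JOB_PENDING) {
        // Assumed meaning: the timeout expired and the job is still running.
    }
}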
-
virtual JobHandle execute_async(const std::vector<std::vector<NpuTensor>> &inputs, std::vector<std::vector<NpuTensor>> &outputs, ExecuteAsyncCallback cb, void *user_data) = 0
Executes the operation asynchronously with the given input tensors.
This method starts the asynchronous execution of the operation using the provided input tensors. The results will be stored in the output tensors, and the specified callback will be invoked upon completion.
Note
The callback may be invoked from an internal worker thread, not necessarily the calling thread. Users are responsible for ensuring thread safety when accessing shared resources in the callback.
Note
Users should provide tensors in the same order as returned by get_tensors_info(). inputs and outputs must be valid until the callback is invoked.
- Parameters:
inputs – A vector of input tensors to be processed, vector dimensions: [batch][tensors].
outputs – A reference to a vector where the output tensors will be stored, vector dimensions: [batch][tensors].
cb – A callback function to be called when the asynchronous execution is complete.
user_data – A pointer to user-defined data that will be passed to the callback function.
- Returns:
JobHandle A handle representing the asynchronous job.
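A callback-based sketch; the flag is atomic because the callback may run on an internal worker thread:

#include <atomic>
#include <vector>
#include <vart_runner_factory.hpp>

void run_with_callback(vart::Runner& runner,
                       const std::vector<std::vector<vart::NpuTensor>>& inputs,
                       std::vector<std::vector<vart::NpuTensor>>& outputs) {
    std::atomic<bool> done{false};
    auto cb = [](const vart::JobHandle&, void* user_data) {
        static_cast<std::atomic<bool>*>(user_data)->store(true);
    };
    runner.execute_async(inputs, outputs, cb, &done);
    while (!done.load()) { /* poll, or do other useful work */ }
}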
-
virtual NpuTensor allocate_npu_tensor(const NpuTensorInfo &info, StatusCode &status) const = 0
Allocates memory for an NPU tensor.
This API allocates contiguous tensor memory specifically for MemoryType::XRT_BO (i.e., an XRT Buffer Object), but only when the TensorType is set to HW. The caller is responsible for releasing the allocated memory using deallocate_npu_tensor when the tensor is no longer needed.
- Parameters:
info – The metadata associated with the tensor.
status – A reference to a StatusCode variable that will capture the outcome of the allocation operation.
- Returns:
NpuTensor The allocated NPU tensor. If the allocation fails, an empty tensor is returned and the status is set accordingly.
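A lifecycle sketch, assuming info describes a TensorType::HW tensor obtained from get_tensors_info():

#include <vart_runner_factory.hpp>

void use_hw_tensor(const vart::Runner& runner, const vart::NpuTensorInfo& info) {
    vart::StatusCode sc = vart::StatusCode::SUCCESS;
    vart::NpuTensor t = runner.allocate_npu_tensor(info, sc);
    if (sc != vart::StatusCode::SUCCESS)
        return;  // allocation failed; `t` is an empty tensor
    // ... write through t.get_virtual_address(), call t.sync_buffer(), ...
    runner.deallocate_npu_tensor(t);
}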
-
virtual NpuTensor allocate_npu_tensor(const NpuTensorInfo &info, StatusCode &status, MemoryType mem_type, size_t ddr_idx = 0) const = 0
Allocates memory for an NPU tensor.
This API allocates contiguous tensor memory specifically for MemoryType::XRT_BO (i.e., an XRT Buffer Object), but only when the TensorType is set to HW. The caller is responsible for releasing the allocated memory using deallocate_npu_tensor when the tensor is no longer needed.
- Parameters:
info – The metadata associated with the tensor.
status – A reference to a StatusCode variable that will capture the outcome of the allocation operation.
mem_type – Specifies the memory type of the buffer.
ddr_idx – Select the DDR in which the buffer will be allocated.
- Returns:
NpuTensor The allocated NPU tensor. If the allocation fails, an empty tensor is returned and the status is set accordingly.
-
virtual StatusCode deallocate_npu_tensor(NpuTensor &tensor) const = 0
Deallocates an NPU tensor previously allocated by allocate_npu_tensor.
- Parameters:
tensor – Reference to the tensor to be deallocated.
- Returns:
StatusCode indicating success or failure.
-
virtual uint8_t get_nb_ddrs(void) const = 0
Returns the number of DDR banks used by the NPU.
- Returns:
Number of DDR banks.
-
class RunnerFactory
- #include <vart_runner_factory.hpp>
Factory class for creating Runner instances.
Provides a static method to instantiate Runner objects based on the specified runner type, model path, and optional configuration options.
Public Static Functions
-
static std::shared_ptr<Runner> create_runner(RunnerType runner_type, const std::string &model_path, const std::unordered_map<std::string, std::any> &options = {})
Creates and returns a shared pointer to a Runner instance.
This static method initializes a Runner object for the specified runner type, using the provided model path and optional configuration options.
- Parameters:
runner_type – The type of runner to create (e.g., VAIML).
model_path – The file system path to the model to be loaded by the Runner.
options – An optional map of additional configuration options, where each option is identified by a string key and can hold a value of any type.
- Returns:
std::shared_ptr<Runner> A shared pointer to the created Runner instance.
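For example (the model path is a placeholder; the available option keys are backend-specific, and treating a null result as failure is an assumption):

#include <iostream>
#include <vart_runner_factory.hpp>

int main() {
    auto runner = vart::RunnerFactory::create_runner(
        vart::RunnerType::VAIML, "path/to/model");  // placeholder path
    if (!runner)  // assumed: a null pointer signals failure
        return 1;
    std::cout << runner->get_model_name() << ", batch size "
              << runner->get_batch_size() << "\n";
    return 0;
}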