Abstract base class for executing model inference operations.

#include <vart_runner_factory.hpp>

Public Types
using	ExecuteAsyncCallback = std::function< void(const JobHandle &)>
	Type alias for the callback function used in asynchronous execution operations.

Public Member Functions
virtual	~Runner ()=default
	Destroys the Runner object.

virtual const std::vector< NpuTensorInfo > &	get_tensors_info (TensorDirection direction, TensorType type) const =0
	Unified API to retrieve tensor information based on direction and tensor type (CPU/HW).

virtual const NpuTensorInfo &	get_tensor_info_by_name (const std::string &tensor_name, TensorType type) const =0
	Unified API to retrieve tensor information by name and tensor type (CPU/HW).

virtual const QuantParameters &	get_quant_parameters (const std::string &tensor_name) const =0
	Retrieves the quantization parameters for a specific tensor.

virtual size_t	get_num_input_tensors () const =0
	Returns the number of input tensors.

virtual size_t	get_num_output_tensors () const =0
	Returns the number of output tensors.

virtual size_t	get_batch_size () const =0
	Returns the device batch size.

virtual StatusCode	execute (const std::vector< std::vector< NpuTensor >> &inputs, std::vector< std::vector< NpuTensor >> &outputs) noexcept=0
	Executes the main computation using the provided input tensors and produces output tensors.

virtual JobHandle	execute_async (const std::vector< std::vector< NpuTensor >> &inputs, std::vector< std::vector< NpuTensor >> &outputs) noexcept=0
	Executes the job asynchronously with the given input tensors.

virtual StatusCode	wait (const JobHandle &job_handle, std::chrono::milliseconds timeout) noexcept=0
	Waits for the completion of an asynchronous job.

virtual JobHandle	execute_async (const std::vector< std::vector< NpuTensor >> &inputs, std::vector< std::vector< NpuTensor >> &outputs, ExecuteAsyncCallback cb) noexcept=0
	Executes the operation asynchronously with the given input tensors.

virtual NpuTensor	allocate_npu_tensor (const NpuTensorInfo &info) const =0
	Allocates memory for an NPU tensor.

virtual NpuTensor	allocate_sub_tensor (const NpuTensor &parent, const NpuTensorInfo &info, size_t offset) const =0
	Creates a sub-tensor from a parent tensor with the specified metadata and offset.

Protected Member Functions
	Runner (const std::string &model_path, const std::unordered_map< std::string, std::any > &options={})
	Constructs a Runner object with the specified model path and options.

Detailed Description

Abstract base class for executing model inference operations.

The Runner class defines a unified interface for running synchronous and asynchronous inference tasks on machine learning models. It provides methods for retrieving tensor metadata, executing computations, and managing asynchronous job execution.

Key Features:

Query input and output tensor information, including support for zero-copy operations.
Perform synchronous inference with input and output tensors.
Submit asynchronous inference jobs and manage their lifecycle via job handles or callbacks.
Support both polling/waiting and callback-based asynchronous execution models.

Runner instances are thread-safe and designed to be shared across threads via std::shared_ptr<Runner>. Use RunnerFactory::create_runner() to obtain an instance.

Member Typedef Documentation

◆ ExecuteAsyncCallback

using vart::Runner::ExecuteAsyncCallback = std::function< void(const JobHandle &)>

Type alias for the callback function used in asynchronous execution operations.

This callback function is invoked when an asynchronous operation completes. The callback receives a const reference to the JobHandle containing the completion status and job identifier.

Note: The callback may be invoked from an internal worker thread, so users must ensure thread safety when accessing shared resources.
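
The thread-safety note above can be illustrated with a small, self-contained sketch. The StatusCode and JobHandle types below are hypothetical stand-ins (the member names job_id and status are assumptions, not taken from the VART headers), and CompletionCounter is an illustrative helper, not part of the library. It shows one way to guard state that a callback shares with other threads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>

namespace callback_sketch {

// Hypothetical stand-ins for the VART types; real member names may differ.
enum class StatusCode { SUCCESS, JOB_PENDING, FAILURE };
struct JobHandle {
  std::uint64_t job_id;
  StatusCode status;
};
using ExecuteAsyncCallback = std::function<void(const JobHandle&)>;

// Counts finished jobs. Every access to the counters takes the mutex, so the
// callback can run on a worker thread while another thread reads the totals.
class CompletionCounter {
 public:
  ExecuteAsyncCallback make_callback() {
    return [this](const JobHandle& job) {
      // A completion callback is only expected once the job has finished.
      assert(job.status != StatusCode::JOB_PENDING);
      std::lock_guard<std::mutex> lock(mutex_);
      if (job.status == StatusCode::SUCCESS) {
        ++succeeded_;
      } else {
        ++failed_;
      }
    };
  }

  std::size_t succeeded() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return succeeded_;
  }

  std::size_t failed() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return failed_;
  }

 private:
  mutable std::mutex mutex_;
  std::size_t succeeded_ = 0;
  std::size_t failed_ = 0;
};

}  // namespace callback_sketch
```

Because the lambda captures this, the counter in this sketch has to outlive every job that was submitted with its callback.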

Constructor & Destructor Documentation

◆ Runner()

vart::Runner::Runner	(	const std::string &	model_path,
		const std::unordered_map< std::string, std::any > &	options = {}
	)

inline explicit protected

Constructs a Runner object with the specified model path and options.

Parameters

model_path	The file path to the model to be used by the Runner.
options	Optional configuration parameters for the Runner, provided as a map of string keys to values of any type.

Exceptions

std::runtime_error if Runner initialization fails.

◆ ~Runner()

virtual vart::Runner::~Runner ( )

virtual default

Destroys the Runner object.

Member Function Documentation

◆ allocate_npu_tensor()

virtual NpuTensor vart::Runner::allocate_npu_tensor ( const NpuTensorInfo & info ) const

pure virtual

Allocates memory for an NPU tensor.

This API allocates contiguous tensor memory backed by an XRT Buffer Object (MemoryType::XRT_BO). The user should copy input data into the allocated buffer before passing it to execute/execute_async.

Parameters

info	The metadata associated with the tensor.

Returns: The allocated NPU tensor.

Exceptions

std::runtime_error	if tensor allocation fails.
std::invalid_argument	if the provided NpuTensorInfo is invalid.

See also: allocate_sub_tensor, NpuTensor::sync_buffer

◆ allocate_sub_tensor()

virtual NpuTensor vart::Runner::allocate_sub_tensor	(	const NpuTensor &	parent,
		const NpuTensorInfo &	info,
		size_t	offset
	)		const

pure virtual

Creates a sub-tensor from a parent tensor with the specified metadata and offset.

The returned sub-tensor is a fully usable NpuTensor and can be passed to execute/execute_async like any other NpuTensor.

Parameters

parent	The parent tensor from which the sub-tensor will be created.
info	The metadata for the sub-tensor.
offset	The offset in bytes from the start of the parent tensor's buffer.

Returns: The created sub-tensor.

Exceptions

std::runtime_error	if sub-tensor creation fails.
std::invalid_argument	if the provided arguments are invalid.

Note

Requirements:

Sub-tensors can only be created from parent tensors allocated via allocate_npu_tensor.
The parent tensor must be large enough to contain the sub-tensor at the specified offset.
The offset and size in info must not exceed the parent tensor's buffer bounds.
Sub-tensors cannot be created from other sub-tensors (only one level of nesting is supported).

Memory Management:

A sub-tensor is created from a parent tensor as a view at a specified offset.
Creating a sub-tensor does not allocate new memory; it reuses a portion of the parent tensor's memory.
Parent memory/buffer is released only after the parent tensor and all derived sub-tensors are destroyed.
Destroying only the parent tensor does not release the underlying memory while any sub-tensor is still alive.

See also: allocate_npu_tensor
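
The bounds requirements and lifetime rules above can be modelled with shared ownership. The following is a minimal sketch under stated assumptions: Tensor, allocate, and make_sub_tensor are hypothetical stand-ins built on std::shared_ptr, not the real NpuTensor or XRT buffer object, and sizes are passed directly in bytes instead of through NpuTensorInfo.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <vector>

namespace subtensor_sketch {

// Hypothetical stand-in for an NPU tensor: a window into a shared byte buffer.
struct Tensor {
  std::shared_ptr<std::vector<std::uint8_t>> buffer;  // shared backing memory
  std::size_t offset;                                 // byte offset of this view
  std::size_t size;                                   // byte size of this view
  bool is_sub_tensor;                                 // true for derived views
};

// Stand-in for allocate_npu_tensor: owns a fresh zero-filled buffer.
inline Tensor allocate(std::size_t size_bytes) {
  assert(size_bytes > 0);
  return Tensor{std::make_shared<std::vector<std::uint8_t>>(size_bytes), 0,
                size_bytes, false};
}

// Stand-in for allocate_sub_tensor: creates a view, never new memory.
inline Tensor make_sub_tensor(const Tensor& parent, std::size_t size_bytes,
                              std::size_t offset) {
  if (parent.is_sub_tensor) {
    throw std::invalid_argument("nested sub-tensors are not supported");
  }
  // Written to avoid overflow in offset + size_bytes.
  if (offset > parent.size || size_bytes > parent.size - offset) {
    throw std::invalid_argument("sub-tensor exceeds parent bounds");
  }
  // Copying the shared_ptr keeps the buffer alive while the view exists.
  return Tensor{parent.buffer, parent.offset + offset, size_bytes, true};
}

}  // namespace subtensor_sketch
```

In this model the buffer is freed only when the last Tensor that refers to it goes away, which mirrors the rule that destroying the parent alone does not release the memory.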

◆ execute()

virtual StatusCode vart::Runner::execute	(	const std::vector< std::vector< NpuTensor >> &	inputs,
		std::vector< std::vector< NpuTensor >> &	outputs
	)

pure virtual noexcept

Executes the main computation using the provided input tensors and produces output tensors.

This method is responsible for performing the actual inference or computation using the specified input tensors and generating the corresponding output tensors.

Parameters

inputs	A constant reference to a vector of input NpuTensor objects, vector dimensions: [batch][tensors]. The outer vector size must be between 1 and get_batch_size() (inclusive). Each inner vector must contain one NpuTensor per model input.
outputs	A reference to a vector of NpuTensor objects where the outputs will be stored, vector dimensions: [batch][tensors]. The outer vector size must be between 1 and get_batch_size() (inclusive). Each inner vector must contain one NpuTensor per model output.

Returns: A StatusCode indicating the success or failure of the execution.

Note: Users should provide tensors in the same order as returned by get_tensors_info().

See also: execute_async, wait
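
The [batch][tensors] layout rules for inputs and outputs can be checked before calling execute(). The helper below is a hypothetical pre-flight check, not part of the Runner API; it is templated so that any element type can stand in for NpuTensor, and the two size arguments are assumed to come from get_batch_size() and get_num_input_tensors() or get_num_output_tensors().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace layout_sketch {

// Checks the [batch][tensors] layout. T stands in for NpuTensor; only the
// vector sizes are inspected, never the tensor contents.
template <typename T>
bool batch_layout_ok(const std::vector<std::vector<T>>& batch,
                     std::size_t device_batch_size,
                     std::size_t tensors_per_item) {
  assert(device_batch_size > 0);
  // Outer size must be between 1 and the device batch size (inclusive).
  if (batch.empty() || batch.size() > device_batch_size) return false;
  // Each inner vector must hold exactly one tensor per model input/output.
  for (const auto& item : batch) {
    if (item.size() != tensors_per_item) return false;
  }
  return true;
}

}  // namespace layout_sketch
```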

◆ execute_async() [1/2]

virtual JobHandle vart::Runner::execute_async	(	const std::vector< std::vector< NpuTensor >> &	inputs,
		std::vector< std::vector< NpuTensor >> &	outputs
	)

pure virtual noexcept

Executes the job asynchronously with the given input tensors.

This method initiates an asynchronous operation using the provided input tensors, and stores the results in the output tensors. The function returns a handle to the asynchronous job, allowing the caller to track or manage its execution.

Parameters

inputs	A constant reference to a vector of input tensors required for the job, vector dimensions: [batch][tensors]. The outer vector size must be between 1 and get_batch_size() (inclusive). Each inner vector must contain one NpuTensor per model input.
outputs	A reference to a vector where the output tensors will be stored upon completion, vector dimensions: [batch][tensors]. The outer vector size must be between 1 and get_batch_size() (inclusive). Each inner vector must contain one NpuTensor per model output.

Returns: A JobHandle representing the asynchronous job.

Note: Users should provide tensors in the same order as returned by get_tensors_info(). inputs and outputs must remain valid until the job is completed. If the returned JobHandle has status StatusCode::RESOURCE_UNAVAILABLE, the submission failed because all internal execution slots are busy. This is a transient condition; the application should retry the submission.

See also: wait, execute
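
The retry guidance for StatusCode::RESOURCE_UNAVAILABLE could be wrapped in a small helper. This is a sketch under assumptions: StatusCode and JobHandle are hypothetical stand-ins (the status member name is assumed), submit is a caller-supplied callable that would wrap a single execute_async call, and the attempt limit and fixed backoff are illustrative choices that the source does not prescribe.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

namespace retry_sketch {

// Hypothetical stand-ins for the VART types; real member names may differ.
enum class StatusCode { SUCCESS, RESOURCE_UNAVAILABLE, FAILURE };
struct JobHandle {
  int job_id;
  StatusCode status;
};

// Calls `submit` until it stops reporting RESOURCE_UNAVAILABLE or the attempt
// budget runs out, sleeping `backoff` between attempts.
inline JobHandle submit_with_retry(const std::function<JobHandle()>& submit,
                                   int max_attempts,
                                   std::chrono::milliseconds backoff) {
  assert(max_attempts > 0);
  JobHandle job{-1, StatusCode::RESOURCE_UNAVAILABLE};
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    job = submit();
    // Stop on acceptance or on any failure that is not the transient one.
    if (job.status != StatusCode::RESOURCE_UNAVAILABLE) break;
    if (attempt < max_attempts) std::this_thread::sleep_for(backoff);
  }
  return job;
}

}  // namespace retry_sketch
```

If every attempt is rejected, the helper returns the last handle unchanged, so the caller can still see that the slots stayed busy.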

◆ execute_async() [2/2]

virtual JobHandle vart::Runner::execute_async	(	const std::vector< std::vector< NpuTensor >> &	inputs,
		std::vector< std::vector< NpuTensor >> &	outputs,
		ExecuteAsyncCallback	cb
	)

pure virtual noexcept

Executes the operation asynchronously with the given input tensors.

This method starts the asynchronous execution of the operation using the provided input tensors. The results will be stored in the output tensors, and the specified callback will be invoked upon completion.

Note: The callback may be invoked from an internal worker thread, not necessarily the calling thread. Users are responsible for ensuring thread safety when accessing shared resources in the callback.

Parameters

inputs	A constant reference to a vector of input tensors to be processed, vector dimensions: [batch][tensors]. The outer vector size must be between 1 and get_batch_size() (inclusive). Each inner vector must contain one NpuTensor per model input.
outputs	A reference to a vector where the output tensors will be stored, vector dimensions: [batch][tensors]. The outer vector size must be between 1 and get_batch_size() (inclusive). Each inner vector must contain one NpuTensor per model output.
cb	A callback function to be called when the asynchronous execution is complete.

Returns: A JobHandle for the submitted asynchronous job.

Note: Users should provide tensors in the same order as returned by get_tensors_info(). inputs and outputs must remain valid until the callback is invoked. If the returned JobHandle has status StatusCode::RESOURCE_UNAVAILABLE, the submission failed because all internal execution slots are busy. This is a transient condition; the application should retry the submission.

See also: wait, execute

◆ get_batch_size()

virtual size_t vart::Runner::get_batch_size ( ) const

pure virtual

Returns the device batch size.

The device batch size is the number of input/output sets the NPU processes in a single inference call as a batch. This value determines the maximum outer vector size accepted by execute() and execute_async().

Returns: The device batch size.
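
Because the device batch size caps the outer vector size, a workload with more samples than that has to be split across several calls. The helper below is a hypothetical sketch of that arithmetic; split_into_batches is not part of the Runner API, and device_batch_size is assumed to be the value returned by get_batch_size().

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

namespace batch_sketch {

// Splits `total_items` samples into half-open [begin, end) index ranges, each
// covering at most `device_batch_size` samples. One range maps to one call.
inline std::vector<std::pair<std::size_t, std::size_t>> split_into_batches(
    std::size_t total_items, std::size_t device_batch_size) {
  assert(device_batch_size > 0);
  std::vector<std::pair<std::size_t, std::size_t>> ranges;
  for (std::size_t begin = 0; begin < total_items;
       begin += device_batch_size) {
    // The last range may be shorter than the device batch size.
    const std::size_t end = std::min(begin + device_batch_size, total_items);
    ranges.emplace_back(begin, end);
  }
  return ranges;
}

}  // namespace batch_sketch
```

For 10 samples and a device batch size of 4, this yields the ranges [0, 4), [4, 8), and [8, 10).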

◆ get_num_input_tensors()

virtual size_t vart::Runner::get_num_input_tensors ( ) const

pure virtual

Returns the number of input tensors.

This method retrieves the number of input tensors required by the model or operation.

Returns: The number of input tensors.

◆ get_num_output_tensors()

virtual size_t vart::Runner::get_num_output_tensors ( ) const

pure virtual

Returns the number of output tensors.

This method retrieves the number of output tensors produced by the model or operation.

Returns: The number of output tensors.

◆ get_quant_parameters()

virtual const QuantParameters& vart::Runner::get_quant_parameters ( const std::string & tensor_name ) const

pure virtual

Retrieves the quantization parameters for a specific tensor.

This method retrieves the quantization parameters for a tensor identified by its name.

Parameters

tensor_name	The name of the tensor for which to retrieve quantization parameters.

Returns: A constant reference to the QuantParameters object containing the scale factor and optional zero point.

Exceptions

std::runtime_error if quantization parameters are not found for the tensor.
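
As an illustration of how a scale factor and an optional zero point are commonly applied, the sketch below converts between int8 and float values. It is an assumption-heavy example: the QuantParameters struct here is a hypothetical stand-in whose fields are invented for illustration, and the affine convention real = scale * (q - zero_point) is the widely used one, not something this page confirms. The model or runtime documentation should be checked for the convention that actually applies.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

namespace quant_sketch {

// Hypothetical stand-in; the real QuantParameters layout may differ.
struct QuantParameters {
  float scale;
  bool has_zero_point;      // models the "optional zero point"
  std::int32_t zero_point;  // ignored when has_zero_point is false
};

// ASSUMPTION: affine convention real = scale * (q - zero_point).
inline float dequantize(std::int8_t q, const QuantParameters& p) {
  const std::int32_t zp = p.has_zero_point ? p.zero_point : 0;
  return p.scale * static_cast<float>(static_cast<std::int32_t>(q) - zp);
}

// Inverse of dequantize under the same assumption, saturating to int8 range.
inline std::int8_t quantize(float real, const QuantParameters& p) {
  assert(p.scale != 0.0f);
  const std::int32_t zp = p.has_zero_point ? p.zero_point : 0;
  const long q = std::lround(real / p.scale) + zp;
  return static_cast<std::int8_t>(std::min(127L, std::max(-128L, q)));
}

}  // namespace quant_sketch
```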

◆ get_tensor_info_by_name()

virtual const NpuTensorInfo& vart::Runner::get_tensor_info_by_name	(	const std::string &	tensor_name,
		TensorType	type
	)		const

pure virtual

Unified API to retrieve tensor information by name and tensor type (CPU/HW).

This method retrieves tensor information for a specific tensor identified by name, with the ability to specify whether to retrieve CPU or HW tensor information.

Parameters

tensor_name	The name of the tensor for which to retrieve information.
type	Specifies whether to retrieve CPU or HW tensor information.

Returns: A constant reference to the NpuTensorInfo object describing the specified tensor.

Exceptions

std::runtime_error if the tensor name is not found.

◆ get_tensors_info()

virtual const std::vector<NpuTensorInfo>& vart::Runner::get_tensors_info	(	TensorDirection	direction,
		TensorType	type
	)		const

pure virtual

Unified API to retrieve tensor information based on direction and tensor type (CPU/HW).

This method retrieves tensor information based on the specified direction (input/output) and tensor type (CPU/HW).

Parameters

direction	Specifies whether to retrieve input or output tensor information.
type	Specifies whether to retrieve CPU or HW tensor information.

Returns: A constant reference to a vector containing NpuTensorInfo objects, each describing a tensor matching the specified criteria.

◆ wait()

virtual StatusCode vart::Runner::wait	(	const JobHandle &	job_handle,
		std::chrono::milliseconds	timeout
	)

pure virtual noexcept

Waits for the completion of an asynchronous job.

This method is used to check the status of a job submitted using execute_async, and blocks until the specified job is completed or the timeout expires.

Parameters

job_handle	A constant reference to the handle of the job to wait for.
timeout	The maximum time to wait for job completion. A zero timeout checks the job's completion status and returns immediately. A positive timeout returns once the job is completed or the specified time has elapsed.

Returns: The status of the wait operation. Returns StatusCode::JOB_PENDING if the job has not yet completed (normal polling outcome, not an error).

See also: execute_async
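
The zero-timeout behaviour lends itself to a polling loop that interleaves other work with status checks. The sketch below is hypothetical: StatusCode is a stand-in enum, WaitFn is a caller-supplied callable that would forward to wait() for one specific job handle, and the poll limit is an illustrative safeguard that the source does not mention.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

namespace wait_sketch {

// Hypothetical stand-in for the VART status enum.
enum class StatusCode { SUCCESS, JOB_PENDING, FAILURE };

// Wraps a wait() call for one job; the argument is the timeout to pass on.
using WaitFn = std::function<StatusCode(std::chrono::milliseconds)>;

// Polls with a zero timeout and runs `do_other_work` between polls while the
// job is still pending. Gives up after `max_polls` checks.
inline StatusCode poll_until_done(const WaitFn& wait, int max_polls,
                                  const std::function<void()>& do_other_work) {
  assert(max_polls > 0);
  StatusCode status = StatusCode::JOB_PENDING;
  for (int i = 0; i < max_polls && status == StatusCode::JOB_PENDING; ++i) {
    status = wait(std::chrono::milliseconds(0));  // non-blocking check
    if (status == StatusCode::JOB_PENDING) do_other_work();
  }
  return status;
}

}  // namespace wait_sketch
```

A return value of JOB_PENDING from this helper means the poll budget ran out before the job finished, not that the job failed.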

The documentation for this class was generated from the following file:

vart_runner_factory.hpp
