VART-ML  0.3.0
vart::Runner Class Reference (abstract)

Abstract base class for executing model inference operations.

#include <vart_runner_factory.hpp>

Public Types

using ExecuteAsyncCallback = std::function< void(const JobHandle &)>
 Type alias for the callback function used in asynchronous execution operations.
 

Public Member Functions

virtual ~Runner ()=default
 Destroys the Runner object.
 
virtual const std::vector< NpuTensorInfo > & get_tensors_info (TensorDirection direction, TensorType type) const =0
 Unified API to retrieve tensor information based on direction and tensor type (CPU/HW).
 
virtual const NpuTensorInfo & get_tensor_info_by_name (const std::string &tensor_name, TensorType type) const =0
 Unified API to retrieve tensor information by name and tensor type (CPU/HW).
 
virtual const QuantParameters & get_quant_parameters (const std::string &tensor_name) const =0
 Retrieves the quantization parameters for a specific tensor.
 
virtual size_t get_num_input_tensors () const =0
 Returns the number of input tensors.
 
virtual size_t get_num_output_tensors () const =0
 Returns the number of output tensors.
 
virtual size_t get_batch_size () const =0
 Returns the device batch size.
 
virtual StatusCode execute (const std::vector< std::vector< NpuTensor >> &inputs, std::vector< std::vector< NpuTensor >> &outputs) noexcept=0
 Executes the main computation using the provided input tensors and produces output tensors.
 
virtual JobHandle execute_async (const std::vector< std::vector< NpuTensor >> &inputs, std::vector< std::vector< NpuTensor >> &outputs) noexcept=0
 Executes the job asynchronously with the given input tensors.
 
virtual StatusCode wait (const JobHandle &job_handle, std::chrono::milliseconds timeout) noexcept=0
 Waits for the completion of an asynchronous job.
 
virtual JobHandle execute_async (const std::vector< std::vector< NpuTensor >> &inputs, std::vector< std::vector< NpuTensor >> &outputs, ExecuteAsyncCallback cb) noexcept=0
 Executes the operation asynchronously with the given input tensors.
 
virtual NpuTensor allocate_npu_tensor (const NpuTensorInfo &info) const =0
 Allocates memory for an NPU tensor.
 
virtual NpuTensor allocate_sub_tensor (const NpuTensor &parent, const NpuTensorInfo &info, size_t offset) const =0
 Creates a sub-tensor from a parent tensor with the specified metadata and offset.
 

Protected Member Functions

 Runner (const std::string &model_path, const std::unordered_map< std::string, std::any > &options={})
 Constructs a Runner object with the specified model path and options.
 

Detailed Description

Abstract base class for executing model inference operations.

The Runner class defines a unified interface for running synchronous and asynchronous inference tasks on machine learning models. It provides methods for retrieving tensor metadata, executing computations, and managing asynchronous job execution.

Key Features:

  • Query input and output tensor information, including support for zero-copy operations.
  • Perform synchronous inference with input and output tensors.
  • Submit asynchronous inference jobs and manage their lifecycle via job handles or callbacks.
  • Support for both polling/waiting and callback-based asynchronous execution models.

Runner instances are thread-safe and designed to be shared across threads via std::shared_ptr<Runner>. Use RunnerFactory::create_runner() to obtain an instance.

Member Typedef Documentation

◆ ExecuteAsyncCallback

Type alias for the callback function used in asynchronous execution operations.

This callback function is invoked when an asynchronous operation completes. The callback receives a const reference to the JobHandle containing the completion status and job identifier.

Note
The callback may be invoked from an internal worker thread, so users must ensure thread safety when accessing shared resources.
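
The thread-safety note above can be illustrated with a minimal sketch. Only the callback signature is taken from this page; `StatusCode::SUCCESS`, `StatusCode::FAILURE`, the `JobHandle` fields, and `CompletionLog` are hypothetical stand-ins, not the VART-ML definitions. The callbacks are invoked sequentially here to keep the sketch simple.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <vector>

// Hypothetical stand-ins: the real types come from the VART-ML headers.
enum class StatusCode { SUCCESS, FAILURE };  // enumerator names are assumed
struct JobHandle {
  int id;             // assumed field
  StatusCode status;  // assumed field
};
using ExecuteAsyncCallback = std::function<void(const JobHandle &)>;

// Records ids of successfully completed jobs. The mutex guards the vector
// because the callback may run on a different thread than the reader.
class CompletionLog {
 public:
  ExecuteAsyncCallback make_callback() {
    return [this](const JobHandle &job) {
      std::lock_guard<std::mutex> lock(mutex_);
      if (job.status == StatusCode::SUCCESS) done_.push_back(job.id);
    };
  }
  std::size_t count() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return done_.size();
  }

 private:
  mutable std::mutex mutex_;
  std::vector<int> done_;
};

// Stands in for the runtime: invokes the callback once per finished job.
std::size_t simulate_completions() {
  CompletionLog log;
  ExecuteAsyncCallback cb = log.make_callback();
  cb(JobHandle{1, StatusCode::SUCCESS});
  cb(JobHandle{2, StatusCode::FAILURE});
  cb(JobHandle{3, StatusCode::SUCCESS});
  return log.count();
}
```

Because the lambda captures `this`, the `CompletionLog` object must outlive every job that was submitted with its callback.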

Constructor & Destructor Documentation

◆ Runner()

vart::Runner::Runner ( const std::string &  model_path,
const std::unordered_map< std::string, std::any > &  options = {} 
)
inline explicit protected

Constructs a Runner object with the specified model path and options.

Parameters
model_path: The file path to the model to be used by the Runner.
options: Optional configuration parameters for the Runner, provided as a map of string keys to values of any type.
Exceptions
std::runtime_error: if Runner initialization fails.

◆ ~Runner()

virtual vart::Runner::~Runner ( )
virtual default

Destroys the Runner object.

Member Function Documentation

◆ allocate_npu_tensor()

virtual NpuTensor vart::Runner::allocate_npu_tensor ( const NpuTensorInfo &  info) const
pure virtual

Allocates memory for an NPU tensor.

This API allocates contiguous tensor memory backed by an XRT Buffer Object (MemoryType::XRT_BO). The user should copy input data into the allocated buffer before passing it to execute/execute_async.

Parameters
info: The metadata associated with the tensor.
Returns
NpuTensor The allocated NPU tensor.
Exceptions
std::runtime_error: if tensor allocation fails.
std::invalid_argument: if the provided NpuTensorInfo is invalid.
See also
allocate_sub_tensor, NpuTensor::sync_buffer
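
The allocate-then-copy flow described above might look like the following sketch. All types here are hypothetical stand-ins: the real NpuTensor wraps an XRT buffer object, and its data-access methods are not described on this page, so `data()`, `size()`, and the `size_bytes` field are assumptions. The free function `allocate_npu_tensor` only mimics the documented exception behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins; field and method names are assumptions.
struct NpuTensorInfo {
  std::string name;
  std::size_t size_bytes;
};
struct NpuTensor {
  std::vector<std::uint8_t> storage;  // the real tensor wraps an XRT buffer object
  std::uint8_t *data() { return storage.data(); }
  std::size_t size() const { return storage.size(); }
};

// Mimics the documented contract: invalid metadata -> std::invalid_argument.
NpuTensor allocate_npu_tensor(const NpuTensorInfo &info) {
  if (info.size_bytes == 0) throw std::invalid_argument("empty tensor: " + info.name);
  return NpuTensor{std::vector<std::uint8_t>(info.size_bytes)};
}

// Allocates a tensor for `info` and copies `src` into it before it would be
// handed to execute/execute_async. Size mismatches are rejected up front.
NpuTensor make_input(const NpuTensorInfo &info, const std::vector<std::uint8_t> &src) {
  NpuTensor t = allocate_npu_tensor(info);
  if (src.size() != t.size()) throw std::invalid_argument("size mismatch for " + info.name);
  std::memcpy(t.data(), src.data(), src.size());
  return t;
}
```

A real application may also need NpuTensor::sync_buffer (listed under See also) after writing to the buffer; its semantics are not covered on this page.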

◆ allocate_sub_tensor()

virtual NpuTensor vart::Runner::allocate_sub_tensor ( const NpuTensor &  parent,
const NpuTensorInfo &  info,
size_t  offset 
) const
pure virtual

Creates a sub-tensor from a parent tensor with the specified metadata and offset.

The returned sub-tensor is a fully usable NpuTensor and can be passed to execute/execute_async like any other NpuTensor.

Parameters
parent: The parent tensor from which the sub-tensor will be created.
info: The metadata for the sub-tensor.
offset: The offset in bytes from the start of the parent tensor's buffer.
Returns
NpuTensor The created sub-tensor.
Exceptions
std::runtime_error: if sub-tensor creation fails.
std::invalid_argument: if the provided arguments are invalid.
Note
Requirements:
  • Sub-tensors can only be created from parent tensors allocated via allocate_npu_tensor.
  • The parent tensor must be large enough to contain the sub-tensor at the specified offset.
  • The offset and size in info must not exceed the parent tensor's buffer bounds.
  • Sub-tensors cannot be created from other sub-tensors (only one level of nesting is supported).

Memory Management:

  • A sub-tensor is created from a parent tensor as a view at a specified offset.
  • Creating a sub-tensor does not allocate new memory; it reuses a portion of the parent tensor's memory.
  • Parent memory/buffer is released only after the parent tensor and all derived sub-tensors are destroyed.
  • Destroying only the parent tensor does not release the underlying memory while any sub-tensor is still alive.
See also
allocate_npu_tensor
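
The requirements and memory-management rules above can be modeled with shared ownership. This is a sketch of the documented semantics only, not the VART-ML implementation: `Tensor`, `allocate`, and `make_sub_tensor` are hypothetical names, and a `std::shared_ptr` stands in for whatever reference counting the runtime actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical model: parent and sub-tensors share one backing buffer.
struct Tensor {
  std::shared_ptr<std::vector<std::uint8_t>> buffer;
  std::size_t offset;
  std::size_t size;
  bool is_sub;
};

Tensor allocate(std::size_t size) {
  return Tensor{std::make_shared<std::vector<std::uint8_t>>(size), 0, size, false};
}

// Creates a view into `parent`; no new memory is allocated.
Tensor make_sub_tensor(const Tensor &parent, std::size_t size, std::size_t offset) {
  if (parent.is_sub) throw std::invalid_argument("nesting sub-tensors is not supported");
  // Written this way to avoid overflow in offset + size.
  if (size == 0 || offset > parent.size || size > parent.size - offset)
    throw std::invalid_argument("sub-tensor exceeds parent bounds");
  return Tensor{parent.buffer, offset, size, true};
}

// Shows that the buffer outlives the parent while a sub-tensor exists and is
// released once the last sub-tensor is gone.
bool buffer_survives_parent() {
  std::weak_ptr<std::vector<std::uint8_t>> watch;
  Tensor sub{};
  {
    Tensor parent = allocate(1024);
    watch = parent.buffer;
    sub = make_sub_tensor(parent, 256, 512);
  }  // parent destroyed here
  const bool alive_with_sub = !watch.expired();
  sub = Tensor{};  // last owner released
  return alive_with_sub && watch.expired();
}
```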

◆ execute()

virtual StatusCode vart::Runner::execute ( const std::vector< std::vector< NpuTensor >> &  inputs,
std::vector< std::vector< NpuTensor >> &  outputs 
)
pure virtual noexcept

Executes the main computation using the provided input tensors and produces output tensors.

This method is responsible for performing the actual inference or computation using the specified input tensors and generating the corresponding output tensors.

Parameters
inputs: A constant reference to a vector of input NpuTensor objects, vector dimensions: [batch][tensors]. The outer vector size must be between 1 and get_batch_size() (inclusive). Each inner vector must contain one NpuTensor per model input.
outputs: A reference to a vector of NpuTensor objects where the outputs will be stored, vector dimensions: [batch][tensors]. The outer vector size must be between 1 and get_batch_size() (inclusive). Each inner vector must contain one NpuTensor per model output.
Returns
A StatusCode indicating the success or failure of the execution.
Note
Users should provide tensors in the same order as returned by get_tensors_info().
See also
execute_async, wait
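
Since execute() is noexcept and reports problems only through its StatusCode, it can help to check the [batch][tensors] layout before calling it. The helper below is a sketch; `has_valid_layout` is a hypothetical name and `NpuTensor` is an empty placeholder for the real type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct NpuTensor {};  // hypothetical placeholder for the real tensor type

// Checks the documented [batch][tensors] layout: between 1 and `batch_size`
// outer entries, each holding exactly `tensors_per_set` tensors.
bool has_valid_layout(const std::vector<std::vector<NpuTensor>> &sets,
                      std::size_t batch_size, std::size_t tensors_per_set) {
  if (sets.empty() || sets.size() > batch_size) return false;
  for (const auto &set : sets)
    if (set.size() != tensors_per_set) return false;
  return true;
}
```

In practice `batch_size` would come from get_batch_size() and `tensors_per_set` from get_num_input_tensors() or get_num_output_tensors().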

◆ execute_async() [1/2]

virtual JobHandle vart::Runner::execute_async ( const std::vector< std::vector< NpuTensor >> &  inputs,
std::vector< std::vector< NpuTensor >> &  outputs 
)
pure virtual noexcept

Executes the job asynchronously with the given input tensors.

This method initiates an asynchronous operation using the provided input tensors, and stores the results in the output tensors. The function returns a handle to the asynchronous job, allowing the caller to track or manage its execution.

Parameters
inputs: A constant reference to a vector of input tensors required for the job, vector dimensions: [batch][tensors]. The outer vector size must be between 1 and get_batch_size() (inclusive). Each inner vector must contain one NpuTensor per model input.
outputs: A reference to a vector where the output tensors will be stored upon completion, vector dimensions: [batch][tensors]. The outer vector size must be between 1 and get_batch_size() (inclusive). Each inner vector must contain one NpuTensor per model output.
Returns
JobHandle A handle representing the asynchronous job.
Note
Users should provide tensors in the same order as returned by get_tensors_info(). inputs and outputs must remain valid until the job is completed.
If the returned JobHandle has status StatusCode::RESOURCE_UNAVAILABLE, the submission failed because all internal execution slots are busy. This is a transient condition; the application should retry the submission.
See also
wait, execute
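
The retry advice for StatusCode::RESOURCE_UNAVAILABLE can be sketched as a bounded retry loop. `submit_with_retry` is a hypothetical helper, and the `status` field and `SUCCESS` enumerator are assumptions; only RESOURCE_UNAVAILABLE is named on this page. The `submit` callable would wrap the actual execute_async call.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-ins. RESOURCE_UNAVAILABLE is named on this page; SUCCESS
// and the `status` field are assumptions.
enum class StatusCode { SUCCESS, RESOURCE_UNAVAILABLE };
struct JobHandle {
  StatusCode status;
};

// Calls `submit` until it stops reporting RESOURCE_UNAVAILABLE or the attempt
// budget is spent, sleeping `backoff` between attempts.
template <typename Submit>
JobHandle submit_with_retry(Submit submit, int max_attempts,
                            std::chrono::milliseconds backoff) {
  JobHandle job{StatusCode::RESOURCE_UNAVAILABLE};
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    job = submit();
    if (job.status != StatusCode::RESOURCE_UNAVAILABLE) break;
    std::this_thread::sleep_for(backoff);  // give busy slots time to drain
  }
  return job;
}
```

A bounded attempt count keeps a saturated device from stalling the caller indefinitely; the caller still has to handle a handle that comes back with RESOURCE_UNAVAILABLE.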

◆ execute_async() [2/2]

virtual JobHandle vart::Runner::execute_async ( const std::vector< std::vector< NpuTensor >> &  inputs,
std::vector< std::vector< NpuTensor >> &  outputs,
ExecuteAsyncCallback  cb 
)
pure virtual noexcept

Executes the operation asynchronously with the given input tensors.

This method starts the asynchronous execution of the operation using the provided input tensors. The results will be stored in the output tensors, and the specified callback will be invoked upon completion.

Note
The callback may be invoked from an internal worker thread, not necessarily the calling thread. Users are responsible for ensuring thread safety when accessing shared resources in the callback.
Parameters
inputs: A constant reference to a vector of input tensors to be processed, vector dimensions: [batch][tensors]. The outer vector size must be between 1 and get_batch_size() (inclusive). Each inner vector must contain one NpuTensor per model input.
outputs: A reference to a vector where the output tensors will be stored, vector dimensions: [batch][tensors]. The outer vector size must be between 1 and get_batch_size() (inclusive). Each inner vector must contain one NpuTensor per model output.
cb: A callback function to be called when the asynchronous execution is complete.
Returns
A JobHandle for the submitted asynchronous job.
Note
Users should provide tensors in the same order as returned by get_tensors_info(). inputs and outputs must be valid until the callback is invoked.
If the returned JobHandle has status StatusCode::RESOURCE_UNAVAILABLE, the submission failed because all internal execution slots are busy. This is a transient condition; the application should retry the submission.
See also
wait, execute
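
One way to satisfy the lifetime requirement on inputs and outputs is to let the callback own them. The sketch below assumes a hypothetical `FakeRunner` that stores the callback and fires it later; `Request`, `submit_owned`, and the `JobHandle` field are also stand-ins rather than VART-ML definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the VART-ML types.
struct NpuTensor {};
struct JobHandle {
  int id;  // assumed field
};
using ExecuteAsyncCallback = std::function<void(const JobHandle &)>;
using TensorSets = std::vector<std::vector<NpuTensor>>;

// Owns the buffers of one in-flight request.
struct Request {
  TensorSets inputs;
  TensorSets outputs;
};

// Fake runtime: stores each callback and fires it from complete_all().
class FakeRunner {
 public:
  JobHandle execute_async(const TensorSets &, TensorSets &, ExecuteAsyncCallback cb) {
    pending_.push_back(std::move(cb));
    return JobHandle{static_cast<int>(pending_.size())};
  }
  void complete_all() {
    for (std::size_t i = 0; i < pending_.size(); ++i)
      pending_[i](JobHandle{static_cast<int>(i + 1)});
    pending_.clear();  // dropping the callbacks releases what they captured
  }

 private:
  std::vector<ExecuteAsyncCallback> pending_;
};

// Submits a request whose buffers stay alive until the callback has run,
// because the callback holds a shared_ptr to them.
std::weak_ptr<Request> submit_owned(FakeRunner &runner, int &completed) {
  auto req = std::make_shared<Request>();
  req->inputs.assign(1, std::vector<NpuTensor>(1));
  req->outputs.assign(1, std::vector<NpuTensor>(1));
  runner.execute_async(req->inputs, req->outputs,
                       [req, &completed](const JobHandle &) { ++completed; });
  return req;  // the caller's own strong reference ends here
}
```

Whether the real runtime releases the callback right after invoking it is not stated on this page, so the release point shown here is an assumption of the fake.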

◆ get_batch_size()

virtual size_t vart::Runner::get_batch_size ( ) const
pure virtual

Returns the device batch size.

The device batch size is the number of input/output sets the NPU processes in a single inference call as a batch. This value determines the maximum outer vector size accepted by execute() and execute_async().

Returns
The device batch size.
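
Because the batch size caps the outer vector size, a larger workload has to be split across several calls. The sketch below plans those splits; `plan_batches` is a hypothetical helper and uses no VART-ML types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Splits `total` samples into [begin, end) index ranges of at most
// `batch_size` samples, so each range fits one execute() call.
std::vector<std::pair<std::size_t, std::size_t>> plan_batches(std::size_t total,
                                                              std::size_t batch_size) {
  std::vector<std::pair<std::size_t, std::size_t>> ranges;
  if (batch_size == 0) return ranges;  // nothing can be scheduled
  for (std::size_t begin = 0; begin < total; begin += batch_size)
    ranges.emplace_back(begin, std::min(begin + batch_size, total));
  return ranges;
}
```

For example, 10 samples with a batch size of 4 yield the ranges [0, 4), [4, 8), and [8, 10); the last call simply passes a shorter outer vector.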

◆ get_num_input_tensors()

virtual size_t vart::Runner::get_num_input_tensors ( ) const
pure virtual

Returns the number of input tensors.

This method retrieves the number of input tensors required by the model or operation.

Returns
The number of input tensors.

◆ get_num_output_tensors()

virtual size_t vart::Runner::get_num_output_tensors ( ) const
pure virtual

Returns the number of output tensors.

This method retrieves the number of output tensors produced by the model or operation.

Returns
The number of output tensors.

◆ get_quant_parameters()

virtual const QuantParameters& vart::Runner::get_quant_parameters ( const std::string &  tensor_name) const
pure virtual

Retrieves the quantization parameters for a specific tensor.

This method retrieves the quantization parameters for a tensor identified by its name.

Parameters
tensor_name: The name of the tensor for which to retrieve quantization parameters.
Returns
A constant reference to the QuantParameters object containing the scale factor and optional zero point.
Exceptions
std::runtime_error: if quantization parameters are not found for the tensor.
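
How a scale factor and optional zero point are typically applied can be sketched as follows. This assumes the common affine convention real = scale * (q - zero_point) with int8 data; this page does not state which convention VART-ML uses (the scale could, for instance, be defined as the reciprocal), so verify it before reusing the arithmetic. The `QuantParameters` fields shown are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical stand-in: this page only says QuantParameters holds a scale
// factor and an optional zero point; field names and types are assumed.
struct QuantParameters {
  float scale;
  bool has_zero_point;
  std::int32_t zero_point;
};

// Assumed convention: real = scale * (q - zero_point).
float dequantize(std::int8_t q, const QuantParameters &p) {
  const std::int32_t zp = p.has_zero_point ? p.zero_point : 0;
  return p.scale * static_cast<float>(static_cast<std::int32_t>(q) - zp);
}

// Inverse of dequantize, rounded to nearest and clamped to the int8 range.
std::int8_t quantize(float real, const QuantParameters &p) {
  const std::int32_t zp = p.has_zero_point ? p.zero_point : 0;
  long q = std::lround(real / p.scale) + zp;
  if (q < -128) q = -128;
  if (q > 127) q = 127;
  return static_cast<std::int8_t>(q);
}
```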

◆ get_tensor_info_by_name()

virtual const NpuTensorInfo& vart::Runner::get_tensor_info_by_name ( const std::string &  tensor_name,
TensorType  type 
) const
pure virtual

Unified API to retrieve tensor information by name and tensor type (CPU/HW).

This method retrieves tensor information for a specific tensor identified by name, with the ability to specify whether to retrieve CPU or HW tensor information.

Parameters
tensor_name: The name of the tensor for which to retrieve information.
type: Specifies whether to retrieve CPU or HW tensor information.
Returns
A constant reference to the NpuTensorInfo object describing the specified tensor.
Exceptions
std::runtime_error: if the tensor name is not found.
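
Since a missing name is reported by exception, callers that treat a tensor as optional may want a non-throwing wrapper. The sketch below uses a hypothetical `FakeRunner` that mimics the documented throw; it omits the TensorType argument for brevity, and the `NpuTensorInfo` fields are assumptions.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in; field names are assumptions.
struct NpuTensorInfo {
  std::string name;
  std::vector<int> shape;
};

// Stand-in for a runner's metadata table; throws on a miss as documented.
class FakeRunner {
 public:
  explicit FakeRunner(std::map<std::string, NpuTensorInfo> table)
      : table_(std::move(table)) {}
  const NpuTensorInfo &get_tensor_info_by_name(const std::string &name) const {
    auto it = table_.find(name);
    if (it == table_.end()) throw std::runtime_error("tensor not found: " + name);
    return it->second;
  }

 private:
  std::map<std::string, NpuTensorInfo> table_;
};

// Converts the throwing lookup into a nullable one. The returned pointer is
// valid only while `runner` is alive.
const NpuTensorInfo *try_get_info(const FakeRunner &runner, const std::string &name) {
  try {
    return &runner.get_tensor_info_by_name(name);
  } catch (const std::runtime_error &) {
    return nullptr;
  }
}
```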

◆ get_tensors_info()

virtual const std::vector<NpuTensorInfo>& vart::Runner::get_tensors_info ( TensorDirection  direction,
TensorType  type 
) const
pure virtual

Unified API to retrieve tensor information based on direction and tensor type (CPU/HW).

This method retrieves tensor information based on the specified direction (input/output) and tensor type (CPU/HW).

Parameters
direction: Specifies whether to retrieve input or output tensor information.
type: Specifies whether to retrieve CPU or HW tensor information.
Returns
A constant reference to a vector containing NpuTensorInfo objects, each describing a tensor matching the specified criteria.
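
The order of the returned vector matters because execute() and execute_async() expect tensors in that same order. The following sketch arranges caller-side buffers, keyed by tensor name, to match it. `order_like` is a hypothetical helper, and the `name` field on `NpuTensorInfo` is an assumption.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

struct NpuTensorInfo {
  std::string name;  // hypothetical: only the name is modeled here
};

// Arranges buffers (keyed by tensor name) in the order of `infos`, i.e. the
// order a get_tensors_info() call would have returned.
template <typename Buffer>
std::vector<Buffer> order_like(const std::vector<NpuTensorInfo> &infos,
                               const std::map<std::string, Buffer> &by_name) {
  std::vector<Buffer> ordered;
  ordered.reserve(infos.size());
  for (const auto &info : infos) {
    auto it = by_name.find(info.name);
    if (it == by_name.end()) throw std::invalid_argument("no buffer for " + info.name);
    ordered.push_back(it->second);
  }
  return ordered;
}
```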

◆ wait()

virtual StatusCode vart::Runner::wait ( const JobHandle &  job_handle,
std::chrono::milliseconds  timeout 
)
pure virtual noexcept

Waits for the completion of an asynchronous job.

This method checks the status of a job submitted with execute_async and blocks until the job completes or the timeout expires.

Parameters
job_handle: A constant reference to the handle of the job to wait for.
timeout: The maximum time to wait for job completion. A zero timeout checks the job completion status and returns immediately. With a positive timeout, the call returns once the job completes or the specified time has elapsed, whichever comes first.
Returns
StatusCode The status of the wait operation. Returns StatusCode::JOB_PENDING if the job has not yet completed (normal polling outcome, not an error).
See also
execute_async
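
The zero-timeout behavior supports a polling style in which the caller does other work between checks. The sketch below assumes a hypothetical `FakeRunner` whose job finishes after a fixed number of polls. JOB_PENDING is named on this page; the other enumerators, the `JobHandle` field, and `poll_until_done` are assumptions.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-ins. JOB_PENDING is named on this page; the other
// enumerators and the JobHandle field are assumptions.
enum class StatusCode { SUCCESS, JOB_PENDING, FAILURE };
struct JobHandle {
  int id;
};

// Fake runner whose job finishes after a fixed number of wait() calls.
struct FakeRunner {
  int polls_until_done;
  StatusCode wait(const JobHandle &, std::chrono::milliseconds) noexcept {
    return polls_until_done-- > 0 ? StatusCode::JOB_PENDING : StatusCode::SUCCESS;
  }
};

// Polls with a zero timeout and runs `do_other_work` while the job is still
// pending. Gives up after `max_polls` checks and returns the last status.
template <typename RunnerT, typename Work>
StatusCode poll_until_done(RunnerT &runner, const JobHandle &job, int max_polls,
                           Work do_other_work) {
  StatusCode status = StatusCode::JOB_PENDING;
  for (int i = 0; i < max_polls && status == StatusCode::JOB_PENDING; ++i) {
    status = runner.wait(job, std::chrono::milliseconds(0));
    if (status == StatusCode::JOB_PENDING) do_other_work();
  }
  return status;
}
```

A JOB_PENDING result after the poll budget is spent is not an error; the caller can keep polling or switch to a blocking wait with a positive timeout.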

The documentation for this class was generated from the following file: