Class vart::Runner#

class Runner#

Abstract base class for executing model inference operations.

The Runner class defines a unified interface for running synchronous and asynchronous inference tasks on machine learning models. It provides methods for retrieving tensor metadata, executing computations, and managing asynchronous job execution.

Key Features:

  • Query input and output tensor information, including support for zero-copy operations.

  • Perform synchronous inference with input and output tensors.

  • Submit asynchronous inference jobs and manage their lifecycle via job handles or callbacks.

  • Support for both polling/waiting and callback-based asynchronous execution models.

Public Types

using ExecuteAsyncCallback = std::function<void(const JobHandle&, void*)>#

Type alias for the callback function used in asynchronous execution operations.

This callback function is invoked when an asynchronous operation completes. The callback receives the job handle containing the completion status and a user-provided data pointer.

Note

The callback may be invoked from an internal worker thread, so users must ensure thread safety when accessing shared resources.

Param job_handle:

The handle of the completed asynchronous job, containing the final status and job identifier.

Param user_data:

A pointer to user-defined data that was provided when initiating the asynchronous operation.
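
As a minimal sketch, a function compatible with this alias might look as follows; MyContext and on_done are illustrative names, and the members JobHandle exposes for reading the final status are not listed on this page:

    // Illustrative context passed through the void* user_data pointer.
    struct MyContext {
      // Shared state; guard it if the callback and other threads touch it.
    };

    // Signature matches vart::Runner::ExecuteAsyncCallback.
    void on_done(const vart::JobHandle &job_handle, void *user_data) {
      auto *ctx = static_cast<MyContext *>(user_data);
      // May run on an internal worker thread: synchronize access to *ctx.
      (void)job_handle; // inspect the completion status via the handle
      (void)ctx;
    }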

Public Functions

virtual ~Runner() = default#

Destroys the Runner object.

virtual const std::string &get_model_name(void) const = 0#

Returns the model name.

Returns:

The model name.

virtual const std::vector<NpuTensorInfo> &get_tensors_info(const TensorDirection &direction, const TensorType &type) const = 0#

Unified API to retrieve tensor information based on direction and tensor type (CPU/HW).

This method retrieves tensor information based on the specified direction (input/output) and tensor type (CPU/HW).

Parameters:
  • direction – Specifies whether to retrieve input or output tensor information.

  • type – Specifies whether to retrieve CPU or HW tensor information.

Returns:

A constant reference to a vector containing NpuTensorInfo objects, each describing a tensor matching the specified criteria.
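
For example, enumerating the CPU-side input tensors might look as follows, where runner is a concrete Runner instance and the TensorDirection::INPUT and TensorType::CPU enumerator names are assumptions:

    const std::vector<vart::NpuTensorInfo> &infos =
        runner.get_tensors_info(vart::TensorDirection::INPUT, // assumed name
                                vart::TensorType::CPU);       // assumed name
    for (const vart::NpuTensorInfo &info : infos) {
      (void)info; // inspect per-tensor metadata here
    }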

virtual const NpuTensorInfo &get_tensor_info_by_name(const std::string &tensor_name, const TensorType &type) const = 0#

Unified API to retrieve tensor information by name and tensor type (CPU/HW).

This method retrieves tensor information for a specific tensor identified by name, with the ability to specify whether to retrieve CPU or HW tensor information.

Parameters:
  • tensor_name – The name of the tensor for which to retrieve information.

  • type – Specifies whether to retrieve CPU or HW tensor information.

Returns:

A constant reference to the NpuTensorInfo object describing the specified tensor.
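
A sketch of a by-name lookup; "input_0" is an illustrative tensor name and TensorType::HW an assumed enumerator:

    const vart::NpuTensorInfo &info =
        runner.get_tensor_info_by_name("input_0",             // illustrative name
                                       vart::TensorType::HW); // assumed name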

virtual const QuantParameters &get_quant_parameters(const std::string &tensor_name) const = 0#

Retrieves the quantization parameters for a specific tensor.

This method retrieves the quantization parameters for a tensor identified by its name.

Parameters:

tensor_name – The name of the tensor for which to retrieve quantization parameters.

Returns:

A QuantParameters object containing the scale factor and optional zero point.
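
Quantization parameters are conventionally applied as real = scale * (quantized - zero_point). The helper below sketches that affine mapping with plain values, since the exact accessors QuantParameters exposes for its scale factor and optional zero point are not listed on this page:

    #include <cstdint>

    // Dequantize a single INT8 value using the usual affine convention
    // real = scale * (q - zero_point).
    inline float dequantize(std::int8_t q, float scale, std::int32_t zero_point) {
      return scale * (static_cast<float>(q) - static_cast<float>(zero_point));
    }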

virtual size_t get_num_input_tensors() const = 0#

Returns the number of input tensors.

This method retrieves the number of input tensors required by the model or operation.

Returns:

The number of input tensors.

virtual size_t get_num_output_tensors() const = 0#

Returns the number of output tensors.

This method retrieves the number of output tensors produced by the model or operation.

Returns:

The number of output tensors.

virtual size_t get_batch_size() const = 0#

Returns the device batch size.

This method retrieves the device batch size for the model or operation.

Returns:

The device batch size.
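
Together with get_num_input_tensors() and get_num_output_tensors(), the batch size fixes the shape of the [batch][tensors] containers passed to execute(). A minimal sketch, assuming NpuTensor is default-constructible and runner is a concrete Runner instance:

    std::vector<std::vector<vart::NpuTensor>> inputs(
        runner.get_batch_size(),
        std::vector<vart::NpuTensor>(runner.get_num_input_tensors()));
    std::vector<std::vector<vart::NpuTensor>> outputs(
        runner.get_batch_size(),
        std::vector<vart::NpuTensor>(runner.get_num_output_tensors()));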

virtual StatusCode execute(const std::vector<std::vector<NpuTensor>> &inputs, std::vector<std::vector<NpuTensor>> &outputs) = 0#

Executes the main computation using the provided input tensors and produces output tensors.

This method is responsible for performing the actual inference or computation using the specified input tensors and generating the corresponding output tensors.

Note

Users should provide tensors in the same order as returned by get_tensors_info().

Parameters:
  • inputs – A constant reference to a vector of input NpuTensor objects, vector dimensions: [batch][tensors].

  • outputs – A reference to a vector of NpuTensor objects where the outputs will be stored, vector dimensions: [batch][tensors].

Returns:

A StatusCode indicating the success or failure of the execution.
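
Using containers sized as in the get_batch_size() sketch above, a blocking call is a single statement. The exact success enumerator of StatusCode is not listed on this page, so the check is left as a comment:

    // `inputs`/`outputs` are [batch][tensors], ordered as in get_tensors_info().
    vart::StatusCode sc = runner.execute(inputs, outputs);
    // Compare `sc` against the library's success code before reading `outputs`.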

virtual JobHandle execute_async(const std::vector<std::vector<NpuTensor>> &inputs, std::vector<std::vector<NpuTensor>> &outputs) = 0#

Executes the job asynchronously with the given input tensors.

This method initiates an asynchronous operation using the provided input tensors, and stores the results in the output tensors. The function returns a handle to the asynchronous job, allowing the caller to track or manage its execution.

Note

Users should provide tensors in the same order as returned by get_tensors_info(). inputs and outputs must remain valid until the job is completed.

Parameters:
  • inputs – A constant reference to a vector of input tensors required for the job, vector dimensions: [batch][tensors].

  • outputs – A reference to a vector where the output tensors will be stored upon completion, vector dimensions: [batch][tensors].

Returns:

A JobHandle representing the asynchronous job.

virtual StatusCode wait(const JobHandle &job_handle, unsigned int timeout) = 0#

Waits for the completion of an asynchronous job.

This method checks the status of a job submitted via execute_async and blocks until the specified job completes or the timeout expires.

Parameters:
  • job_handle – A constant reference to the handle of the job to wait for.

  • timeout – The maximum time to wait, in milliseconds. A zero timeout checks the job's completion status and returns immediately; a positive timeout blocks until the job completes or the specified time elapses.

Returns:

The status of the wait operation.
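
A sketch of the submit-then-wait pattern; the 1000 ms timeout is arbitrary and runner is a concrete Runner instance:

    vart::JobHandle job = runner.execute_async(inputs, outputs);
    // `inputs` and `outputs` must stay alive until the job completes.

    vart::StatusCode sc = runner.wait(job, 1000); // block for up to 1000 ms
    // A zero timeout polls instead: runner.wait(job, 0);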

virtual JobHandle execute_async(const std::vector<std::vector<NpuTensor>> &inputs, std::vector<std::vector<NpuTensor>> &outputs, ExecuteAsyncCallback cb, void *user_data) = 0#

Executes the operation asynchronously with the given input tensors.

This method starts the asynchronous execution of the operation using the provided input tensors. The results will be stored in the output tensors, and the specified callback will be invoked upon completion.

Note

The callback may be invoked from an internal worker thread, not necessarily the calling thread. Users are responsible for ensuring thread safety when accessing shared resources in the callback.

Note

Users should provide tensors in the same order as returned by get_tensors_info(). inputs and outputs must be valid until the callback is invoked.

Parameters:
  • inputs – A vector of input tensors to be processed, vector dimensions: [batch][tensors].

  • outputs – A reference to a vector where the output tensors will be stored, vector dimensions: [batch][tensors].

  • cb – A callback function to be called when the asynchronous execution is complete.

  • user_data – A pointer to user-defined data that will be passed to the callback function.

Returns:

A JobHandle for the submitted asynchronous job.
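
Reusing the on_done callback and MyContext sketched under ExecuteAsyncCallback above, a callback-driven submission might look like this:

    MyContext ctx;
    vart::JobHandle job =
        runner.execute_async(inputs, outputs, &on_done, &ctx);
    // `inputs`, `outputs`, and `ctx` must remain valid until on_done fires,
    // and on_done may run on an internal worker thread.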

virtual NpuTensor allocate_npu_tensor(const NpuTensorInfo &info, StatusCode &status) const = 0#

Allocates memory for an NPU tensor.

This API allocates contiguous tensor memory specifically for MemoryType::XRT_BO (i.e., an XRT Buffer Object), but only when the TensorType is set to HW. The caller is responsible for releasing the allocated memory using deallocate_npu_tensor when the tensor is no longer needed.

Parameters:
  • info – The metadata associated with the tensor.

  • status – A reference to a StatusCode variable that will capture the outcome of the allocation operation.

Returns:

The allocated NPU tensor. If allocation fails, an empty tensor is returned and status is set accordingly.
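
A minimal allocate/use/deallocate sequence; the TensorDirection::INPUT and TensorType::HW enumerator names are assumptions, and the model is assumed to have at least one HW input tensor:

    vart::StatusCode status{};
    const auto &hw_infos =
        runner.get_tensors_info(vart::TensorDirection::INPUT, // assumed name
                                vart::TensorType::HW);        // assumed name
    vart::NpuTensor tensor = runner.allocate_npu_tensor(hw_infos.front(), status);
    // On failure `tensor` is empty and `status` holds the error.

    // ... use `tensor` ...

    runner.deallocate_npu_tensor(tensor);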

virtual NpuTensor allocate_npu_tensor(const NpuTensorInfo &info, StatusCode &status, MemoryType mem_type, size_t ddr_idx = 0) const = 0#

Allocates memory for an NPU tensor.

This API allocates contiguous tensor memory specifically for MemoryType::XRT_BO (i.e., an XRT Buffer Object), but only when the TensorType is set to HW. The caller is responsible for releasing the allocated memory using deallocate_npu_tensor when the tensor is no longer needed.

Parameters:
  • info – The metadata associated with the tensor.

  • status – A reference to a StatusCode variable that will capture the outcome of the allocation operation.

  • mem_type – Specifies the memory type of the buffer.

  • ddr_idx – Selects the DDR in which the buffer will be allocated.

Returns:

The allocated NPU tensor. If allocation fails, an empty tensor is returned and status is set accordingly.
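
This overload additionally pins the buffer to a DDR bank; get_nb_ddrs() (documented below) bounds the valid ddr_idx values. MemoryType::XRT_BO is taken from the description above, and info is an NpuTensorInfo obtained as in the previous sketch:

    vart::StatusCode status{};
    vart::NpuTensor tensor = runner.allocate_npu_tensor(
        info, status, vart::MemoryType::XRT_BO,
        /*ddr_idx=*/0); // must be < runner.get_nb_ddrs()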

virtual StatusCode deallocate_npu_tensor(NpuTensor &tensor) const = 0#

Deallocates an NPU tensor previously allocated by allocate_npu_tensor.

Parameters:

tensor – Reference to the tensor to be deallocated.

Returns:

StatusCode indicating success or failure.

virtual uint8_t get_nb_ddrs(void) const = 0#

Returns the number of DDRs used by the NPU.

Returns:

The number of DDRs.