File runner.py#

module runner#

Executes the runner with the given inputs.

The VART.Runner can take the optional arguments listed under Parameters to build a model.

Model methods to enable zero copy

The API supports a zero copy feature for input and output buffers.

Such buffers are accessed directly by the NPU IP, and therefore have to be physically allocated in DDR memory. A buffer allocated by an ordinary Python function can’t be used directly by the NPU; only a buffer allocated by XRT can. As a consequence, the API provides a function to allocate the input buffers, and the Python application must write into those buffers in place (for example with np.copyto). The output buffers, however, can be allocated directly by the software stack: the inference call can return a physically allocated buffer directly.

  • set_output_native_formats(bool) takes a bool to enable native outputs.

  • set_input_native_formats(bool) takes a bool to enable native inputs.

  • alloc_native_bufs(nbImages) allocates the input buffers to be used for native input. nbImages is an optional argument; its default value is the batch size of the snapshot. Returns a list of nbInputs x nbImages buffers in the same order as the C++ stack, so for 2 inputs and a batch size of 3: [inputA_batch0, inputB_batch0, inputA_batch1, inputB_batch1, inputA_batch2, inputB_batch2]

Limitations for the zero copy API

  • the input format is NHWC.

  • the maximum number of input channels supported is 8.

  • models with multiple inputs and batchSize > 1 haven’t been validated.

  • the inputs and outputs of the model don’t have the same type as in the classic mode:

    • zero copy exposes each batch as a separate buffer. This allows each buffer to be placed in a separate DDR memory, to optimize efficiency.

    • the default mode concatenates all batches in the same buffer.
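The buffer order described for alloc_native_bufs (all inputs of batch 0, then all inputs of batch 1, and so on) can be sketched in plain Python. The helper names native_buf_index and describe_native_bufs are hypothetical and not part of VART; they only illustrate how to locate one (input, batch) pair in the flat list, assuming the order is exactly the one in the example above.

```python
def native_buf_index(input_idx, batch_idx, nb_inputs):
    """Position of one (input, batch) pair in the flat, batch-major list."""
    return batch_idx * nb_inputs + input_idx


def describe_native_bufs(input_names, nb_images):
    """Labels in the documented order: every input of batch 0, then batch 1, ..."""
    return [
        f"{name}_batch{batch}"
        for batch in range(nb_images)
        for name in input_names
    ]


# Hypothetical model with 2 inputs and a batch size of 3.
labels = describe_native_bufs(["inputA", "inputB"], nb_images=3)
print(labels)

# Buffer holding the second input (index 1) of the last batch (index 2).
print(labels[native_buf_index(input_idx=1, batch_idx=2, nb_inputs=2)])
```

With real buffers, the same index would select the entry to fill from the list returned by alloc_native_bufs.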

Sample code:

import VART

model = VART.Runner()
output_inference = model(input_inference)

Parameters:
  • snapshot_dir – path of the snapshot; defaults to the value of the environment variable VAISW_SNAPSHOT_DIRECTORY

  • network_name – name of the model

  • output_names – list of outputs to return; by default all outputs are returned

  • npu_only – run only the subgraphs executed on the AIE; CPU subgraphs are ignored

Returns:

VART.Runner
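The snapshot_dir default can be mirrored in application code, for instance to report a clear error before a runner is constructed. This is a minimal sketch under assumptions: resolve_snapshot_dir is a hypothetical helper, not a VART function, and how VART.Runner itself behaves when the variable is unset is not stated here.

```python
import os


def resolve_snapshot_dir(snapshot_dir=None):
    """Return the explicit path if given, else the environment variable's value."""
    if snapshot_dir is not None:
        return snapshot_dir
    try:
        return os.environ["VAISW_SNAPSHOT_DIRECTORY"]
    except KeyError:
        # Assumption: failing early is the desired application behaviour.
        raise ValueError(
            "no snapshot_dir given and VAISW_SNAPSHOT_DIRECTORY is not set"
        ) from None


# Hypothetical paths, used only for illustration.
os.environ["VAISW_SNAPSHOT_DIRECTORY"] = "/tmp/example_snapshot"
print(resolve_snapshot_dir())                       # falls back to the variable
print(resolve_snapshot_dir("/tmp/other_snapshot"))  # explicit path wins
```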

class VART#
@brief A Python API for VART, providing NPU-based model inference.

This class wraps the VART C++ API for Python, leveraging an NPU Runner for execution.

Public Functions

__init__(self, str snapshot_dir, str network_name, Optional[List[str]] output_names=None, bool npu_only=False)#
@brief Construct a new VART object from a snapshot.

@param snapshot_dir Directory containing the snapshot.
@param network_name Name of the network.
@param output_names Optional list of output tensor names.
@param npu_only If True, use NPU exclusively for inference.
init_out_arrays(self, bool use_native_bufs=False)#
@brief Initializes output arrays.

@param use_native_bufs If True, allocate native buffers for output data.
alloc_native_bufs(self, List[List[int]] shape, int nb_images)#
@brief Allocates native buffers for given shapes and returns NumPy arrays.

@param shape List of shapes for the buffers to allocate.
@param nb_images Number of images in the buffer.
@return List of NumPy arrays pointing to the allocated buffers.
get_input_shapes(self)#
@brief Gets the shapes of the model's input tensors.

@return List of input tensor shapes.
get_input_shape_formats(self)#
@brief Gets the shape formats of the model's input tensors.

@return List of input shape formats.
get_output_shape_formats(self)#
@brief Gets the shape formats of the model's output tensors.

@return List of output shape formats.
get_input_types(self)#
@brief Gets the data types of the model's input tensors.

@return List of NumPy data types for input tensors.
set_input_native_formats(self, bool value)#
@brief Sets whether input buffers are in native format.

@param value If True, use native input format.
@return True if the format was set successfully, False otherwise.
set_output_native_formats(self, bool value)#
@brief Sets whether output buffers are in native format.

@param value If True, use native output format.
@return True if the format was set successfully, False otherwise.
get_input_coeffs(self)#
@brief Gets the quantization coefficients for input tensors.

@return List of input quantization coefficients.
get_output_coeffs(self)#
@brief Gets the quantization coefficients for output tensors.

@return List of output quantization coefficients.
execute(self, List[np.ndarray] inputs)#
@brief Executes the model using the Runner.

@param inputs List of input buffers.
@return List of output buffers.
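The zero copy steps (enable native formats, allocate the input buffers through the API, write into them in place, execute) can be sketched as a single function. Several things here are assumptions: the call order, the use of the alloc_native_bufs(shape, nb_images) signature listed above, and the pairing of images with buffers by position. _StubRunner is a hypothetical stand-in that only mimics the documented method names so the sketch runs without VART or NPU hardware; its "inference" is a placeholder sum.

```python
class _StubRunner:
    """Hypothetical stand-in exposing the documented method names; not VART."""

    def __init__(self, input_shapes):
        self._input_shapes = input_shapes
        self.native_inputs = False
        self.native_outputs = False

    def set_input_native_formats(self, value):
        self.native_inputs = value
        return True

    def set_output_native_formats(self, value):
        self.native_outputs = value
        return True

    def get_input_shapes(self):
        return self._input_shapes

    def alloc_native_bufs(self, shape, nb_images):
        # One flat zero-filled buffer per (batch, input), batch-major order.
        bufs = []
        for _ in range(nb_images):
            for dims in shape:
                size = 1
                for d in dims:
                    size *= d
                bufs.append([0] * size)
        return bufs

    def execute(self, inputs):
        # Placeholder "inference": one output per input buffer, holding its sum.
        return [[sum(buf)] for buf in inputs]


def run_zero_copy(runner, images):
    """Enable native formats, fill runner-allocated buffers in place, execute."""
    if not runner.set_input_native_formats(True):
        raise RuntimeError("native input format was rejected")
    if not runner.set_output_native_formats(True):
        raise RuntimeError("native output format was rejected")
    shapes = runner.get_input_shapes()
    nb_images = len(images) // len(shapes)
    bufs = runner.alloc_native_bufs(shapes, nb_images)
    for buf, image in zip(bufs, images):
        # In-place write keeps the runner-allocated storage;
        # with NumPy arrays, np.copyto(buf, image) plays the same role.
        buf[:] = image
    return runner.execute(bufs)


# One input of shape [1, 2, 2, 1] (4 values) and two images.
runner = _StubRunner(input_shapes=[[1, 2, 2, 1]])
outputs = run_zero_copy(runner, images=[[1, 2, 3, 4], [5, 6, 7, 8]])
print(outputs)
```

Passing the runner in as a parameter keeps the function independent of how the runner was built, so the same flow could be tried against a real runner object once the assumptions above are checked against the installed VART version.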

Public Members

runner#