Applications#

This section focuses on another key customization opportunity offered by Vitis AI: seamlessly integrating your custom applications into Vitis AI.

1. VART Runner Application#

There are two types of VART runner applications provided by Vitis AI: Python and C++. These are developed based on VART ML APIs.

VART Runner C++ Application#

This section explains the VART runner C++ application, which is developed based on the VART ML APIs and can be used to execute the snapshot of your model. The application’s executable is named vart_ml_demo. If required, you can modify this application, cross-compile it, copy the resulting vart_ml_demo executable to the board, and use it there.

Note

The vart_ml_demo is available in $VITIS_AI_REPO/src/vart_ml/demo/vart_ml_demo.cpp.

Inference on Embedded Device#

The following steps are required to perform inference on an embedded device. For each step, the low-level APIs provide a software call that executes on the CPU:

  1. Create a “runner” instance.

  2. Set the input and output data type.

  3. Prepare the input.

  4. Start inference.

  5. Wait for inference to complete.

  6. Retrieve output.

Example Pseudo Code#

The following pseudo code describes how to build a C++ application using the low-level C++ APIs:

// Include header file for NPU runner APIs
#include <vart_ml_runner/runner.h>

// Create runner object with snapshot path
auto runner = vart::Runner::create_runner(options.snapshot);

// Get input and output tensors
auto inputTensors  = runner->get_input_tensors();
auto outputTensors = runner->get_output_tensors();

// Get input and output tensor size
size_t inputCnt  = inputTensors.size();
size_t outputCnt = outputTensors.size();

// Set data type for quantization
for (auto tensor : inputTensors)
    runner->set_data_type(tensor, vart::DataType::FLOAT32);
for (auto tensor : outputTensors)
    runner->set_data_type(tensor, vart::DataType::FLOAT32);

// Set native format
for (auto tensor : outputTensors)
    runner->set_native_format(tensor, true);

// Prepare input and output buffer for sending to the NPU
const void* inputbuf_ptr[inputCnt * runner->get_batch_size()];
void*   outputbuf_ptr[outputCnt * runner->get_batch_size()];

// Pre-process the data
// Quantization and reorder
// Execute the inference
auto job_id = runner->execute_async(inputbuf_ptr, outputbuf_ptr);
// Wait for the inference
runner->wait(job_id.first, -1);
// Dequantization and reorder
// Post-processing

Detailed Explanation#

  1. Create a “Runner” Instance (to be done once)

    The vart::Runner::create_runner(model_directory) function creates a runner instance that reads the model/snapshot and updates its internal parameters for further use. This setup is required once for each instance of the model or snapshot.

    Use get_input_tensors() and get_output_tensors() to get details of tensors such as size and shape.

  2. Set the Input and Output Data Type

    Use the set_data_type() API to define the input and expected output data types for the NPU. If the specified data type differs from what the snapshot expects, the NPU stack applies a conversion.

  3. Prepare Input

    Inputs and outputs are arrays of pointers to buffers. For a model with N input layers and a snapshot with a batch size of B, N*B input pointers must be provided; the same applies to the output layers. Provide the input data in the data type set in the previous step (a buffer-layout sketch is shown after this list).

  4. Start Inference

    Begin the inference process by invoking execute_async(input, output) with the buffer pointers created in the previous steps.

  5. Wait for Inference to Complete

    Use wait(int jobid, int timeout = -1) to wait for the completion of the inference process. This ensures that the inference results are ready for retrieval.

  6. Retrieve Output

    After the wait completes without errors or timeouts, the output buffers contain the inference results in the format specified in step 2. You can then apply post-processing to this data to obtain more meaningful outputs (a simple example is shown in the sketch after this list). Be sure to handle any errors or exceptions that might occur during each step of the process, and verify that the necessary input data and resources are properly prepared before starting the inference process.
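
The pre- and post-processing steps appear only as comments in the pseudo code above. The following sketch illustrates steps 3 and 6 in plain C++: how the N*B input pointers might be laid out for a model with N input layers and a batch size of B, and how a simple top-1 result can be read back from the output buffers. The buffer ordering, the element counts (kInputElems, kNumClasses), and the argmax post-processing are illustrative assumptions and not part of the VART ML API; adapt them to the shapes reported by get_input_tensors() and get_output_tensors() for your snapshot.

// Illustrative sketch only: the buffer ordering and sizes below are assumptions.
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

int main() {
    using std::size_t;

    // Assumed dimensions for illustration only: N input layers, batch size B,
    // and a single output layer with 1000 classes (e.g. an ImageNet classifier).
    const size_t N = 1, B = 1;
    const size_t kInputElems = 224 * 224 * 3, kNumClasses = 1000;

    // Step 3: one float buffer per (input layer, batch entry) pair, exposed to
    // the runner as an array of N*B pointers. The ordering used here (inputs of
    // the same batch entry grouped together) is an assumption.
    std::vector<std::vector<float>> inputs(N * B, std::vector<float>(kInputElems));
    std::vector<const void*> inputbuf_ptr(N * B);
    for (size_t b = 0; b < B; ++b)
        for (size_t n = 0; n < N; ++n)
            inputbuf_ptr[b * N + n] = inputs[b * N + n].data();

    // Matching output buffers, one per output layer and batch entry.
    std::vector<std::vector<float>> outputs(B, std::vector<float>(kNumClasses));
    std::vector<void*> outputbuf_ptr(B);
    for (size_t b = 0; b < B; ++b)
        outputbuf_ptr[b] = outputs[b].data();

    // ... fill the input buffers, then run execute_async()/wait() with these
    // pointer arrays as shown in the pseudo code above ...

    // Step 6: simple post-processing, e.g. the top-1 class per batch entry.
    for (size_t b = 0; b < B; ++b) {
        const std::vector<float>& scores = outputs[b];
        size_t top1 = static_cast<size_t>(std::distance(
            scores.begin(), std::max_element(scores.begin(), scores.end())));
        (void)top1;  // use the predicted class index in your application
    }
    return 0;
}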

Recompile VART Runner C++ Application#

By default, all VART software components and the VART runner C++ application are built as PetaLinux recipes and included in the generated SD card image. If you need to recompile the VART software components and the VART runner application because of any changes, follow these steps:

  1. Ensure that sdk-vai-5.1.sh is installed in <dest_directory> by following the steps mentioned in the Set up cross-compiler section of Software Installations.

  2. Source the sysroot environment if you have not done so already:

    $ source <dest_directory>/environment-setup-cortexa72-cortexa53-amd-linux
    
  3. Build the VART stack and runner application:

    $ cd <path_to_Vitis-AI>/src/vart_ml/
    $ make clean
    $ make all
    $ make install-tar    # Packages all installable components into vart_ml_install.tar.gz in the current directory
    $ make install-sdk    # Populates the exported PetaLinux SDK sysroot with the compiled libraries and headers

    • This generates all necessary shared objects and the VART runner application executable (vart_ml_demo) in their respective folders.

    • make install-tar compresses all installable components into vart_ml_install.tar.gz. Copy this file to the target board and extract it with tar xzf vart_ml_install.tar.gz -C /, which places each component in its respective location and overwrites the VART ML SW stack on the board.

    • make install-sdk populates the PetaLinux SDK sysroot exported in the environment with the compiled libraries and headers. This is needed when the same SDK is used to compile applications based on this repository; it is not required if you only need to update the VART ML SW stack on the board.

  4. Copy the compiled components to the target board using the following commands. Ensure that the target board is connected to Ethernet and has acquired an IP address.

    $ cd <path_to_Vitis-AI_source_code>/Vitis-AI/src/vart_ml/
    $ scp vart_ml_install.tar.gz root@<target ip>:/tmp/
    # on the board
    $ tar xzf /tmp/vart_ml_install.tar.gz -C /
    

    After copying the compiled application (vart_ml_demo) and libraries to the target board and installing them, refer to Execute Sample Model for usage and sample commands of the vart_ml_demo application to run the model.

    The vart_ml_demo application supports running multiple models in multiple threads within the same application context.

    The following is an example command to run multiple models using the vart_ml_demo application.

    vart_ml_demo --imgPath /root/imagenet/ILSVRC2012_img_val/ --snapshot /run/media/mmcblk0p1/snapshot.$NPU_IP.resnet50.TF+/run/media/mmcblk0p1/snapshot.$NPU_IP.resnet50.TF --labels /etc/vai/labels/labels --goldFile /root/imagenet/ILSVRC_2012_val_GroundTruth_10p.txt  --nbImages 3
    

    You can provide multiple snapshot paths separated by the + symbol, as shown in the previous command.

By now, you have learned about the VART runner C++ application, including the steps involved in executing inference and cross-compiling it. The following section covers the VART runner Python application.

VART Runner Python Application#

This section presents a simple Python example to execute a snapshot on embedded systems. The vart_ml_runner.py script is a test application that runs any snapshot using the VART Python APIs. It can be used to run a given snapshot with test input data and validate that inference works and does not time out.

To execute a snapshot using the Runner Graph API, use the following Python code snippet, which is available in $VITIS_AI_REPO/src/vart_ml/demo/vart_ml_runner.py.

import numpy as np
import VART

# Path to the snapshot directory of your model (placeholder)
snapshot_dir = "<snapshot_path>"

# Create a runner for the snapshot and run inference on random input data
model = VART.Runner(snapshot_dir=snapshot_dir)
for i in range(10):
    input_data = np.random.rand(1, 224, 224, 3).astype(np.float32)
    out = model([input_data])[0]

The result of the VART.Runner API is a model (similar to a PyTorch model) that takes input as an argument and provides outputs. Inputs and outputs are Python lists of Tensors (to support multiple input or output layers).

Inference with VART can be executed using the following commands:

Note

Before executing the snapshot, ensure that the target board (VEK280) is up and running. Copy the required files, such as the snapshot of your model, sample test images/video, the labels file, and the ground truth files, to the target board.

  1. Navigate to the /root directory where the snapshot was copied:

    $ cd /root
    
  2. Source the Vitis AI tools environment:

    $ source /etc/vai.sh
    
  3. Run the Python application:

    $ vart_ml_runner.py --snapshot <snapshot_path>
    

2. End-to-End (X+ML) Application#

Refer to the VART X APIs Architecture Guide and the VART X APIs Application Developer Guide for descriptions of the VART X APIs and for guidance on customizing the X+ML application.

This section covers the cross-compilation of the VART X software components and the X+ML application.

By default, all VART X software components and X+ML applications are built as PetaLinux recipes and included in the generated SD card image. If you need to recompile the VART X software components due to any changes, follow these steps:

  1. Ensure that sdk-vai-5.1.sh is installed in <dest_directory> by following the steps mentioned in the Set up cross-compiler section of Software Installations.

  2. Source the sysroot environment if you have not done so already:

    $ source <dest_directory>/environment-setup-cortexa72-cortexa53-amd-linux
    
  3. Build VART X software components:

    $ cd <path_to_Vitis-AI>/src/
    $ make clean
    $ make vart_x
    $ make vart_x_install_tar
    

    The previous steps compile and compress all installable VART X components into a tar file (vart_x_install.tar.gz), which must be copied to the target board.

  4. Copy the vart_x installer (vart_x_install.tar.gz) to the target board:

    # Before executing the below commands, ensure that the board is up and running
    $ scp vart_x/install/vart_x_install.tar.gz root@<target_board_ip>:~/
    

    The VART X software components are now built and copied to the target board. Next, build the X+ML application.

  5. Navigate to the x_plus_ml folder:

    $ cd <path_of_vitis-ai-2025.1>/Vitis-AI/examples/x_plus_ml/
    
  6. Build the x_plus_ml_app application by executing the following command:

    $ make
    
  7. Copy the x_plus_ml_app application and the json-config directory to the target board using SCP:

    $ scp x_plus_ml_app root@<board_ip>:/usr/bin/
    $ scp -r x_plus_ml/json-config root@<board_ip>:/etc/vai/
    
  8. Install vart_x binaries on the target board:

    # Before executing the below commands on the target board, ensure that the board is up and running
    $ cd ~/
    $ tar -xvf vart_x_install.tar.gz -C /
    

After copying the compiled binaries to the target board and completing the installation, refer to the Execute Sample Model documentation for instructions and sample commands on how to use the x_plus_ml_app application to run the model.