Docker Samples and Demos#

This section covers generating snapshots for widely used models using the run_classification.sh application, provided in the Vitis AI repository at $VITIS_AI_REPO/examples/python_examples/batcher. It also describes generating snapshots for the demo models included in Docker.

Follow these steps to check the list of example models supported by the run_classification.sh application:

  1. Navigate to the Vitis-AI directory:

    $ cd $VITIS_AI_REPO
    
  2. Launch Docker:

    $ ./docker/run.bash
    
  3. Navigate to the batcher folder:

    $ cd examples/python_examples/batcher
    
  4. Check the list of supported frameworks:

    $ ./run_classification.sh -f list
    

    This command displays the following output on the console.

    List of supported frameworks:
    
    onnxRuntime, pytorch, tensorflow, tensorflow2
    

    The following table lists the framework versions tested, alongside the versions installed in the Docker container.

    Package     | Tested up to | Version in Docker
    ------------+--------------+------------------
    tensorflow  | 2.16.1       | 2.9.0
    onnx        | 1.16.1       | 1.12.1
    onnxruntime | 1.18.0       | 1.12.0
    torch       | 2.3.1        | 1.12.1
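
    To confirm that the versions inside the Docker container still match this table, you can print the installed package versions. The following is a minimal Python sketch (run inside the container; the package names are taken from the table above):

    # Print the installed versions of the packages listed in the table above.
    from importlib.metadata import PackageNotFoundError, version

    for pkg in ("tensorflow", "onnx", "onnxruntime", "torch"):
        try:
            print(f"{pkg}: {version(pkg)}")
        except PackageNotFoundError:
            print(f"{pkg}: not installed")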
    
  5. Check the list of supported models for the PyTorch framework:

    $ ./run_classification.sh -f pytorch -n list
    

    After running the previous command, the console displays the following output:

    List of supported networks for the framework pytorch:
    
    alexnet densenet121 densenet161 densenet169 densenet201 efficientnet_b0 efficientnet_v2_s googlenet_no_lrn inceptionv3 mnasnet0_5 mnasnet0_75 mnasnet1_0 mnasnet1_3 mobilenet_v2 regnet_x_16gf regnet_x_1_6gf regnet_x_32gf regnet_x_3_2gf regnet_x_400mf regnet_x_800mf regnet_x_8gf regnet_y_16gf regnet_y_1_6gf regnet_y_32gf regnet_y_3_2gf regnet_y_400mf regnet_y_8gf resnet101 resnet152 resnet18 resnet34 resnet50 resnext101_32x8d resnext50_32x4d shufflenet_v2_x0_5 shufflenet_v2_x1_0 shufflenet_v2_x1_5 shufflenet_v2_x2_0 squeezenet squeezenet1_1 vgg11 vgg11_bn vgg13 vgg13_bn vgg16 vgg16_bn vgg19 vgg19_bn wide_resnet101_2 wide_resnet50_2
    

    Similarly, you can check the list of supported models for TensorFlow (1 and 2) and ONNX:

    $ ./run_classification.sh -f tensorflow -n list
    $ ./run_classification.sh -f tensorflow2 -n list
    $ ./run_classification.sh -f onnxRuntime -n list
    

Generate Snapshot for ResNet50#

After reviewing the models supported by the run_classification.sh script, follow these steps to generate a snapshot for the ResNet50 model as an example. Run these steps inside the Docker container:

  1. Navigate to the Vitis-AI directory:

    $ cd $VITIS_AI_REPO
    
  2. Enable the NPU software stack:

    $ source npu_ip/settings.sh
    
  3. Navigate to the batcher folder:

    $ cd examples/python_examples/batcher
    
  4. Run the following command to generate a snapshot for ResNet50:

    $ VAISW_SNAPSHOT_DIRECTORY=snapshot.resnet50.tf2.b19.0113 ./run_classification.sh -f tensorflow2 -n resnet50 -b 19
    

    This command generates the snapshot in $VITIS_AI_REPO/examples/python_examples/batcher/.

    [VAISW]
    [VAISW]    10 batches of 19 samples
    [VAISW]    1 input per batch (19x224x224x3)
    [VAISW]    1 output per batch (19x1001)
    [VAISW]    2 total subgraphs:
    [VAISW]            1 VAISW (FPGA) subgraph: 99.99% of total MACs (79.10 G)
    [VAISW]                    precision: FX8
    [VAISW]            1 Framework (CPU) subgraph
    [VAISW]    [INFO]:  snapshot directory dumped in snapshot.resnet50.tf2.b19.0113
    [VAISW]    [INFO]:  snapshot dumped for VE2802_NPU_IP_O00_A304_M3
    [VAISW]    190 samples
    [VAISW]    from 01/14/2026 00:48:34 to 01/14/2026 00:50:54
    

    After successfully generating the snapshot, the terminal displays the message: snapshot dumped for VE2802_NPU_IP_O00_A304_M3. This message indicates that you must use the SD card image created for this specific IP to verify the snapshot. It is essential to build the reference design solution using the same NPU IP; otherwise, errors might occur when a snapshot generated for one NPU IP version runs on an SD card image built for a different NPU IP.
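
    As a quick sanity check of the reported MAC count, note that the 79.10 G figure in the log is the per-batch total. A back-of-the-envelope Python sketch, using the values from the log and command above:

    total_gmacs = 79.10  # total MACs reported by the [VAISW] log
    batch = 19           # batch size passed with -b 19
    print(f"per-image MACs: {total_gmacs / batch:.2f} G")  # ~4.16 G per 224x224 image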

    Note

    1. The previous command takes a few minutes to generate a snapshot.

    2. The details shown in the previous output might differ slightly between runs.

  5. Copy the snapshot from the host machine to the target board. Ensure that the board is up and running:

    $ scp -r $VITIS_AI_REPO/examples/python_examples/batcher/snapshot.resnet50.tf2.b19.0113 root@<vek280_board_ip>:/root
    # Use the IP address of the VEK280 board in the previous command
    
  6. After transferring the snapshot to the target board, you can deploy it using the NPU runner applications. Refer to X+ML for more details.

Generate Snapshot for SSD_ResNet34#

Inside the Docker container, run the following commands to generate a snapshot for SSD-ResNet34 with a batch size of one. The command in step 4 generates a snapshot named snapshot.ssd_resnet34.0113 in the current directory.

  1. Navigate to the Vitis-AI directory:

    $ cd $VITIS_AI_REPO
    
  2. Enable the NPU software stack:

    $ source npu_ip/settings.sh
    
  3. Navigate to the ssdResnet34 folder:

    $ cd examples/python_examples/ssdResnet34
    
  4. Generate a snapshot for SSD ResNet34:

    $ VAISW_SNAPSHOT_DIRECTORY=snapshot.ssd_resnet34.0113 make
    

    Note

    You can generate a snapshot with a different batch size by executing the following command, replacing $batchsize with the desired value. The example below uses a batch size of 4. Refer to the Quantization Options section for more details.

    # Example command with $batchsize
    # VAISW_SNAPSHOT_DIRECTORY=snapshot.ssdresnet34 python3 demo_tf2.py ../../samples/samples/ssd/images $batchsize 10
    $ VAISW_SNAPSHOT_DIRECTORY=snapshot.ssd_resnet34.b4.0113 python3 demo_tf2.py ../../samples/samples/ssd/images 4 10
    

    The following text displays the last few lines of the output for SSD-ResNet34 snapshot generation.

    [VAISW]
    [VAISW]    7 batches of 1 sample
    [VAISW]    1 input per batch (1x1200x1200x3)
    [VAISW]    2 outputs per batch (1x81x15130, 1x4x15130)
    [VAISW]    2 total subgraphs:
    [VAISW]            1 VAISW (FPGA) subgraph: 99.99% of total MACs (218.38 G)
    [VAISW]                    precision: FX8
    [VAISW]            1 Framework (CPU) subgraph
    [VAISW]    [INFO]:  snapshot directory dumped in snapshot.ssd_resnet34.0113
    [VAISW]    [INFO]:  snapshot dumped for VE2802_NPU_IP_O00_A304_M3
    [VAISW]    7 samples
    [VAISW]    from 01/14/2026 01:06:55 to 01/14/2026 01:11:16
    

    As indicated by the message on the terminal, you need to use the VE2802_NPU_IP_O00_A304_M3 SD card to deploy the snapshot of the SSD_ResNet34 model.

  5. Copy the snapshot from the host machine to the target board. Ensure that the board is up and running:

    $ scp -r $VITIS_AI_REPO/examples/python_examples/ssdResnet34/snapshot.ssd_resnet34.0113 root@<vek280_board_ip>:/root
    # Use the IP address of the VEK280 board in the previous command
    
  6. After copying the snapshot to the target board, you can deploy it using the NPU runner Python application, as explained in X+ML.

Generate Snapshot for YOLOX#

The Docker container includes several demo models in the /home/demo/ directory. The following steps show how to generate a snapshot for the YOLOX-m model:

  1. Navigate to the Vitis-AI directory:

    $ cd $VITIS_AI_REPO
    
  2. Enable the NPU software stack:

    $ source npu_ip/settings.sh
    
  3. Navigate to the YOLOX folder:

    $ cd /home/demo/YOLOX
    
  4. Generate a snapshot for YOLOX:

    $ VAISW_SNAPSHOT_DIRECTORY=snapshot.yolox.0113 VAISW_QUANTIZATION_NBIMAGES=1 ./run assets/dog.jpg m --save_result
    

    The following text displays the last few lines of the output for YOLOX-m snapshot generation.

    [VAISW]
    [VAISW]    1 batch of 1 sample
    [VAISW]    1 input per batch (1x3x640x640)
    [VAISW]    1 output per batch (1x8400x85)
    [VAISW]    2 total subgraphs:
    [VAISW]            1 VAISW (FPGA) subgraph: 99.99% of total MACs (37.15 G)
    [VAISW]                    precision: FX8
    [VAISW]            1 Framework (CPU) subgraph
    [VAISW]    [INFO]:  snapshot directory dumped in snapshot.yolox.0113
    [VAISW]    [INFO]:  snapshot dumped for VE2802_NPU_IP_O00_A304_M3
    [VAISW]    1 sample
    [VAISW]    from 01/14/2026 01:25:06 to 01/14/2026 01:27:32
    

    Note

    You can control the number of images for quantization tuning as shown in the following command. Refer to the Quantization Options section for more details.

    $ VAISW_SNAPSHOT_DIRECTORY=snapshot.yolox.b4.0113 VAISW_QUANTIZATION_NBIMAGES=4 ./run assets/ m --save_result
    

    As indicated by the message on the terminal, you need to use the VE2802_NPU_IP_O00_A304_M3 SD card to deploy the snapshot of the YOLOX-m model.

  5. Copy the snapshot from the host machine to the target board. Ensure that the board is up and running:

    $ scp -r <path_snapshot_dir>/snapshot.yolox.0113 root@<vek280_board_ip>:/root
    # Use the IP address of the VEK280 board in the previous command
    
  6. After copying the snapshot to the target board, you can deploy it using the NPU runner Python application, as explained in X+ML.

    Note

    The YOLOX model is downloaded from the official release page: github.com/Megvii-BaseDetection/YOLOX/releases/download.

Generate Snapshot for YOLOv5 with UINT8 Option#

The NPU software stack can accept the input buffer in UINT8 format, which avoids the runtime quantization operation and improves execution performance on the board. The following steps explain how to compile and deploy the YOLOv5 model in UINT8 mode.

Note

The YOLOv5 model is provided in the /home/demo/ directory of the Docker container.

  1. On the Linux host machine, run the following commands to generate a snapshot for the YOLOv5 model with the UINT8 option.

    $ cd $VITIS_AI_REPO
    $ source npu_ip/settings.sh
    $ ./docker/run.bash --acceptLicense -- /bin/bash -c "source npu_ip/settings.sh && source npu_ip/uint8.env && cd /home/demo/yolov5 && VAISW_SNAPSHOT_DIRECTORY=$PWD/SNAP.$NPU_IP/yolov5.b1.uint8 VAISW_USE_UINT_INPUT=1 VAISW_QUANTIZATION_NBIMAGES=1 ./run data/images/bus.jpg --out_file /dev/null --ext pt"
    

    The command generates the yolov5.b1.uint8 snapshot, with UINT8 mode enabled, in the SNAP.VE2802_NPU_IP_O00_A304_M3 folder. The mode is enabled by the VAISW_USE_UINT_INPUT=1 option, which is also set in the npu_ip/uint8.env file.
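
    Conceptually, UINT8 mode removes the host-side float round-trip. The following numpy sketch contrasts the two input paths (illustrative only; the scale and zero-point values are invented, not taken from a real snapshot):

    import numpy as np

    img = np.random.randint(0, 256, (1, 640, 640, 3), dtype=np.uint8)  # e.g., a decoded frame

    # Default path: the application casts to float32, and the runtime must
    # quantize back to the NPU's integer format before inference.
    x_f32 = img.astype(np.float32)
    scale, zero_point = 0.5, 0  # illustrative quantization parameters
    x_int8 = np.clip(np.round(x_f32 / scale) + zero_point, -128, 127).astype(np.int8)

    # UINT8 path (VAISW_USE_UINT_INPUT=1): the uint8 buffer is handed to the
    # runtime as-is, so the cast and the runtime quantization are skipped.
    x_uint8 = img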

  2. Ensure that the VEK280 board is up and running.

  3. Copy the generated snapshot (yolov5.b1.uint8) from the Linux host machine to /home/root/ on the target board.

  4. Copy the yolov5 directory (from /home/demo/ in the Docker) to /home/root/ on the target board.

  5. On the VEK280 target board, run the following commands to execute the YOLOv5 model with the UINT8 option:

    $ source /etc/vai.sh
    $ cd /root/yolov5
    $ VAISW_SNAPSHOT_DIRECTORY=/root/yolov5.b1.uint8/ VAISW_USE_UINT_INPUT=1 ./run /root/yolov5/data/images/bus.jpg --out_file /dev/null --ext pt
    

    The following is the output of executing the command:

    root@xilinx-vek280-20252:~/yolov5# VAISW_SNAPSHOT_DIRECTORY=/root/yolov5.b1.uint8/ VAISW_USE_UINT_INPUT=1 ./run /root/yolov5/data/images/bus.jpg --out_file /dev/null --ext pt
    Error in cpuinfo: prctl(PR_SVE_GET_VL) failed
    detect: weights=['weights/yolov5s.pt'], source=/root/yolov5/data/images/bus.jpg, data=data/coco128.yaml, imgsz=640x640, conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False, batchSize=1, out_file=/dev/null, loop=False, keepClasses=None
    YOLOv5 ? v6.1-277-gfdc9d919 Python-3.12.11 torch-2.5.0 CPU
    
    Fusing layers...
    YOLOv5s_v6 summary: 213 layers, 7225885 parameters, 0 gradients
    640x640 4 persons, 1 bus, Done. (2.561s)
    640x640 4 persons, 1 bus, Done. (2.561s)
    No more input, encoding capture...
    root@xilinx-vek280-20252:~/yolov5#
    

    In the above output, the message “prctl(PR_SVE_GET_VL) failed” is harmless and can be ignored.

  6. Skip this step if the previous command runs without errors. If you encounter errors, install the following Python packages and then re-run the command:

    # Install following Python packages if there are errors with execution of YOLOv5 model with UINT8 mode.
    $ python3 -m pip install matplotlib==3.7.2 numpy==1.26.4 onnx==1.17.0 onnxruntime==1.18.1 opencv-python==4.10.0.84 pandas==2.0.3 pycocotools==2.0.8 pyyaml scikit-learn==1.3.0 scipy==1.15.2 seaborn==0.13.2 tensorflow==2.19.0 torch==2.5.0 torchvision==0.20.0 tqdm==4.67.1
    $ pip3 uninstall python-dateutil -y
    $ pip3 install --upgrade python-dateutil
    

    Note

    1. Installing the Python packages takes a few minutes.

    2. The YOLOv5 model is downloaded from the official release page: github.com/ultralytics/yolov5/releases/download.

Accelerate YOLO Tails on AIE#

The AIE can fully accelerate YOLO tail graphs, leaving no CPU sub-graph. The tail (the part after the last convolution) is accelerated inside the AIE for the YOLOv5, YOLOv7, and YOLOX models.

The tails of YOLO-like models are automatically accelerated on the AIE under the following conditions:

  • The precision of the ‘tail’ part is not INT8 (that is, BF16 or MIXED precision is used).

    • The tail operations require a much higher precision range than the other operations.

    • With INT8 precision, the tail computation on the AIE would be wrong, so the compilation software stack maps those operations to a CPU sub-graph (see the sketch after this list).

  • The tail contains only supported accelerated layers.

    • For example, the softMax layer is not accelerated on the AIE; YOLOv8 has a softMax in its tail, so its tail cannot be accelerated on the AIE.
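
To see why INT8 is insufficient for the tail, the following numpy sketch (with illustrative values, not taken from a real model) pushes a tail-style computation through an 8-bit grid and compares it with the float reference: the exponential used in box decoding spans a range that a single 8-bit scale cannot represent accurately.

import numpy as np

def fake_quant(x, scale):
    """Round-trip x through an 8-bit grid with the given scale."""
    return np.clip(np.round(x / scale), -128, 127) * scale

t = np.linspace(-6.0, 6.0, 13)  # raw tail inputs span a wide range
ref = np.exp(t)                 # float reference, as in YOLO box decoding

scale_in = 12.0 / 255    # one scale must cover the whole input range...
scale_out = 403.0 / 255  # ...another must cover exp's output range (~0.0025 to ~403)
q = fake_quant(np.exp(fake_quant(t, scale_in)), scale_out)

print(ref[:4])  # [0.0025 0.0067 0.0183 0.0498]
print(q[:4])    # [0. 0. 0. 0.] -- small box sizes are crushed to zero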

Refer to the following steps to generate a snapshot for the YOLOv5 model and execute it on the board.

Step 1: Generate Snapshot for YOLOv5#

  • On the Linux host machine, navigate to the Vitis-AI directory:

    $ cd <path_to_Vitis-AI_folder>
    
  • Run the following command to set up the Vitis AI software environment:

    $ source npu_ip/settings.sh VE2802_NPU_IP_O00_A304_M3
    
  • Generate the snapshot for YOLOv5 using the following command:

    $ ./docker/run.bash --acceptLicense -- /bin/bash -c "source npu_ip/settings.sh && cd /home/demo/yolov5 && VAISW_FE_PRECISION=MIXED VAISW_FE_VIEWDTYPEOUTPUT=AUTO VAISW_SNAPSHOT_DIRECTORY=$PWD/SNAP.$NPU_IP/yolo5.MP.FP32 VAISW_QUANTIZATION_NBIMAGES=1 ./run data/images/bus.jpg --out_file /dev/null --ext pt"
    

    This step generates the yolo5.MP.FP32 snapshot for the YOLOv5 model in the SNAP.VE2802_NPU_IP_O00_A304_M3 folder.

    Similarly, you can generate the snapshot for YOLOX using the following command:

    $ ./docker/run.bash --acceptLicense -- /bin/bash -c "source npu_ip/settings.sh && cd /home/demo/YOLOX  && VAISW_FE_PRECISION=MIXED VAISW_FE_VIEWDTYPEOUTPUT=AUTO VAISW_SNAPSHOT_DIRECTORY=$PWD/SNAP.$NPU_IP/YOLOX.MP.FP32 VAISW_QUANTIZATION_NBIMAGES=1 ./run assets/dog.jpg m --save_result"
    

    This step generates the YOLOX.MP.FP32 snapshot for the YOLOX-m model in the SNAP.VE2802_NPU_IP_O00_A304_M3 folder.

Step 2: Execute YOLOv5 on the Board#

  • Flash the SD card with the VE2802_NPU_IP_O00_A304_M3__YOLO_X_sd_card.img.gz image. Refer to the Set Up/Flash SD Card section for flashing instructions.

  • Ensure that the target board (VEK280) is set up and running. Refer to the Target Board Setup section for board setup instructions.

  • Copy the yolo5.MP.FP32 snapshot to the target board.

  • Set up the Vitis AI tools environment on the board:

    $ cd /root
    $ source /etc/vai.sh
    
  • Run the vart_ml_runner.py application to execute the YOLOv5 snapshot on the board:

    $ vart_ml_runner.py --snapshot yolo5.MP.FP32/ --in_zero_copy --out_zero_copy
    

    The previous command runs the model with random input and verifies that the snapshot executes on the target board, printing the following logs on the console:

    root@xilinx-vek280-20252:~# vart_ml_runner.py --snapshot yolo5.MP.FP32/ --in_zero_copy --out_zero_copy
    XAIEFAL: INFO: Resource group Avail is created.
    XAIEFAL: INFO: Resource group Static is created.
    XAIEFAL: INFO: Resource group Generic is created.
    [VART] Allocated config area in DDR:    Addr = [    0x880000000,  0x50000000000,  0x60000000000 ]       Size = [   0x9cd9a1,   0x822371,   0x9066e1]
    [VART] Allocated tmp area in DDR:       Addr = [    0x8809cf000,  0x50080000000,  0x60080000000 ]       Size = [   0xaca801,          0,          0]
    [VART] Found snapshot for IP VE2802_NPU_IP_O00_A304_M3 matching running device VE2802_NPU_IP_O00_A304_M3
    [VART] Parsing snapshot yolo5.MP.FP32//
    [========================= 100% =========================]
    Inference took 3.526 ms
    Inference took 3.454 ms
    Inference took 3.448 ms
    Inference took 3.475 ms
    Inference took 3.445 ms
    Inference took 3.449 ms
    Inference took 3.441 ms
    Inference took 3.434 ms
    Inference took 3.442 ms
    Inference took 3.437 ms
    OK: no error found
    root@xilinx-vek280-20252:~#
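
    From the latencies printed above, you can derive a rough throughput figure. A back-of-the-envelope Python sketch using the values from this particular run:

    latencies_ms = [3.526, 3.454, 3.448, 3.475, 3.445,
                    3.449, 3.441, 3.434, 3.442, 3.437]
    avg_ms = sum(latencies_ms) / len(latencies_ms)
    print(f"average latency: {avg_ms:.3f} ms")              # ~3.455 ms
    print(f"throughput: {1000 / avg_ms:.0f} inferences/s")  # ~289 inferences/s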
    

Input Packed Format for 3 Maps or 1 Map#

On AIE-ML, the NPU can natively read data stored as 3 maps (or 1 map) without padding.

The native DDR format for the NPU to perform the convolution stores the maps as a multiple of 4 maps (or 8 maps) in NHWC shape. However, if the embedded application cannot store the input data in DDR in this format, the AIE can perform the conversion (that is, add the padding bytes).

Note

  1. Because this conversion affects AIE performance, it is not enabled by default. For the best performance, store the data in DDR with the channels padded to a multiple of 4 maps.

This conversion can be enabled during snapshot generation using the VAISW_FE_PACKEDINPUT=true option, as illustrated in the sketch below.
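
For illustration, the following numpy sketch shows the layout difference (my own example, not tool output): the native format pads 3-channel NHWC data to 4 maps, which is the conversion VAISW_FE_PACKEDINPUT=true asks the AIE to perform.

import numpy as np

# Packed input: 3-channel NHWC data exactly as the application produces it.
packed = np.random.randint(0, 256, (1, 224, 224, 3), dtype=np.uint8)

# Native NPU format: channels padded up to a multiple of 4 maps.
padded = np.zeros((1, 224, 224, 4), dtype=np.uint8)
padded[..., :3] = packed

# With VAISW_FE_PACKEDINPUT=true, the snapshot accepts `packed` directly and
# the AIE inserts the padding bytes itself (at some cost in AIE performance).
print(packed.nbytes, padded.nbytes)  # 150528 200704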

The following command-line examples build ResNet50 for PyTorch in three variants: the normal model, the packed version, and the packed version with pre-processing included in the AIE.

$ ./docker/run.bash --acceptLicense -- /bin/bash -c "source npu_ip/settings.sh && cd examples/python_examples/batcher &&                           VAISW_SNAPSHOT_DIRECTORY=$VAISW_HOME/SNAP.$NPU_IP/resnet50.PT              ./run_classification.sh -f pytorch -n resnet50 --batchSize 19"
$ ./docker/run.bash --acceptLicense -- /bin/bash -c "source npu_ip/settings.sh && cd examples/python_examples/batcher && VAISW_FE_PACKEDINPUT=true VAISW_SNAPSHOT_DIRECTORY=$VAISW_HOME/SNAP.$NPU_IP/resnet50.PT.packed       ./run_classification.sh -f pytorch -n resnet50 --batchSize 19"
$ ./docker/run.bash --acceptLicense -- /bin/bash -c "source npu_ip/settings.sh && cd examples/python_examples/batcher && VAISW_FE_PACKEDINPUT=true VAISW_SNAPSHOT_DIRECTORY=$VAISW_HOME/SNAP.$NPU_IP/resnet50.PT.packed.UINT8 VAISW_RUNSESSION_PREPROCESSTRANSFORMS=\"{'input.1':['Cast(\'uint8\',\'float32\')','StdNorm([123.675,116.28,103.53],[58.395,57.12,57.375])']}\" ./run_classification.sh -f pytorch -n resnet50 --batchSize 19"

The following command-line examples build ResNet50 for TensorFlow in four variants: the normal model, the packed version, and both again with pre-processing included in the AIE.

$ ./docker/run.bash --acceptLicense -- /bin/bash -c "source npu_ip/settings.sh && cd examples/python_examples/batcher &&                           VAISW_SNAPSHOT_DIRECTORY=$VAISW_HOME/SNAP.$NPU_IP/resnet50.TF2         ./run_classification.sh                -f tensorflow2 -n resnet50 --batchSize 19"
$ ./docker/run.bash --acceptLicense -- /bin/bash -c "source npu_ip/settings.sh && cd examples/python_examples/batcher && VAISW_FE_PACKEDINPUT=true VAISW_SNAPSHOT_DIRECTORY=$VAISW_HOME/SNAP.$NPU_IP/resnet50.TF2.packed  ./run_classification.sh                -f tensorflow2 -n resnet50 --batchSize 19"
$ ./docker/run.bash --acceptLicense -- /bin/bash -c "source npu_ip/settings.sh && cd examples/python_examples/batcher &&                           VAISW_SNAPSHOT_DIRECTORY=$VAISW_HOME/SNAP.$NPU_IP/resnet50.TF2.UINT8        VAISW_RUNSESSION_PREPROCESSTRANSFORMS=\"{'*':['Cast(\'uint8\',\'float32\')','StdNorm(0,255)']}\" ./run_classification.sh --noBestParams -f tensorflow2 -n resnet50 --batchSize 19"
$ ./docker/run.bash --acceptLicense -- /bin/bash -c "source npu_ip/settings.sh && cd examples/python_examples/batcher && VAISW_FE_PACKEDINPUT=true VAISW_SNAPSHOT_DIRECTORY=$VAISW_HOME/SNAP.$NPU_IP/resnet50.TF2.packed.UINT8 VAISW_RUNSESSION_PREPROCESSTRANSFORMS=\"{'*':['Cast(\'uint8\',\'float32\')','StdNorm(0,255)']}\" ./run_classification.sh --noBestParams -f tensorflow2 -n resnet50 --batchSize 19"

Refer to the following command to build YOLOX with the packed version and pre-processing included in the AIE.

$ ./docker/run.bash --acceptLicense -- /bin/bash -c "source npu_ip/settings.sh && cd /home/demo/YOLOX &&  VAISW_RUNSESSION_PREPROCESSTRANSFORMS=\"{'onnx::Slice_0':['Cast(\'uint8\',\'float32\')']}\" VAISW_FE_PACKEDINPUT=true VAISW_SNAPSHOT_DIRECTORY=$PWD/snapshot_yolox.int8.NHWC.packed/ VAISW_QUANTIZATION_NBIMAGES=1 ./run assets/dog.jpg --save_result"

The following section covers how to execute the snapshots generated with the above commands using the X+ML application.

Support for RGB/BGR Format in X+ML Application#

This feature targets use cases where the user wants to bypass the pre-processing IP (image processing) and feed input data to the NPU directly in its native format. Users can also control whether the input data is already normalized. If the input is not normalized, the snapshot can be generated such that the NPU performs normalization internally before inference. Normalization parameters, such as mean and scale values, can be specified during snapshot generation; refer to the Preprocess Transforms section for a detailed explanation of how to provide them.
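
For reference, the StdNorm([123.675,116.28,103.53],[58.395,57.12,57.375]) transform used in the packed ResNet50 command above corresponds to a standard per-channel normalization. A minimal numpy sketch, assuming StdNorm(mean, std) computes (x - mean) / std per channel:

import numpy as np

mean = np.array([123.675, 116.28, 103.53], dtype=np.float32)
std = np.array([58.395, 57.12, 57.375], dtype=np.float32)

img = np.random.randint(0, 256, (224, 224, 3)).astype(np.float32)  # after Cast('uint8','float32')
normalized = (img - mean) / std  # performed by the AIE when baked into the snapshot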

Refer to the previous section, which covers snapshot generation for the ResNet50 and YOLOX models with packed input and pre-processing included in the AIE. You can use x_plus_ml_app to execute the generated snapshots on the target board.

# Execute ResNet50 model.
# Pre-requisite to execute ResNet50 model.
# resnet50.TF2.packed.UINT8* snapshot
# Prepare input RGB video file (test_240x240_rgb.raw)
# Now, execute the model
$ ./x_plus_ml_app -i ./test_240x240_rgb.raw -c ./resnet50_no_preprocess.json -s resnet50.TF2.packed.UINT8 -l 3 -a 1 -d 224x224


# Execute YOLOX model.
# Pre-requisite to execute YOLOX model.
# snapshot_yolox.int8.NHWC.packed* snapshot
# Prepare input BGR video file (test_640x640_bgr.raw)
# Now, execute the model
$ ./x_plus_ml_app -i ./test_640x640_bgr.raw -c ./yolox_no_preprocess.json -s snapshot_yolox.int8.NHWC.packed/ -l 3 -a 1 -d 640x640
$ python3 /usr/bin/yolox_postprocess.py --pred_data /tmp/app_npu_output0_0_0_snap_0.bin --image test_640x640.jpg

Refer to the following example JSON files (resnet50_no_preprocess.json, yolox_no_preprocess.json) used in the above commands.

# resnet50_no_preprocess.json
  {
     "xclbin-location": "/run/media/mmcblk0p1/x_plus_ml.xclbin",
     "use-native-output-format" : 2,
     "_comment_use-native-output-format": "0: Non-Native format, 1: Native format without zero copy, 2: Native format with zero copy",
     "exec-cpu-subgraph" : false,
     "_comment_exec-cpu-subgraph": "true: CPU subgraph is executed on CPU, false: CPU subgraph is not executed",
     "input-config" : {
        "mem-banks" : [2],
        "in-format" : "RGB"
     },
     "postprocess-config": {
        "mem-banks" : [1, 2, 3],
        "_comment_mem-banks": "mem banks on which memory for postprocess input will be allocated",
        "topk" : 1,
        "label-file-path" : "/etc/vai/labels/resnet50_labels.txt",
        "type" : "RESNET50"
     }
  }

# yolox_no_preprocess.json
  {
     "xclbin-location": "/run/media/mmcblk0p1/x_plus_ml.xclbin",
     "exec-cpu-subgraph" : true,
     "_comment_exec-cpu-subgraph": "true: CPU subgraph is executed on CPU, false: CPU subgraph is not executed",
     "input-config": {
           "mem-banks" : [2],
           "in-format" : "BGR"
     }
  }

Execution of INT8, BF16 and Mixed Precision Snapshots with X+ML Application#

The end-to-end application (a.k.a. the X+ML application) supports execution of INT8, BF16, and Mixed Precision snapshots generated for the ResNet50, YOLOX-m, and SSD-ResNet34 models.
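
For reference, BF16 (bfloat16) keeps float32's 8-bit exponent but truncates the mantissa to 7 bits, trading precision for range. A quick numpy illustration of the truncation (my own sketch, not the tool's implementation):

import numpy as np

def to_bf16(x: np.ndarray) -> np.ndarray:
    """Truncate float32 values to bfloat16 precision (round-toward-zero)."""
    bits = x.astype(np.float32).view(np.uint32)
    return (bits & 0xFFFF0000).view(np.float32)

x = np.array([1.2345678, 1000.0625, 3.1415927], dtype=np.float32)
print(to_bf16(x))  # same range as float32, but only ~2-3 decimal digits kept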

Refer to the following commands to generate snapshots for the INT8, BF16, and Mixed Precision data types.

  • Generate snapshots for the ResNet50, SSDResnet34, and YOLOX models with the INT8 data type.

    $ cd <path_to_Vitis-AI_source>/Vitis-AI/
    # Resnet50
    $ ./docker/run.bash --acceptLicense -- /bin/bash -c "source npu_ip/settings.sh VE2802_NPU_IP_O00_A304_M3 && cd examples/python_examples/batcher && VAISW_SNAPSHOT_DIRECTORY=$PWD/snapshot.resnet50_VE2802_NPU_IP_O00_A304_M3.INT8 ./run_classification.sh -f tensorflow2 -n resnet50 --batchSizePerCore 1"
    # SSDResnet34
    $ ./docker/run.bash --acceptLicense -- /bin/bash -c "source npu_ip/settings.sh VE2802_NPU_IP_O00_A304_M3 && cd examples/python_examples/ssdResnet34 && VAISW_SNAPSHOT_DIRECTORY=$PWD/snapshot.ssdresnet34_VE2802_NPU_IP_O00_A304_M3.INT8 make"
    # YOLOX
    $ ./docker/run.bash --acceptLicense -- /bin/bash -c "source npu_ip/settings.sh VE2802_NPU_IP_O00_A304_M3 && cd /home/demo/YOLOX  && VAISW_SNAPSHOT_DIRECTORY=$PWD/snapshot.yolox_VE2802_NPU_IP_O00_A304_M3.INT8 VAISW_QUANTIZATION_NBIMAGES=1 ./run assets/dog.jpg m --save_result"
    

    Copy the generated snapshots to the target board for execution.

    $ scp -r snapshot.resnet50_VE2802_NPU_IP_O00_A304_M3.INT8 root@<vek280_board_ip>:/root
    $ scp -r snapshot.ssdresnet34_VE2802_NPU_IP_O00_A304_M3.INT8 root@<vek280_board_ip>:/root
    $ scp -r snapshot.yolox_VE2802_NPU_IP_O00_A304_M3.INT8 root@<vek280_board_ip>:/root
    
  • Generate snapshots for the ResNet50, SSDResnet34, and YOLOX models with the BF16 data type.

    $ cd <path_to_Vitis-AI_source>/Vitis-AI/
    # Resnet50
    $ ./docker/run.bash --acceptLicense -- /bin/bash -c "source npu_ip/settings.sh VE2802_NPU_IP_O00_A304_M3 && cd examples/python_examples/batcher && VAISW_FE_PRECISION=BF16 VAISW_FE_VIEWDTYPEOUTPUT=AUTO VAISW_SNAPSHOT_DIRECTORY=$PWD/snapshot.resnet50_VE2802_NPU_IP_O00_A304_M3.BF16.FP32_OUT ./run_classification.sh -f tensorflow2 -n resnet50 --batchSizePerCore 1"
    # SSDResnet34
    $ ./docker/run.bash --acceptLicense -- /bin/bash -c "source npu_ip/settings.sh VE2802_NPU_IP_O00_A304_M3 && cd examples/python_examples/ssdResnet34 && VAISW_FE_PRECISION=BF16 VAISW_FE_VIEWDTYPEOUTPUT=AUTO VAISW_SNAPSHOT_DIRECTORY=$PWD/snapshot.ssdresnet34_VE2802_NPU_IP_O00_A304_M3.BF16.FP32_OUT make"
    # YOLOX
    $ ./docker/run.bash --acceptLicense -- /bin/bash -c "source npu_ip/settings.sh VE2802_NPU_IP_O00_A304_M3 && cd /home/demo/YOLOX  && VAISW_FE_PRECISION=BF16 VAISW_FE_VIEWDTYPEOUTPUT=AUTO VAISW_SNAPSHOT_DIRECTORY=$PWD/snapshot.yolox_VE2802_NPU_IP_O00_A304_M3.BF16.FP32_OUT VAISW_QUANTIZATION_NBIMAGES=1 ./run assets/dog.jpg m --save_result"
    

    Copy the generated snapshots to the target board for execution.

    $ scp -r snapshot.resnet50_VE2802_NPU_IP_O00_A304_M3.BF16.FP32_OUT root@<vek280_board_ip>:/root
    $ scp -r snapshot.ssdresnet34_VE2802_NPU_IP_O00_A304_M3.BF16.FP32_OUT root@<vek280_board_ip>:/root
    $ scp -r snapshot.yolox_VE2802_NPU_IP_O00_A304_M3.BF16.FP32_OUT root@<vek280_board_ip>:/root
    
  • Generate snapshots for the ResNet50, SSDResnet34, and YOLOX models with the mixed precision feature.

    $ cd <path_to_Vitis-AI_source>/Vitis-AI/
    # Resnet50
    $ ./docker/run.bash --acceptLicense -- /bin/bash -c "source npu_ip/settings.sh VE2802_NPU_IP_O00_A304_M3 && cd examples/python_examples/batcher && VAISW_FE_PRECISION=MIXED VAISW_FE_VIEWDTYPEOUTPUT=AUTO VAISW_SNAPSHOT_DIRECTORY=$PWD/snapshot.resnet50_VE2802_NPU_IP_O00_A304_M3.MIXED.FP32_OUT ./run_classification.sh -f tensorflow2 -n resnet50 --batchSizePerCore 1"
    # SSDResnet34
    $ ./docker/run.bash --acceptLicense -- /bin/bash -c "source npu_ip/settings.sh VE2802_NPU_IP_O00_A304_M3 && cd examples/python_examples/ssdResnet34 && VAISW_FE_PRECISION=MIXED VAISW_FE_VIEWDTYPEOUTPUT=AUTO VAISW_SNAPSHOT_DIRECTORY=$PWD/snapshot.ssdresnet34_VE2802_NPU_IP_O00_A304_M3.MIXED.FP32_OUT make"
    # YOLOX
    $ ./docker/run.bash --acceptLicense -- /bin/bash -c "source npu_ip/settings.sh VE2802_NPU_IP_O00_A304_M3 && cd /home/demo/YOLOX  && VAISW_FE_PRECISION=MIXED VAISW_FE_VIEWDTYPEOUTPUT=AUTO VAISW_SNAPSHOT_DIRECTORY=$PWD/snapshot.yolox_VE2802_NPU_IP_O00_A304_M3.MIXED.FP32_OUT VAISW_QUANTIZATION_NBIMAGES=1 ./run assets/dog.jpg m --save_result"
    

    Copy the generated snapshots to the target board for execution.

    $ scp -r snapshot.resnet50_VE2802_NPU_IP_O00_A304_M3.MIXED.FP32_OUT root@<vek280_board_ip>:/root
    $ scp -r snapshot.ssdresnet34_VE2802_NPU_IP_O00_A304_M3.MIXED.FP32_OUT root@<vek280_board_ip>:/root
    $ scp -r snapshot.yolox_VE2802_NPU_IP_O00_A304_M3.MIXED.FP32_OUT root@<vek280_board_ip>:/root
    
  • Execute the snapshots generated with the INT8, BF16, and Mixed Precision data types for ResNet50.

    # Requirements: Copy dog.jpg from docker YOLOX example
    # INT8
    $ x_plus_ml_app -i dog.jpg -s snapshots/snapshot.resnet50_VE2802_NPU_IP_O00_A304_M3.INT8/ -c /etc/vai/json-config/resnet50.json -o output_resnet50_int8.bgr -l 3
    
    # BF16
    $ x_plus_ml_app -i dog.jpg -s snapshots/snapshot.resnet50_VE2802_NPU_IP_O00_A304_M3.BF16.FP32_OUT/ -c /etc/vai/json-config/resnet50_bf16.json -o output_resnet50_bf16.bgr -l 3
    
    # Mixed Precision
    $ x_plus_ml_app -i dog.jpg -s snapshots/snapshot.resnet50_VE2802_NPU_IP_O00_A304_M3.MIXED.FP32_OUT/ -c /etc/vai/json-config/resnet50.json -o output_resnet50_mixed.bgr -l 3
    
  • Execute the snapshots generated with the INT8, BF16, and Mixed Precision data types for YOLOX.

    # Requirements: Copy dog.jpg from docker YOLOX example
    # INT8
    $ x_plus_ml_app -i dog.jpg -s snapshots/snapshot.yolox_VE2802_NPU_IP_O00_A304_M3.INT8/ -c /etc/vai/json-config/yolox.json -l 3 -a 1
    
    # BF16
    $ x_plus_ml_app -i dog.jpg -s snapshots/snapshot.yolox_VE2802_NPU_IP_O00_A304_M3.BF16.FP32_OUT/ -c /etc/vai/json-config/yolox_bf16.json -l 3 -a 1
    
    # Mixed Precision
    $ x_plus_ml_app -i dog.jpg -s snapshots/snapshot.yolox_VE2802_NPU_IP_O00_A304_M3.MIXED.FP32_OUT/ -c /etc/vai/json-config/yolox.json -l 3 -a 1
    
    # Post-processing:
    $ pip3 install torch==2.9.1
    $ pip3 install torchvision==0.24.1
    $ pip3 install onnx
    $ pip3 install onnxruntime==1.20.1
    $ python3 /usr/bin/yolox_postprocess.py --pred_data /tmp/app_npu_output0_0_1_snap_0.bin --image dog.jpg
    
    # Verification snapshot
    $ python3 /usr/bin/yolox_npu_runner.py --snapshot snapshots/snapshot.yolox_VE2802_NPU_IP_O00_A304_M3.BF16.FP32_OUT/ --image dog.jpg --dump_output
    $ python3 /usr/bin/yolox_postprocess.py --pred_data /tmp/yolox_output0_0.raw --image dog.jpg
    
  • Execute the snapshots generated with the INT8, BF16, and Mixed Precision data types for SSDResnet34.

    # Requirements: Copy dog.jpg from docker YOLOX example
    # INT8
    $ x_plus_ml_app -i dog.jpg -s snapshots/snapshot.ssdresnet34_VE2802_NPU_IP_O00_A304_M3.INT8/ -c /etc/vai/json-config/ssdresnet34_sw.json -o output_ssdresnet34_int8.bgr -l 3
    
    # BF16
    $ x_plus_ml_app -i dog.jpg -s snapshots/snapshot.ssdresnet34_VE2802_NPU_IP_O00_A304_M3.BF16.FP32_OUT/ -c /etc/vai/json-config/ssdresnet34_sw_bf16.json -o output_ssdresnet34_bf16.bgr -l 3
    
    # Mixed Precision
    $ x_plus_ml_app -i dog.jpg -s snapshots/snapshot.ssdresnet34_VE2802_NPU_IP_O00_A304_M3.MIXED.FP32_OUT/ -c /etc/vai/json-config/ssdresnet34_sw.json -o output_ssdresnet34_mixed.bgr -l 3
    
    # Verification snapshot
    $ export VAISW_SNAPSHOT_DIRECTORY=snapshot.ssdresnet34_VE2802_NPU_IP_O00_A304_M3.INT8/
    $ python3 -m pip install "numpy<2" Pillow opencv-python pycocotools
    $ python3 demo_vart.py image_folder
    
  • Sample JSON files: All required JSON files are available on the board at /etc/vai/json-config/. For example, the following shows the content of the resnet50.json file used for INT8 and Mixed Precision snapshots.

    {
    "xclbin-location": "/run/media/mmcblk0p1/x_plus_ml.xclbin",
    "use-native-output-format" : 2,
    "_comment_use-native-output-format": "0: Non-Native format, 1: Native format without zero copy, 2: Native format with zero copy",
    "exec-cpu-subgraph" : false,
    "_comment_exec-cpu-subgraph": "true: CPU subgraph is executed on CPU, false: CPU subgraph is not executed",
    "preprocess-config": {
       "mean-r": 0,
       "mean-g": 0,
       "mean-b": 0,
       "scale-r": 0.0039215,
       "scale-g": 0.0039215,
       "scale-b": 0.0039215,
       "colour-format" : "RGBX",
       "maintain-aspect-ratio" : true,
       "resizing-type" : "PANSCAN",
       "in-mem-bank" : 2,
       "out-mem-banks" : [1, 2, 3],
       "_comment_out-mem-banks": "mem banks on which memory for postprocess output will be allocated"
    },
    "postprocess-config": {
          "mem-banks" : [1, 2, 3],
          "_comment_mem-banks": "mem banks on which memory for postprocess input will be allocated",
          "topk" : 1,
          "label-file-path" : "/etc/vai/labels/resnet50_labels.txt",
          "type" : "RESNET50"
    },
    "metaconvert-config":{
       "display-level": -1,
       "font-size" : 0.5,
       "font" : 3,
       "thickness" : 2,
       "radius": 5,
       "mask-level" : 0,
       "y-offset" : 0,
       "draw-above-bbox-flag" : true,
       "label-filter" : [ "class"],
       "label-color" : [
          {"level": 1, "red" : 0, "green" : 255, "blue" : 0 },
          {"level": 2, "red" : 0, "green" : 255, "blue" : 0 },
          {"level": 3, "red" : 255, "green" : 0, "blue" : 0 }
       ],
       "classes" : [
       ]
    }
    }
    

    For the BF16 data type, you can use the same resnet50.json, except that preprocess::colour-format is changed to RGBX_BF16.