Docker Samples and Demos#

This section covers generating snapshots for widely used models by using a script named run_classification.sh, provided in the Vitis AI repository at $VITIS_AI_REPO/examples/python_examples/batcher. It also describes generating snapshots for the demo models shipped in Docker.

Perform the following steps to list the example models included with the run_classification.sh script:

  1. Navigate to the Vitis-AI directory:

    $ cd $VITIS_AI_REPO
    
  2. Launch Docker:

    $ ./docker/run.bash
    
  3. Navigate to the batcher folder:

    $ cd examples/python_examples/batcher
    
  4. Check the list of supported frameworks:

    $ ./run_classification.sh -f list
    

    This command displays the following output on the console.

    List of supported frameworks:
    
    onnxRuntime, pytorch, tensorflow, tensorflow2
    

    The following table lists, for each package, the highest version tested and the version installed in the Docker container.

    Package     | Tested up to | Docker version
    ------------+--------------+-----------------
    tensorflow  | 2.16.1       | 2.9.0
    onnx        | 1.16.1       | 1.12.1
    onnxruntime | 1.18.0       | 1.12.0
    torch       | 2.3.1        | 1.12.1
    
  5. Check the list of supported models for the PyTorch framework:

    $ ./run_classification.sh -f pytorch -n list
    

    After running the previous command, the following output is displayed on the console:

    List of supported networks for the framework pytorch:
    
    alexnet densenet121 densenet161 densenet169 densenet201 googlenet_no_lrn inceptionv3 mnasnet0_5 mnasnet0_75 mnasnet1_0 mnasnet1_3 mobilenet_v2 resnet101 resnet152 resnet18 resnet34 resnet50 resnext101_32x8d resnext50_32x4d shufflenet_v2_x0_5 shufflenet_v2_x1_0 shufflenet_v2_x1_5 shufflenet_v2_x2_0 squeezenet squeezenet1_1 vgg11 vgg11_bn vgg13 vgg13_bn vgg16 vgg16_bn vgg19 vgg19_bn wide_resnet101_2 wide_resnet50_2
    

    Similarly, you can check the list of supported models for TensorFlow (1 and 2) and ONNX:

    $ ./run_classification.sh -f tensorflow -n list
    $ ./run_classification.sh -f tensorflow2 -n list
    $ ./run_classification.sh -f onnxRuntime -n list
    
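As a quick host-side sanity check (a sketch, not part of the official flow), you can compare an installed package version against the "Tested up to" column in the table above using `sort -V`:

```shell
# version_le A B succeeds when version A <= version B (natural version order).
version_le() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

# Example: the Docker tensorflow (2.9.0) is within the tested range (up to 2.16.1).
if version_le "2.9.0" "2.16.1"; then
  echo "tensorflow 2.9.0 is within the tested range"
fi
```

Pair this with `python3 -c "import tensorflow as tf; print(tf.__version__)"` to read the installed version.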

Generate Snapshot for ResNet50#

After reviewing all the models supported by the run_classification.sh script, follow these steps to generate a snapshot for the ResNet50 model as an example. Run the steps inside Docker:

  1. Navigate to the Vitis-AI directory:

    $ cd $VITIS_AI_REPO
    
  2. Enable the NPU software stack:

    $ source npu_ip/settings.sh
    
  3. Navigate to the batcher folder:

    $ cd examples/python_examples/batcher
    
  4. Run the following command to generate a snapshot for ResNet50:

    $ VAISW_SNAPSHOT_DIRECTORY=snapshot.resnet50.tf2.b19.1007 ./run_classification.sh -f tensorflow2 -n resnet50 -b 19
    

    This command generates the snapshot in $VITIS_AI_REPO/examples/python_examples/batcher/.

    [VAISW]
    [VAISW]    10 batches of 19 samples (the first batch is not used to compute the detailed times)
    [VAISW]    1 input per batch (19x224x224x3)
    [VAISW]    1 output per batch (19x1001)
    [VAISW]    2 total subgraphs:
    [VAISW]            1 VAISW (FPGA) subgraph: 99.99% of total MACs (79.10 G)
    [VAISW]                    precision: FX8
    [VAISW]            1 Framework (CPU) subgraph
    [VAISW]    [INFO]:  snapshot directory dumped in snapshot.resnet50.tf2.b19.1007
    [VAISW]    [INFO]:  snapshot dumped for VE2802_NPU_IP_O00_A304_M3
    [VAISW]    190 samples
    [VAISW]    from 10/07/2025 15:19:29 to 10/07/2025 15:23:00
    

    After successfully generating the snapshot, the terminal displays the message snapshot dumped for VE2802_NPU_IP_O00_A304_M3. This message indicates that you must verify the snapshot using the SD card image created for this specific NPU IP, and that you must build the reference design solution with the same NPU IP. Executing a snapshot generated for one version of the NPU IP on an SD card image intended for a different NPU IP might result in errors.

    Note

    1. The previous command takes a few minutes to generate a snapshot.

    2. The performance figures in the previous output might vary slightly between runs.

  5. Copy the snapshot from the host machine to the target board. Ensure that the board is up and running:

    $ scp -r $VITIS_AI_REPO/examples/python_examples/batcher/snapshot.resnet50.tf2.b19.1007 root@<vek280_board_ip>:/root
    # Use the IP address of the VEK280 board in the previous command
    
  6. After transferring the snapshot to the target board, you can deploy it using the NPU runner applications. Refer to Execute Sample Model for more details.
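The [VAISW] summary printed during snapshot generation is easy to post-process with standard tools. The following sketch (assuming you saved the console output to a file) extracts the FPGA MAC share and the sample count from the ResNet50 summary shown above:

```shell
# Save the two summary lines of interest to a temporary log file
# (in practice, redirect the run_classification.sh output to this file).
log=$(mktemp)
cat > "$log" <<'EOF'
[VAISW]            1 VAISW (FPGA) subgraph: 99.99% of total MACs (79.10 G)
[VAISW]    190 samples
EOF

# Field 6 of the subgraph line is the MAC percentage; field 2 of the
# samples line is the sample count.
mac_share=$(awk '/VAISW \(FPGA\) subgraph/ {print $6}' "$log")
samples=$(awk '/samples/ {print $2}' "$log")
echo "FPGA MAC share: $mac_share, samples: $samples"
rm -f "$log"
```

With the log above, this prints `FPGA MAC share: 99.99%, samples: 190`.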

Generate Snapshot for SSD_ResNet34#

Inside the Docker container, run the following commands to generate a snapshot for SSD-ResNet34 with a batch size of one. The command in step 4 generates a snapshot named snapshot.ssd_resnet34.1007 in the current directory.

  1. Navigate to the Vitis-AI directory:

    $ cd $VITIS_AI_REPO
    
  2. Enable the NPU software stack:

    $ source npu_ip/settings.sh
    
  3. Navigate to the ssdResnet34 folder:

    $ cd examples/python_examples/ssdResnet34
    
  4. Generate a snapshot for SSD ResNet34:

    $ VAISW_SNAPSHOT_DIRECTORY=snapshot.ssd_resnet34.1007 make
    

    Note

    You can generate snapshots with different batch sizes by executing the following command, replacing $batchsize with the desired batch size. A batch size of 4 is used in the following command. Refer to the Quantization Options section for more details.

    # Example command with $batchsize
    # VAISW_SNAPSHOT_DIRECTORY=snapshot.ssdresnet34 python3 demo_tf2.py ../../samples/samples/ssd/images $batchsize 10
    $ VAISW_SNAPSHOT_DIRECTORY=snapshot.ssd_resnet34.b4.1007 python3 demo_tf2.py ../../samples/samples/ssd/images 4 10
    

    The following text displays the last few lines of the output for SSD-ResNet34 snapshot generation.

    [VAISW]
    [VAISW]    7 batches of 1 sample (the first batch is not used to compute the detailed times)
    [VAISW]    1 input per batch (1x1200x1200x3)
    [VAISW]    2 outputs per batch (1x81x15130, 1x4x15130)
    [VAISW]    2 total subgraphs:
    [VAISW]            1 VAISW (FPGA) subgraph: 99.99% of total MACs (218.38 G)
    [VAISW]                    precision: FX8
    [VAISW]            1 Framework (CPU) subgraph
    [VAISW]    [INFO]:  snapshot directory dumped in snapshot.ssd_resnet34.1007
    [VAISW]    [INFO]:  snapshot dumped for VE2802_NPU_IP_O00_A304_M3
    [VAISW]    7 samples
    [VAISW]    from 10/07/2025 15:29:05 to 10/07/2025 15:34:17
    

    As indicated by the message on the terminal, you need to use the VE2802_NPU_IP_O00_A304_M3 SD card to deploy the snapshot of the SSD_ResNet34 model.

  5. Copy the snapshot from the host machine to the target board. Ensure that the board is up and running:

    $ scp -r $VITIS_AI_REPO/examples/python_examples/ssdResnet34/snapshot.ssd_resnet34.1007 root@<vek280_board_ip>:/root
    # Use the IP address of the VEK280 board in the previous command
    
  6. After copying the snapshot to the target board, you can deploy it using the NPU runner Python application, as explained in Execute Sample Model.
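To produce SSD-ResNet34 snapshots for several batch sizes, the demo command from the note above can be wrapped in a loop. This is a sketch: only the directory naming runs here, and the actual generation line is left commented because it needs the Docker environment (the .1007 date suffix is simply this guide's naming convention):

```shell
# One snapshot directory per batch size, following the naming used in this section.
for batchsize in 1 4 8; do
  dir="snapshot.ssd_resnet34.b${batchsize}.1007"
  echo "would generate $dir"
  # VAISW_SNAPSHOT_DIRECTORY=$dir python3 demo_tf2.py ../../samples/samples/ssd/images $batchsize 10
done
```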

Generate Snapshot for YOLOX#

Several demo models are provided in the /home/demo/ directory inside the Docker container. The following steps show how to generate a snapshot for the YOLOX-m model:

  1. Navigate to the Vitis-AI directory:

    $ cd $VITIS_AI_REPO
    
  2. Enable the NPU software stack:

    $ source npu_ip/settings.sh
    
  3. Navigate to the YOLOX folder:

    $ cd /home/demo/YOLOX
    
  4. Generate a snapshot for YOLOX:

    $ VAISW_SNAPSHOT_DIRECTORY=snapshot.yolox.1007 VAISW_QUANTIZATION_NBIMAGES=1 ./run assets/dog.jpg m --save_result
    

    The following text displays the last few lines of the output for YOLOX-m snapshot generation.

    [VAISW]
    [VAISW] The statistic summary can not be displayed, more than 1 inference must be run but 1 inference has been executed.
    [VAISW]    [INFO]:  snapshot directory dumped in snapshot.yolox.1007
    [VAISW]    [INFO]:  snapshot dumped for VE2802_NPU_IP_O00_A304_M3
    

    Note

    You can control the number of images for quantization tuning as shown in the following command. Refer to the Quantization Options section for more details.

    $ VAISW_SNAPSHOT_DIRECTORY=snapshot.yolox.b4.1007 VAISW_QUANTIZATION_NBIMAGES=4 ./run assets/ m --save_result
    

    As indicated by the message on the terminal, you need to use the VE2802_NPU_IP_O00_A304_M3 SD card to deploy the snapshot of the YOLOX-m model.

  5. Copy the snapshot from the host machine to the target board. Ensure that the board is up and running:

    $ scp -r <path_snapshot_dir>/snapshot.yolox.1007 root@<vek280_board_ip>:/root
    # Use the IP address of the VEK280 board in the previous command
    
  6. After copying the snapshot to the target board, you can deploy it using the NPU runner Python application, as explained in Execute Sample Model.

    Note

    The YOLOX model is downloaded from the official release page at github.com/Megvii-BaseDetection/YOLOX/releases/download.

Generate Snapshot for YOLOv5 with UINT8 Option#

The NPU software stack accepts the input buffer in UINT8 format, which avoids the input quantization operation and improves execution performance on the board. The following steps explain how to compile and deploy the YOLOv5 model in UINT8 mode.

Note

The YOLOv5 model is provided in the /home/demo/ directory inside the Docker container.

  1. On the Linux host machine, run the following commands to generate a snapshot for the YOLOv5 model with the UINT8 option.

    $ cd $VITIS_AI_REPO
    $ source npu_ip/settings.sh
    $ ./docker/run.bash --acceptLicense -- /bin/bash -c "source npu_ip/settings.sh && source npu_ip/uint8.env && cd /home/demo/yolov5 && VAISW_SNAPSHOT_DIRECTORY=$PWD/SNAP.$NPU_IP/yolov5.b1.uint8 VAISW_USE_UINT_INPUT=1 VAISW_QUANTIZATION_NBIMAGES=1 ./run data/images/bus.jpg --out_file /dev/null --ext pt"
    

    The command generates the yolov5.b1.uint8 snapshot, in UINT8 mode, in the SNAP.VE2802_NPU_IP_O00_A304_M3 folder. UINT8 mode is enabled by setting VAISW_USE_UINT_INPUT=1 and sourcing the npu_ip/uint8.env file.

  2. Ensure that the VEK280 board is up and running.

  3. Copy the generated snapshot (yolov5.b1.uint8) from the Linux host machine to /home/root/ on the target board.

  4. Copy the yolov5 directory (from /home/demo/ in the Docker) to /home/root/ on the target board.

  5. On the VEK280 target board, run the following commands to execute the YOLOv5 model with the UINT8 option:

    $ source /etc/vai.sh
    $ cd /root/yolov5
    $ VAISW_SNAPSHOT_DIRECTORY=/root/yolov5.b1.uint8/ VAISW_USE_UINT_INPUT=1 ./run /root/yolov5/data/images/bus.jpg --out_file /dev/null --ext pt
    

    The following are the results on executing the command:

    root@xilinx-vek280-xsct-20251:~/yolov5# VAISW_SNAPSHOT_DIRECTORY=/root/yolov5.b1.uint8/ VAISW_USE_UINT_INPUT=1 ./run /root/yolov5/data/images/bus.jpg --out_file /dev/null --ext pt
    Error in cpuinfo: prctl(PR_SVE_GET_VL) failed
    detect: weights=['weights/yolov5s.pt'], source=/root/yolov5/data/images/bus.jpg, data=data/coco128.yaml, imgsz=640x640, conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False, batchSize=1, out_file=/dev/null, loop=False, keepClasses=None
    YOLOv5 🚀 v6.1-277-gfdc9d919 Python-3.12.9 torch-2.5.0 CPU
    
    Fusing layers...
    YOLOv5s_v6 summary: 213 layers, 7225885 parameters, 0 gradients
    640x640 4 persons, 1 bus, Done. (2.537s)
    640x640 4 persons, 1 bus, Done. (2.537s)
    No more input, encoding capture...
    root@xilinx-vek280-xsct-20251:~/yolov5#
    

    In the above results, the message “prctl(PR_SVE_GET_VL) failed” can be ignored.

  6. Skip this step if the previous command runs without errors. If you encounter errors, install the following Python packages and then re-run the command:

    # Install following Python packages if there are errors with execution of YOLOv5 model with UINT8 mode.
    $ python3 -m pip install matplotlib==3.7.2 numpy==1.26.4 onnx==1.17.0 onnxruntime==1.18.1 opencv-python==4.10.0.84 pandas==2.0.3 pycocotools==2.0.8 pyyaml scikit-learn==1.3.0 scipy==1.15.2 seaborn==0.13.2 tensorflow==2.19.0 torch==2.5.0 torchvision==0.20.0 tqdm==4.67.1
    

    Note

    1. It takes a few minutes to install the Python packages.

    2. The YOLOv5 model is downloaded from the official release page at github.com/ultralytics/yolov5/releases/download.
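Before reinstalling everything from step 6, a quick check (a sketch, not part of the official flow) reports which of the listed packages are actually missing; note that some pip names differ from their import names (opencv-python imports as cv2, pyyaml as yaml, scikit-learn as sklearn):

```shell
python3 - <<'EOF'
import importlib.util

# Import names for the packages installed in step 6 (pip names differ for some).
required = ["matplotlib", "numpy", "onnx", "onnxruntime", "cv2", "pandas",
            "pycocotools", "yaml", "sklearn", "scipy", "seaborn",
            "tensorflow", "torch", "torchvision", "tqdm"]
missing = [m for m in required if importlib.util.find_spec(m) is None]
print("missing packages:", ", ".join(missing) if missing else "none")
EOF
```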

Accelerate YOLO Tails on AIE#

The tail graphs of YOLO models can be fully accelerated on the AIE, resulting in no CPU subgraph. The tail (the part after the last convolution) is accelerated inside the AIE for the YOLOv5, YOLOv7, and YOLOX models.

The tails of YOLO-like models are automatically accelerated on the AIE when the following conditions are met:

  • The precision of the ‘tail’ part is not INT8 (that is, BF16 or MIXED precision is used).

    • The tail operations require a much higher precision range than the other operations.

    • With INT8 precision, the tail computation on the AIE would be wrong, so the compilation software stack maps those operations to a CPU subgraph.

  • The tail contains only layers that are supported for acceleration.

    • For example, the softMax layer is not accelerated on the AIE; YOLOv8 has a softMax in its tail, so its tail cannot be accelerated on the AIE.
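In this guide, the non-INT8 tail precision is requested through two front-end environment variables (shown here as a minimal sketch; their values match the snapshot-generation commands used later in this section):

```shell
# MIXED precision keeps the tail in a higher precision than INT8, which is
# what allows it to run on the AIE; AUTO lets the software stack choose the
# output view dtype. Both settings appear in the docker/run.bash commands below.
export VAISW_FE_PRECISION=MIXED
export VAISW_FE_VIEWDTYPEOUTPUT=AUTO
echo "precision=$VAISW_FE_PRECISION, output dtype=$VAISW_FE_VIEWDTYPEOUTPUT"
```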

Refer to the following steps to generate a snapshot for the YOLOv5 model and execute it on the board.

Step 1: Generate Snapshot for YOLOv5#

  • On the Linux host machine, navigate to the Vitis-AI directory:

    $ cd $VITIS_AI_REPO
    
  • Run the following command to set up the Vitis AI software environment:

    $ source npu_ip/settings.sh VE2802_NPU_IP_O00_A304_M3
    
  • Generate the snapshot for YOLOv5 using the following command:

    $ ./docker/run.bash --acceptLicense -- /bin/bash -c "source npu_ip/settings.sh && cd /home/demo/yolov5 && VAISW_FE_PRECISION=MIXED VAISW_FE_VIEWDTYPEOUTPUT=AUTO VAISW_SNAPSHOT_DIRECTORY=$PWD/SNAP.$NPU_IP/yolo5.MP.FP32 VAISW_QUANTIZATION_NBIMAGES=1 ./run data/images/bus.jpg --out_file /dev/null --ext pt"
    

    This step generates the yolo5.MP.FP32 snapshot for the YOLOv5 model in the SNAP.VE2802_NPU_IP_O00_A304_M3 folder.

    Similarly, you can generate the snapshot for YOLOX using the following command:

    $ ./docker/run.bash --acceptLicense -- /bin/bash -c "source npu_ip/settings.sh && cd /home/demo/YOLOX  && VAISW_FE_PRECISION=MIXED VAISW_FE_VIEWDTYPEOUTPUT=AUTO VAISW_SNAPSHOT_DIRECTORY=$PWD/SNAP.$NPU_IP/YOLOX.MP.FP32 VAISW_QUANTIZATION_NBIMAGES=1 ./run assets/dog.jpg m --save_result"
    

    This step generates the YOLOX.MP.FP32 snapshot for the YOLOX-m model in the SNAP.VE2802_NPU_IP_O00_A304_M3 folder.

Step 2: Execute YOLOv5 on Board#

  • Flash the SD card with the V5.1_VE2802_NPU_IP_O00_A304_M3_sd_card.img.gz image. Refer to the Set Up/Flash SD Card section for flashing instructions.

  • Ensure that the target board (VEK280) is set up and running. Refer to the Target Board Setup section for instructions to set up the board.

  • Copy the yolo5.MP.FP32 snapshot to the target board.

  • Set up the Vitis AI tools environment on the board:

    $ source /etc/vai.sh
    
  • Run the vart_ml_runner.py application to execute the YOLOv5 snapshot on the board:

    $ vart_ml_runner.py --snapshot yolo5.MP.FP32/ --in_zero_copy --out_zero_copy
    

    The previous command runs the model with random input and verifies that the snapshot executes on the target board, producing the following logs on the console:

    root@xilinx-vek280-xsct-20251:~# vart_ml_runner.py --snapshot yolo5.MP.FP32/ --in_zero_copy --out_zero_copy
    XAIEFAL: INFO: Resource group Avail is created.
    XAIEFAL: INFO: Resource group Static is created.
    XAIEFAL: INFO: Resource group Generic is created.
    [VART] Allocated config area in DDR:    Addr = [    0x880000000,  0x50000000000,  0x60000000000 ]       Size = [   0x98e211,   0x8383d1,   0x8c8f91]
    [VART] Allocated tmp area in DDR:       Addr = [    0x880990000,  0x50080000000,  0x60080000000 ]       Size = [   0xaca801,          0,          0]
    [VART] Found snapshot for IP VE2802_NPU_IP_O00_A304_M3 matching running device VE2802_NPU_IP_O00_A304_M3
    [VART] Parsing snapshot yolo5.MP.FP32//
    [========================= 100% =========================]
    [VART]
    [VART] Statistics (in ms), 1 sample, batch number   0:
    [VART]  wrp_network
    Inference took 4 ms
    [VART]
    [VART] Statistics (in ms), 1 sample, batch number   1:
    [VART]  wrp_network           : Total   3.99 | AIE   3.51 | CPU sum   0.17
    Inference took 4 ms
    [VART]
    [VART] Statistics (in ms), 1 sample, batch number   2:
    [VART]  wrp_network           : Total   3.97 | AIE   3.49 | CPU sum   0.17
    Inference took 4 ms
    [VART]
                                        .
                                        .
    [VART]
    [VART] Statistics (in ms), 1 sample, batch number   9:
    [VART]  wrp_network           : Total   3.94 | AIE   3.50 | CPU sum   0.18
    Inference took 4 ms
    OK: no error found
    [VART]
    [VART]           board XIL_VEK280_REVB3 (AIE: 304 = 38x8)
    [VART]           10 inferences of batch size 1 (the first inference is not used to compute the detailed times)
    [VART]           1 input layer. Tensor shape: 1x3x640x640 (INT8)
    [VART]           1 output layer. Tensor shape: 1x25200x85 (FLOAT32)
    [VART]           1 total subgraph:
    [VART]                   1 VART (AIE) subgraph
    [VART]                   0 Framework (CPU) subgraph
    [VART]           10 samples
    [VART]
    [VART] "wrp_network" run summary:
    [VART]           detailed times in ms
    [VART] +-----------------------------------+------------+------------+------------+------------+
    [VART] | Performance Summary               |  ms/batch  |  ms/batch  |  ms/batch  |   sample/s |
    [VART] |                                   |    min     |    max     |   median   |   median   |
    [VART] +-----------------------------------+------------+------------+------------+------------+
    [VART] | Whole Graph total                 |       3.94 |       4.00 |       3.98 |     251.45 |
    [VART] |   VART total (   1 sub-graph)     |       3.65 |       3.69 |       3.66 |     272.85 |
    [VART] |     AI acceleration (*)           |       3.49 |       3.52 |       3.50 |     285.71 |
    [VART] |     CPU processing                |       0.16 |       0.18 |       0.16 |            |
    [VART] |       Others                      |            |            |       0.16 |            |
    [VART] |   Others                          |            |            |       0.31 |            |
    [VART] +-----------------------------------+------------+------------+------------+------------+
    [VART] (min and max are measured individually, only the median sums are meaningful).
    [VART] (*) AI Acceleration time includes the transfer to/from the external memories.
    root@xilinx-vek280-xsct-20251:~#
    

As shown in the performance summary table, the YOLOv5 model is fully accelerated on the AIE (no CPU subgraph) when using the VE2802_NPU_IP_O00_A304_M3 IP.
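As a sanity check on the table, the sample/s column follows from the median ms/batch: with a batch size of 1 and the rounded median of 3.98 ms, throughput is about 1000/3.98 ≈ 251 samples/s (the reported 251.45 comes from the unrounded median):

```shell
# throughput (samples/s) = batch_size * 1000 / ms_per_batch
awk 'BEGIN { printf "%.2f samples/s\n", 1 * 1000 / 3.98 }'
```

This prints `251.26 samples/s`, consistent with the table once rounding is accounted for.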

Note

The performance summary is displayed only when VAISW_RUNSESSION_SUMMARY=all is exported.