Execute Sample Model#
This section helps you to quickly execute the sample model (ResNet50) on the VEK280 board using the pre-built materials provided in the release.
The following flow chart shows the steps needed to run the pre-built binaries:
The SD card image includes the following example applications. You can use any one of them to execute the model and generate inference results.
VART Runner Application
End-to-End (X+ML) Application
VART Runner Application#
The Vitis AI software stack provides a reference application called VART Runner. It is developed using the VART ML APIs and available in both Python and C++. You can use this application to execute the model or the snapshot and generate inference results on VEK280.
Perform the following steps to run the VART Runner application on the VEK280 board:
Ensure that you have completed the SD Card and target board setups. Refer to Installation for more information.
Insert the SD card into the VEK280 board and power the board on.
Log in with the username root and password root.
Download the following files on the Linux host machine:
ImageNet Dataset: Download the ImageNet dataset.
# Create a folder for the dataset
$ mkdir -p dataset/links
# Copy the download script from the Vitis AI source code that was downloaded in the "Download Source Code And Pre-Builts" section
$ cp <path_to_Vitis_AI>/Vitis-AI/examples/python_examples/batcher/scripts/download_ILSVRC12.py dataset/
$ cp <path_to_Vitis_AI>/Vitis-AI/examples/python_examples/batcher/links/pictures_urls.txt dataset/links/
$ cp <path_to_Vitis_AI>/Vitis-AI/examples/python_examples/batcher/links/ILSVRC2012_synset_words.txt dataset/links/
$ cd dataset
# Download the ImageNet dataset and ground truth files, which are expected as input to the VART Runner application.
$ python3 download_ILSVRC12.py imagenet
# Copy the "imagenet" folder to the board.
$ scp -r imagenet/ root@<vek280_board_ip>:/root
Set up the Vitis AI tools environment on the board:
$ source /etc/vai.sh
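The snapshot paths used throughout this section embed the NPU_IP environment variable, which /etc/vai.sh is assumed to set on the board. A minimal sketch of how the path is assembled (the NPU_IP value below is only an example, taken from the console logs later in this section):

```shell
# Build the snapshot path from NPU_IP. On the board, /etc/vai.sh is
# assumed to export this variable; the value below is an example value
# copied from the console logs shown in this section.
NPU_IP="VE2802_NPU_IP_O00_A304_M3"
SNAPSHOT="/run/media/mmcblk0p1/snapshot.${NPU_IP}.resnet50.TF"
echo "$SNAPSHOT"
# -> /run/media/mmcblk0p1/snapshot.VE2802_NPU_IP_O00_A304_M3.resnet50.TF
```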
Execute the VART Runner (Python or C++) application on the board.
6.1. VART Runner Python Application
The VART Runner Python application (vart_ml_runner.py) uses the Python VART API to run any snapshot model with random input or simulation reference input, and verifies that the inference operates correctly without timing out.
6.1.1. Usage.
# vart_ml_runner.py [-h] [--snapshot SNAPSHOT] [--real_data] [--npu_only]

Mandatory arguments:
--snapshot SNAPSHOT: Path to the snapshot directory.
Options:
--n_stability_test N_STABILITY_TEST: Test stability by comparing N additional iterations of the same run (default: 0).
--npu_only: Skip ONNX subgraphs (default: False).
--in_native: Enable native mode for inputs (default: False).
--in_zero_copy: Enable zero copy mode for inputs (default: False).
--out_native: Enable native mode for outputs (default: False).
--out_zero_copy: Enable zero copy mode for outputs (default: False).
--real_data: Re-use saved inputs and compare to expected output (default: False).
--dump_IOs DUMP_IOS: Path to dump runners' inputs/outputs to (without pre/post processing) (default: )
6.1.2. Run the application for the ResNet50 model (the prebuilt snapshot is available under /run/media/mmcblk0p1/):

$ vart_ml_runner.py --snapshot /run/media/mmcblk0p1/snapshot.$NPU_IP.resnet50.TF/
# The previous command runs the model with random input and verifies that the snapshot is executed on the target board.
The command results indicate whether the execution of the ResNet50 model was successful.
root@xilinx-vek280-20252:~# vart_ml_runner.py --snapshot /run/media/mmcblk0p1/snapshot.$NPU_IP.resnet50.TF/
XAIEFAL: INFO: Resource group Avail is created.
XAIEFAL: INFO: Resource group Static is created.
XAIEFAL: INFO: Resource group Generic is created.
[VART] Allocated config area in DDR: Addr = [ 0x880000000, 0x50000000000, 0x60000000000 ] Size = [ 0xe677d1, 0xa5fa81, 0xe67851]
[VART] Allocated tmp area in DDR: Addr = [ 0x880e69000, 0x50000a61000, 0x60000e69000 ] Size = [ 0x158c01, 0x127801, 0x127801]
[VART] Found snapshot for IP VE2802_NPU_IP_O00_A304_M3 matching running device VE2802_NPU_IP_O00_A304_M3
[VART] Parsing snapshot /run/media/mmcblk0p1/snapshot.VE2802_NPU_IP_O00_A304_M3.resnet50.TF//
[========================= 100% =========================]
Inference took 23.14 ms
Inference took 30.244 ms
Inference took 21.365 ms
Inference took 21.548 ms
Inference took 21.286 ms
Inference took 21.426 ms
Inference took 21.159 ms
Inference took 21.335 ms
Inference took 21.277 ms
Inference took 21.303 ms
OK: no error found
root@xilinx-vek280-20252:~#
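The per-iteration latencies in a log like the one above can be averaged with a short helper. This is an illustrative sketch, not part of the Vitis AI tooling: the run.log file and the avg_latency function are hypothetical, and the sample lines written below are synthetic values, not the board output.

```shell
# Average the "Inference took X ms" lines from a saved console log.
# run.log is assumed to be a capture of vart_ml_runner.py output,
# e.g. created with: vart_ml_runner.py --snapshot ... | tee run.log
avg_latency() {
  awk '/Inference took/ { sum += $3; n++ }
       END { if (n) printf "%.3f", sum / n }' "$1"
}

# Demonstration with two synthetic sample lines:
printf 'Inference took 21.0 ms\nInference took 22.0 ms\n' > run.log
avg_latency run.log   # -> 21.500
```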
6.2. VART Runner C++ Application
The VART Runner C++ application (called vart_ml_demo) is implemented based on the VART C++ APIs. It is a generic application that shows how to use the VART C++ APIs. It includes the pre-processing and post-processing of ResNet50 and top1/top5 computation. However, it might also work with other models with slight modifications.
6.2.1. Usage.
$ vart_ml_demo --imgPath PATH --snapshot PATH --labels PATH [Options]
Mandatory arguments:
--snapshot PATH
Path to the snapshot directory.
Options:
--imgPath PATH
Either a directory or a list of images. If it is a directory, run on the first nbImages images. If it is a list of images, run on them (overrides nbImages). If you do not provide --imgPath, the system uses random input.

--labels PATH
Path to the file containing labels of results; defaults to labels.

--batchSize BATCHSIZE
Size of a batch of images to process, defaults to the snapshot batch size.
--channelOrder
Expected order of the channels, defaults to BGR.
--goldFile PATH
The path to the file containing the gold results. If you provide none, it does not perform a comparison.
--noGoldOutput
Don’t display gold details of each image.
--mean MEAN
Mean of a pixel (depends on the framework). The default value is 0.
--nbImages NBIMAGES
Number of images to process, defaults to 10 times the snapshot batch size.
--network NETWORK
Network to display.
--resizeType
Type of resize to apply to the input images; defaults to PanScan.
--std STD
The standard deviation (depends on the framework).
--nbThreads nb
Number of threads to use.
--repeat nb

Run the input images nb times (default is 1), for profiling with a large number of images.

--useExternalQuant
Force app-level quantization of inputs before they are fed to VART ML, and app-level dequantization of outputs after retrieval from VART ML.
--dataFormat
Force input and/or output data to be uploaded to/downloaded from VART ML in native format. Possible arguments are:

- ‘native’ (in and out)
- ‘inNative’ (input only)
- ‘outNative’ (output only)
--setNonCacheableInput
By default, skip the copy of input data. Doing so can improve performance assuming that input data is already located in a cacheable memory region. On the other hand, if you set this option and data is NOT stored in a cacheable memory region, it can result in performance degradation.
--setNonCacheableOutput
By default, skip the copy of output data. Doing so can improve performance assuming that output data is already located in a cacheable memory region. On the other hand, if you set this option and data is NOT stored in a cacheable memory region, it can result in performance degradation.
--fpgaArch
Specify the FPGA architecture for native format transformation; defaults to ‘aieml’.
--useOnnxSubgraphs
Execute ONNX nodes of the given model; defaults to False.

--useSnapshotGold
Use the gold files from the snapshot instead of the images and gold file given; defaults to ‘False’.
--forceInOutDdr
Specify which DDR memories will be used to allocate input and output buffers by passing an ordered colon-separated list of IDs.
--forceInDdr
Specify which DDR memories will be used to allocate input buffers by passing an ordered colon-separated list of IDs.
--forceOutDdr
Specify which DDR memories will be used to allocate output buffers by passing an ordered colon-separated list of IDs.
6.2.2. Run the application for the ResNet50 model (The prebuilt snapshot is available under /run/media/mmcblk0p1/):
$ vart_ml_demo --batchSize 19 --goldFile imagenet/ILSVRC_2012_val_GroundTruth_10p.txt --imgPath imagenet/ILSVRC2012_img_val --nbImages 19 --labels /etc/vai/labels/labels --snapshot /run/media/mmcblk0p1/snapshot.$NPU_IP.resnet50.TF --useExternalQuant 64 --dataFormat native --channelOrder BGR

The vart_ml_demo command outputs the probability scores for the classification along with an accuracy summary:
root@xilinx-vek280-20252:~# vart_ml_demo --batchSize 19 --goldFile imagenet/ILSVRC_2012_val_GroundTruth_10p.txt --imgPath imagenet/ILSVRC2012_img_val --nbImages 19 --labels /etc/vai/labels/labels --snapshot /run/media/mmcblk0p1/snapshot.$NPU_IP.resnet50.TF --useExternalQuant 64 --dataFormat native --channelOrder BGR
XAIEFAL: INFO: Resource group Avail is created.
XAIEFAL: INFO: Resource group Static is created.
XAIEFAL: INFO: Resource group Generic is created.
[VART] Found snapshot for IP VE2802_NPU_IP_O00_A304_M1 IP on kernel 0.
[VART] Allocated config area in DDR: Addr = [ 0x50000000000 ] Size = [ 0x272ff39]
[VART] Allocated tmp area in DDR: Addr = [ 0x50002731000 ] Size = [ 0x3a7c01]
[VART] Parsing snapshot /run/media/mmcblk0p1/snapshot.VE2802_NPU_IP_O00_A304_M1.resnet50.TF/
[========================= 100% =========================]
NPU only mode set. Skipping node resnet50_2_cpu_subgraph_call.
Loading images: 100%
[VART] Running 1 models 7555.78 imgs/s. (19 images)
resnet50 Image 0 (0:0) ILSVRC2012_val_00000001.JPEG
resnet50 GOLD - n03982430 pool table, billiard table, snooker table - 1.00000000
resnet50 PRED - n03982430 pool table, billiard table, snooker table - 0.99956965
resnet50 PRED - n03942813 ping-pong ball - 0.00015839
resnet50 PRED - n04336792 stretcher - 0.00005827
resnet50 PRED - n03376595 folding chair - 0.00004538
resnet50 PRED - n02797295 barrow, garden cart, lawn cart, wheelbarrow - 0.00003534
resnet50 ...........................................................................
resnet50 Image 18 (18:0) ILSVRC2012_val_00000019.JPEG
resnet50 GOLD - n03803284 muzzle - 1.00000000
resnet50 PRED - n03803284 muzzle - 0.99872428
resnet50 PRED - n02106662 German shepherd, German shepherd dog, German police dog, alsatian - 0.00091072
resnet50 PRED - n02105162 malinois - 0.00026093
resnet50 PRED - n02091467 Norwegian elkhound, elkhound - 0.00007476
resnet50 PRED - n04192698 shield, buckler - 0.00000372
resnet50 ============================================================
Accuracy Summary:
[AMD] [resnet50 TEST top1] 68.42% passed.
[AMD] [resnet50 TEST top5] 84.21% passed.
[AMD] [resnet50 ALL TESTS] 68.42% passed.
[AMD] VART ML runner data format was set to NATIVE.
[AMD] Running 1 models 7555.78 imgs/s (19 images)
root@xilinx-vek280-20252:~#
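As a sanity check on the accuracy summary, the printed percentages are consistent with 13 correct top-1 and 16 correct top-5 predictions out of 19 images. Note that these counts are inferred from the percentages for illustration; vart_ml_demo does not report them directly.

```shell
# Recompute the accuracy percentages from the inferred hit counts:
# 13/19 correct top-1 and 16/19 correct top-5 over the 19-image run.
awk 'BEGIN { printf "top1: %.2f%%  top5: %.2f%%\n", 13/19*100, 16/19*100 }'
# -> top1: 68.42%  top5: 84.21%
```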
Note
The vart_ml_demo application includes software-based pre- and post-processing for the ResNet50 model. For custom models, you need to tailor this application with your own pre- and post-processing implementations.

channelOrder is BGR by default because the example model (ResNet50) was trained with the BGR format. Change it based on your model's training format.

There is a known issue with the vart_ml_demo application in printing the prediction scores.
The VART Runner application employs OpenCV libraries (executed on the APU) for image decoding and pre-processing, which encompasses operations such as resizing, color-space conversion, and normalization. The following section discusses the End-to-End (X+ML) application that leverages X components for accelerated pre- and post-processing, in conjunction with the ML component for inference.
End-to-End (X+ML) Application#
The End-to-End application, also known as the X+ML application (x_plus_ml_app), is developed using the VART ML and X APIs. It implements a complete video analytics pipeline, which includes the following:
File Input
Pre-processing
Inference
Post-processing
Overlay
File Output
This application employs OpenCV libraries for the File I/O, JPEG input decoding, and overlay functions. It leverages X components for accelerated pre- and post-processing and the ML component for inference.
You can use the X+ML application to run the ResNet50 model. It produces inference results and saves them to a file. You can then transfer this file to the host machine, where you can view the inference results with software tools like GStreamer or FFMPEG.
Before attempting to run the X+ML application on the VEK280 board, ensure that you have performed the steps described in the VART Runner Application section and that the board is operational.
Run X+ML Application on VEK280 Board
Usage
$ x_plus_ml_app -i <path_to_input_file> -s <path_to_snapshot> -c <path_to_config_file> [Options]
Mandatory arguments:
-i: Input file path (mandatory).
-s: Snapshot path (mandatory).
-c: Config file path (mandatory). This is a JSON configuration file that contains parameters for pre-processing, post-processing, and metaconvert. Refer to the x_plus_ml_app Configuration File.
Options:
-o: Output file path (optional). If provided, the program dumps inference results overlaid on the frame into this file.
-n: Number of frames to process (optional; the default is to process all frames).
-l: Application log level (optional; the default is ERROR and WARNING). Accepted log levels: 1 for ERROR, 2 for WARNING, 3 for INFERENCE RESULT, 4 for INFO, 5 for DEBUG. Prints the logs at the provided level and the levels below.
-d: WidthxHeight of the input (required only in the case of NV12 input; example: 224x224).
-r: Dump NPU output; the default is false.
-h: Print this help and exit.

Sample CLIs:

Single snapshot execution:
$ x_plus_ml_app -i dog.jpg -c /etc/vai/json-config/yolox.json -s snapshot.yolox.0408 -l 3

Multi-snapshot execution:
$ x_plus_ml_app -i dog.jpg+dog.jpg -c /etc/vai/json-config/yolox.json+/etc/vai/json-config/yolox.json -s snapshot.yolox.0408+snapshot.yolox.0408 -l 3+3
Run the X+ML application:
$ cd /root
$ source /etc/vai.sh
$ x_plus_ml_app -i /root/imagenet/ILSVRC2012_img_val/ILSVRC2012_val_00000001.JPEG -s /run/media/mmcblk0p1/snapshot.$NPU_IP.resnet50.TF -c /etc/vai/json-config/resnet50.json -o output.bgr -l 3
The previous command generates the output file at the path specified by the -o option. If the input file is in JPEG format, the output is in BGR24 format; if the input file is in NV12 format, the output remains in NV12 format. Specifying the -l 3 option additionally prints the inference results. The command also displays log messages that include the resolution and format of the output.
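The input-to-output format rule above can be sketched as a small helper. The expected_output_format function is purely illustrative and not part of x_plus_ml_app; it simply encodes the rule that JPEG input produces BGR24 output while NV12 input stays NV12.

```shell
# Illustrative helper: predict the x_plus_ml_app output format from the
# input file extension (JPEG input -> BGR24 output, NV12 input -> NV12).
expected_output_format() {
  case "$1" in
    *.JPEG|*.jpeg|*.jpg) echo "bgr24" ;;
    *.nv12)              echo "nv12"  ;;
    *)                   echo "unknown" ;;
  esac
}

expected_output_format ILSVRC2012_val_00000001.JPEG   # -> bgr24
expected_output_format CLASSIFICATION_224x224.nv12    # -> nv12
```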
The following block shows the results of the x_plus_ml_app command:
root@xilinx-vek280-20252:~# x_plus_ml_app -i /root/imagenet/ILSVRC2012_img_val/ILSVRC2012_val_00000001.JPEG -s /run/media/mmcblk0p1/snapshot.$NPU_IP.resnet50.TF -c /etc/vai/json-config/resnet50.json -o output.bgr -l 3
XAIEFAL: INFO: Resource group Avail is created.
XAIEFAL: INFO: Resource group Static is created.
XAIEFAL: INFO: Resource group Generic is created.
XAIEFAL: INFO: Resource group Avail is created.
XAIEFAL: INFO: Resource group Static is created.
XAIEFAL: INFO: Resource group Generic is created.
[VART] Allocated config area in DDR: Addr = [ 0x880000000, 0x50000000000, 0x60000000000 ] Size = [ 0xe677d1, 0xa5fa81, 0xe67851]
[VART] Allocated tmp area in DDR: Addr = [ 0x880e69000, 0x50000a61000, 0x60000e69000 ] Size = [ 0x158c01, 0x127801, 0x127801]
[VART] Found snapshot for IP VE2802_NPU_IP_O00_A304_M3 matching running device VE2802_NPU_IP_O00_A304_M3
[VART] Parsing snapshot /run/media/mmcblk0p1/snapshot.VE2802_NPU_IP_O00_A304_M3.resnet50.TF/
[========================= 100% =========================]
NPU only mode set. Skipping node resnet50_CPU.
[RESULT] post_process.cpp:166 Results for frame number 1
[RESULT] post_process.cpp:176 Classification Label : pool table, billiard table, snooker table (confidence 0.999908)
Inference time for frame 1 : 2.364 ms
Number of frames processed: 1
Average Inference Time for 1 Frames: 2.364 ms
Output dumped at output.bgr with 160x160 resolution and BGR format
root@xilinx-vek280-20252:~#
The following additional reference command expects NV12 as input and generates NV12 as output.

Download the test_samples-vai-6.1.zip file and copy it to the board before running the following commands.
# Run following command on host machine
$ scp <path_to_test_samples-vai-6.1.zip> root@<vek280_board_ip>:/root
# Run following commands on board
$ cd /root
$ unzip test_samples-vai-6.1.zip
$ x_plus_ml_app -i /root/test_samples/CLASSIFICATION_224x224.nv12 \
    -s /run/media/mmcblk0p1/snapshot.$NPU_IP.resnet50.TF \
    -c /etc/vai/json-config/resnet50.json \
    -d 224x224 \
    -l 3 \
    -o output.nv12
Note
The x_plus_ml_app application implements software-based post-processing for the ResNet50 model. You must customize this application with post-processing implementations for custom models.
Verify the Output of the X+ML Application on the Host Machine
You can view the output results (for example: output.bgr) using FFMPEG or GStreamer commands as follows:
Copy the output of the X+ML application (output.bgr) from the board to the host machine by running the following command on the host machine:
$ scp root@<vek280_board_ip>:/root/output.bgr .
Verify the output with FFmpeg:

# Usage:
# ffplay -f rawvideo -pixel_format PIXEL_FORMAT -video_size video_widthxvideo_height -i path_to_output_file
# PIXEL_FORMAT: bgr24 or nv12, depending on the output generated by the X+ML application.

# Command to run:
$ ffplay -f rawvideo -pixel_format bgr24 -video_size 160x160 -i output.bgr
# The image resolution is 160x160, as indicated by the output of x_plus_ml_app shown in the console.

# [Optional] You can also convert the BGR format to JPEG by using the ffmpeg command and view the JPEG results with any media player tool:
$ ffmpeg -f rawvideo -pixel_format bgr24 -video_size 160x160 -i output.bgr output.jpeg

# Another example command:
$ ffplay -f rawvideo -pixel_format nv12 -video_size 160x160 -i output.nv12
The FFmpeg command displays the output file output.bgr as shown in the following image.
Note
The output results might differ based on the input image.
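Before viewing a raw output file, you can sanity-check its size against the resolution reported by x_plus_ml_app: a BGR24 frame occupies width x height x 3 bytes, and an NV12 frame occupies width x height x 3/2 bytes. For the 160x160 output above:

```shell
# Expected raw frame sizes for a 160x160 output:
echo $(( 160 * 160 * 3 ))      # BGR24 -> 76800 bytes
echo $(( 160 * 160 * 3 / 2 ))  # NV12  -> 38400 bytes
# Compare against the actual file size, e.g.: stat -c %s output.bgr
```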
Verify the output with GStreamer:
# Usage:
# gst-launch-1.0 multifilesrc location=path_to_output_file loop=true ! video/x-raw,format=BGR,height=video_height,width=video_width,framerate=30/1 ! videoconvert ! autovideosink

# Command to run:
$ gst-launch-1.0 multifilesrc location=output.bgr loop=true ! video/x-raw,format=BGR,height=160,width=160,framerate=30/1 ! videoconvert ! autovideosink
# The image resolution is 160x160, as per the output of x_plus_ml_app shown on the console.

# Another example command:
$ gst-launch-1.0 multifilesrc location=output.nv12 loop=true ! video/x-raw,format=NV12,height=160,width=160,framerate=30/1 ! videoconvert ! autovideosink
The GStreamer command displays the output file output.bgr as shown in the following image.
Note
The output results may differ based on the input image.
You have now successfully run the sample (ResNet50) model on the VEK280 target. Next, you can refer to the following areas of interest:
- Customization Opportunities
- Performance Analyzer