Compiled Model Output#
After successful compilation, the AMD Vitis™ AI Execution Provider writes the compiled model artifacts to the cache_dir/cache_key directory. These files contain the binaries and metadata required to run the model on the NPU at runtime.
Understanding the Compiled Model#
When you compile a model, the entire directory structure created in cache_dir/cache_key constitutes your compiled model. All files within this directory are required for runtime inference.
What gets created:
- Binary files optimized for NPU execution
- Model metadata and configuration
- Runtime execution graphs
- Compilation logs and summary reports
Directory Structure#
After compilation, your cache_dir contains the following structure:
Directory Structure Format (Default):
cache_dir/
  cache_key/
    vaiml_par_0/
      # Compiled model files
    final-vaiml-pass-summary.txt  # Compilation summary report
    [additional compilation artifacts]
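As a quick sanity check before deployment, the layout above can be verified with a few lines of Python. This is a minimal sketch using only the standard library; the helper names `missing_entries` and `list_artifacts` are hypothetical, and the check covers only the two entries named in the listing (a real compilation may produce more).

```python
from pathlib import Path

# Entries named in the directory listing; other artifacts may also be present.
EXPECTED_ENTRIES = ("vaiml_par_0", "final-vaiml-pass-summary.txt")


def missing_entries(cache_dir, cache_key):
    """Return the expected entries that are absent from cache_dir/cache_key."""
    model_dir = Path(cache_dir) / cache_key
    return [name for name in EXPECTED_ENTRIES if not (model_dir / name).exists()]


def list_artifacts(cache_dir, cache_key):
    """Return sorted relative paths of every file under cache_dir/cache_key."""
    model_dir = Path(cache_dir) / cache_key
    return sorted(
        p.relative_to(model_dir).as_posix()
        for p in model_dir.rglob("*")
        if p.is_file()
    )
```

An empty list from `missing_entries("cache_dir", "cache_key")` suggests the expected entries are in place; it does not validate their contents.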
Model Storage Formats#
The compiler supports two storage formats for the compiled model:
- Directory Structure Format (Default)
  The compiled model is stored in cache_dir/cache_key as a directory hierarchy containing the following:
  - vaiml_par_0/ - Partition subdirectory containing NPU-specific binaries and runtime files
  - final-vaiml-pass-summary.txt - Compilation summary with performance metrics
  - Model metadata and intermediate compilation artifacts (useful for debugging)
  What to copy to the board: Copy the entire cache_key directory to your target board, maintaining the directory structure.
  When to use:
  - During development and debugging
  - When you need to inspect compilation artifacts
  - When you want detailed logs and reports
- Single .rai file
  What to copy to the board: Copy only the single .rai file to your target board.
  When to use:
  - For production deployment
  - When transferring models to target hardware
  - For large models where single-file management is easier
  - When you need faster loading times
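For the directory structure format, the whole cache_key directory has to travel together. The sketch below stages a copy that preserves the hierarchy, using only `shutil.copytree` from the Python standard library. The function name `stage_compiled_model` and the staging location are hypothetical; the actual transfer to the board (for example over the network or removable media) is left to your own tooling.

```python
import shutil
from pathlib import Path


def stage_compiled_model(cache_dir, cache_key, staging_root):
    """Copy cache_dir/cache_key into staging_root/cache_key, keeping its structure."""
    src = Path(cache_dir) / cache_key
    if not src.is_dir():
        raise FileNotFoundError(f"compiled model directory not found: {src}")
    dst = Path(staging_root) / cache_key
    # dirs_exist_ok (Python 3.8+) lets a repeated run refresh an existing copy.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst
```

Keeping the cache_key name on the destination side means the same cache_dir/cache_key pairing can be used on the board.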
Compilation Summary Report#
After compilation, you can review detailed compilation results in the summary report.
Viewing the Compilation Summary#
To view compilation details, display the contents of the summary file:
cat cache_dir/cache_key/final-vaiml-pass-summary.txt
Understanding the Summary Report#
The compilation summary reports how much of your model is supported by and offloaded to the NPU, and why any subgraphs fall back to the CPU.
Example Summary Report:
--------- Final Summary of VAIML Pass ----------
OS: Linux X64
VAIP commit: 744227ab2a0fddec1eccdfe04ca222afd339f53f
Model: /path/to/models/resnet18.a1_in1k.onnx
Model signature: 41d764d4ef1d716a260bc7b2b4e07ff1
Device: ve2
Model data type: float32
Device data type: bfloat16
Number of operators in the model: 49
GOPs of the model: 3.64388
Number of operators supported by VAIML: 49 (100.000%)
GOPs supported by VAIML: 3.644 (100.000%)
Number of subgraphs supported by VAIML: 1
Number of operators offloaded by VAIML: 49 (100.000%)
GOPs offloaded by VAIML: 3.644 (100.000%)
Number of subgraphs offloaded by VAIML: 1
Number of subgraphs with compilation errors (fall back to CPU): 0
Number of subgraphs below 20% GOPs threshold (fall back to CPU): 0
Number of subgraphs above max number of subgraphs allowed(7): 0 (fall back to CPU)
Stats for offloaded subgraphs
Subgraph vaiml_par_0 stats:
Type: npu
Operators: 49 (100.000%)
GOPs : 3.644 (100.000%) OPs: 3,643,881,552
Key Metrics Explained#
| Metric | Description |
|---|---|
| Model signature | Unique hash identifying your model |
| Device | Target NPU device (for example, ve2-xc2ve3858) |
| Model data type | Original model precision (for example, float32) |
| Device data type | NPU execution precision (for example, bfloat16) |
| Number of operators | Total operators in your model |
| GOPs | Giga operations (billions of operations), a measure of computational complexity |
| Operators supported by VAIML | Number and percentage of operators that can run on the NPU |
| GOPs supported by VAIML | Amount and percentage of computation that can run on the NPU |
| Operators offloaded by VAIML | Operators actually running on the NPU (after optimization) |
| GOPs offloaded by VAIML | Computation actually running on the NPU (after optimization) |
| Number of subgraphs | Model partitions created for NPU execution |
| Subgraph type | Execution target (for example, npu) |
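The metrics above can also be read programmatically. The following is a minimal sketch that assumes the summary keeps the `key: value` line layout shown in the example report; that layout is not a documented, stable format and may differ between releases. The names `parse_summary` and `count_and_percent` are hypothetical helpers, not part of any AMD tool.

```python
import re

SAMPLE = """\
--------- Final Summary of VAIML Pass ----------
Device: ve2
Number of operators in the model: 49
Number of operators offloaded by VAIML: 49 (100.000%)
GOPs offloaded by VAIML: 3.644 (100.000%)
Subgraph vaiml_par_0 stats:
Type: npu
Operators: 49 (100.000%)
"""

# Matches values such as "49 (100.000%)" or "3.644 (100.000%)".
COUNT_PCT = re.compile(r"^([\d.,]+)\s*\(([\d.]+)%\)")


def parse_summary(text):
    """Collect 'key: value' lines into a dict; the first occurrence of a key wins."""
    metrics = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Skip lines without a colon and section headers with an empty value.
        if sep and value.strip():
            metrics.setdefault(key.strip(), value.strip())
    return metrics


def count_and_percent(value):
    """Split a 'N (P%)' value into (N, P) floats, or return None if it does not match."""
    match = COUNT_PCT.match(value)
    if match is None:
        return None
    return float(match.group(1).replace(",", "")), float(match.group(2))


metrics = parse_summary(SAMPLE)
offloaded = count_and_percent(metrics["Number of operators offloaded by VAIML"])
```

To use it on a real report, pass the text of cache_dir/cache_key/final-vaiml-pass-summary.txt to `parse_summary` instead of `SAMPLE`.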
Interpreting Compilation Results#
Optimal Compilation (100% NPU offload):
Number of operators offloaded by VAIML: 49 (100.000%)
GOPs offloaded by VAIML: 3.644 (100.000%)
Number of subgraphs offloaded by VAIML: 1
This indicates all operations run on the NPU for maximum performance.
Partial NPU Offload:
Number of operators offloaded by VAIML: 35 (71.429%)
GOPs offloaded by VAIML: 2.500 (68.627%)
Number of subgraphs offloaded by VAIML: 2
Number of subgraphs with compilation errors (fall back to CPU): 1
This indicates that some operations fall back to CPU execution. Check the summary for the reason:
- Compilation errors: The subgraph failed to compile for the NPU
- Below 20% GOPs threshold: The subgraph is too small, so CPU execution is more efficient
- Above max subgraphs (7): There are too many partitions, and the excess falls back to the CPU
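The checks above can be automated, for instance in a build script that flags partial offload. This sketch assumes the three fallback lines appear with the wording shown in the example report, which may change between releases; `fallback_report` and the reason labels are hypothetical names chosen for illustration.

```python
import re

# One pattern per fallback line in the example summary report.
FALLBACK_PATTERNS = {
    "compilation errors": re.compile(
        r"subgraphs with compilation errors.*?:\s*(\d+)"),
    "below GOPs threshold": re.compile(
        r"subgraphs below \d+% GOPs threshold.*?:\s*(\d+)"),
    "above max subgraphs": re.compile(
        r"subgraphs above max number of subgraphs allowed\(\d+\):\s*(\d+)"),
}
OFFLOAD_PATTERN = re.compile(
    r"Number of operators offloaded by VAIML:\s*\d+\s*\(([\d.]+)%\)")


def fallback_report(summary_text):
    """Return (operator offload percentage or None, {reason: subgraph count > 0})."""
    match = OFFLOAD_PATTERN.search(summary_text)
    offload_pct = float(match.group(1)) if match else None
    reasons = {}
    for reason, pattern in FALLBACK_PATTERNS.items():
        hit = pattern.search(summary_text)
        if hit and int(hit.group(1)) > 0:
            reasons[reason] = int(hit.group(1))
    return offload_pct, reasons


PARTIAL = """\
Number of operators offloaded by VAIML: 35 (71.429%)
Number of subgraphs with compilation errors (fall back to CPU): 1
Number of subgraphs below 20% GOPs threshold (fall back to CPU): 0
Number of subgraphs above max number of subgraphs allowed(7): 0 (fall back to CPU)
"""
pct, reasons = fallback_report(PARTIAL)
```

An empty `reasons` dictionary together with an offload percentage of 100 corresponds to the optimal case described earlier.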
For details on deploying the model on the board, see ONNX Runtime Python Inference.