Compiled Model Output

After successful compilation, the AMD Vitis™ AI Execution Provider generates compiled model artifacts in the cache_dir/cache_key directory. These files contain the binaries and metadata required to execute the model on the NPU at runtime.

Understanding the Compiled Model

When you compile a model, the entire directory structure created in cache_dir/cache_key constitutes your compiled model. All files within this directory are required for runtime inference.

What gets created:

Binary files optimized for NPU execution
Model metadata and configuration
Runtime execution graphs
Compilation logs and summary reports

Directory Structure

After compilation, your cache_dir contains the following structure:

Directory Structure Format (Default):

cache_dir/
  cache_key/
    vaiml_par_0/
      # Compiled model files
    final-vaiml-pass-summary.txt  # Compilation summary report
    [additional compilation artifacts]
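Every entry in cache_dir/cache_key is needed at runtime. Because of that, it can help to sanity-check the directory before copying it anywhere. The following is a minimal Python sketch that assumes the default layout shown above. `missing_artifacts` is a hypothetical helper, not part of the Vitis AI tooling, and it checks only for the two named entries in the tree.

```python
from pathlib import Path

# Entries shown in the default layout above. This is an assumption for
# illustration; the full set of files produced may differ between models.
REQUIRED_ENTRIES = ("vaiml_par_0", "final-vaiml-pass-summary.txt")


def missing_artifacts(cache_dir: str, cache_key: str) -> list[str]:
    """Return the expected entries that are absent from cache_dir/cache_key."""
    model_dir = Path(cache_dir) / cache_key
    return [name for name in REQUIRED_ENTRIES if not (model_dir / name).exists()]
```

An empty list means both expected entries are present. Any names returned suggest that the compilation did not finish or that the directory was copied incompletely.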

Model Storage Formats

The compiler supports two storage formats for the compiled model:

Directory Structure Format (Default)

The compiled model is stored in cache_dir/cache_key as a directory hierarchy containing:

vaiml_par_0/ - Partition subdirectory containing NPU-specific binaries and runtime files
final-vaiml-pass-summary.txt - Compilation summary with operator and GOPs offload statistics
Model metadata and intermediate compilation artifacts (useful for debugging)

What to copy to the board: Copy the entire cache_key directory to your target board, maintaining the directory structure.

When to use:

During development and debugging
When you need to inspect compilation artifacts
When you want detailed logs and reports
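One way to copy the directory while keeping its structure is to pack it into a single archive first. The sketch below uses Python's standard `tarfile` module. `archive_compiled_model` and the output filename are illustrative assumptions, not a Vitis AI utility.

```python
import tarfile
from pathlib import Path


def archive_compiled_model(cache_dir: str, cache_key: str, out_path: str) -> str:
    """Pack cache_dir/cache_key into a gzipped tarball, preserving its layout."""
    model_dir = Path(cache_dir) / cache_key
    if not model_dir.is_dir():
        raise FileNotFoundError(f"compiled model directory not found: {model_dir}")
    with tarfile.open(out_path, "w:gz") as tar:
        # arcname makes cache_key/ the top-level entry, so every stored path
        # looks like cache_key/vaiml_par_0/... regardless of cache_dir.
        tar.add(model_dir, arcname=cache_key)
    return out_path
```

On the board, extract the archive (for example, with `tar -xzf`). This recreates the cache_key directory with its original layout. Place it wherever your runtime configuration expects the compiled model.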

Single File Format (.rai)

The compiled model is packaged as a single .rai file.

What to copy to the board: Copy only the single .rai file to your target board.

When to use:

For production deployment
When transferring models to target hardware
For large models where single-file management is easier
When you need faster loading times

Compilation Summary Report

After compilation, you can review detailed compilation results in the summary report.

Viewing the Compilation Summary

To view the compilation details, print the contents of the summary file:

cat cache_dir/cache_key/final-vaiml-pass-summary.txt

Understanding the Summary Report

The compilation summary shows how much of your model was offloaded to the NPU and which parts, if any, fall back to the CPU:

Example Summary Report:

--------- Final Summary of VAIML Pass ----------
OS: Linux X64
VAIP commit: 744227ab2a0fddec1eccdfe04ca222afd339f53f
Model: /path/to/models/resnet18.a1_in1k.onnx
Model signature: 41d764d4ef1d716a260bc7b2b4e07ff1
Device: ve2
Model data type: float32
Device data type: bfloat16
Number of operators in the model: 49
GOPs of the model: 3.64388
Number of operators supported by VAIML: 49 (100.000%)
GOPs supported by VAIML: 3.644 (100.000%)
Number of subgraphs supported by VAIML: 1
Number of operators offloaded by VAIML: 49 (100.000%)
GOPs offloaded by VAIML: 3.644 (100.000%)
Number of subgraphs offloaded by VAIML: 1
Number of subgraphs with compilation errors (fall back to CPU): 0
Number of subgraphs below 20% GOPs threshold (fall back to CPU): 0
Number of subgraphs above max number of subgraphs allowed(7): 0 (fall back to CPU)
Stats for offloaded subgraphs
Subgraph vaiml_par_0 stats:
    Type: npu
    Operators: 49 (100.000%)
    GOPs : 3.644 (100.000%)  OPs: 3,643,881,552

Key Metrics Explained

Metric	Description
Model signature	Unique hash identifying your model
Device	Target NPU device (for example, ve2-xc2ve3858)
Model data type	Original model precision (for example, float32)
Device data type	NPU execution precision (for example, bfloat16)
Number of operators	Total number of operators in your model
GOPs	Giga-operations (billions of operations), a measure of computational complexity
Operators supported by VAIML	Number and percentage of operators that can run on the NPU
GOPs supported by VAIML	GOPs and percentage of computation that can run on the NPU
Operators offloaded by VAIML	Number and percentage of operators actually running on the NPU (after optimization)
GOPs offloaded by VAIML	GOPs and percentage of computation actually running on the NPU (after optimization)
Number of subgraphs	Model partitions created for NPU execution
Subgraph type	Execution target: `npu` or `cpu`
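Most counts in the report are printed as `count (percentage%)`. The per-subgraph line shows both GOPs and raw OPs, which differ by a factor of one billion: 3,643,881,552 OPs is about 3.644 GOPs. The helpers below are illustrative assumptions that reproduce this arithmetic; they are not part of the compiler.

```python
import re


def parse_count_pct(value: str) -> tuple[float, float]:
    """Split a summary value such as '49 (100.000%)' into (count, percent)."""
    m = re.match(r"\s*([\d.,]+)\s*\(([\d.]+)%\)", value)
    if m is None:
        raise ValueError(f"not a 'count (pct%)' value: {value!r}")
    # Thousands separators (as in '3,643,881,552') are removed before conversion.
    return float(m.group(1).replace(",", "")), float(m.group(2))


def ops_to_gops(ops: int) -> float:
    """GOPs are giga-operations: the raw operation count divided by 1e9."""
    return ops / 1e9
```

For the example report, `round(ops_to_gops(3_643_881_552), 3)` gives `3.644`, matching the printed GOPs value.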

Interpreting Compilation Results

Optimal Compilation (100% NPU offload):

Number of operators offloaded by VAIML: 49 (100.000%)
GOPs offloaded by VAIML: 3.644 (100.000%)
Number of subgraphs offloaded by VAIML: 1

This indicates that every operator runs on the NPU, which gives maximum performance.

Partial NPU Offload:

Number of operators offloaded by VAIML: 35 (71.429%)
GOPs offloaded by VAIML: 2.500 (68.627%)
Number of subgraphs offloaded by VAIML: 2
Number of subgraphs with compilation errors (fall back to CPU): 1

This indicates that some operators fall back to CPU execution. To find out why, check the fallback counts in the summary:

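Compilation errors: The subgraph failed to compile for the NPU.
Below 20% GOPs threshold: The subgraph is too small to benefit from the NPU, so CPU execution is more efficient.
Above max subgraphs (7): The model was split into too many partitions; subgraphs beyond the limit of 7 fall back to the CPU.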
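These checks can be automated. The sketch below assumes that the relevant lines are worded exactly as in the example report. It extracts the percentage of operators offloaded and any non-zero fallback counts. `check_offload` and its patterns are hypothetical, not a Vitis AI API.

```python
import re

# Patterns for the three CPU fallback lines, based on the example report
# wording (an assumption; adjust them if your report differs).
FALLBACK_PATTERNS = {
    "compilation errors": r"Number of subgraphs with compilation errors.*?:\s*(\d+)",
    "below 20% GOPs threshold": r"Number of subgraphs below 20% GOPs threshold.*?:\s*(\d+)",
    "above max subgraphs": r"Number of subgraphs above max number of subgraphs allowed.*?:\s*(\d+)",
}
OFFLOAD_PATTERN = r"Number of operators offloaded by VAIML:\s*\d+\s*\(([\d.]+)%\)"


def check_offload(summary_text: str) -> tuple[float, dict[str, int]]:
    """Return (percent of operators offloaded, non-zero fallback counts by reason)."""
    m = re.search(OFFLOAD_PATTERN, summary_text)
    offloaded_pct = float(m.group(1)) if m else 0.0
    reasons = {}
    for reason, pattern in FALLBACK_PATTERNS.items():
        hit = re.search(pattern, summary_text)
        if hit and int(hit.group(1)) > 0:
            reasons[reason] = int(hit.group(1))
    return offloaded_pct, reasons
```

A result of `(100.0, {})` corresponds to the optimal case. A non-empty dictionary names the fallback reasons worth investigating.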

For details on deploying the model on the board, see ONNX Runtime Python Inference.
