Compiled Model Output

After successful compilation, the AMD Vitis™ AI Execution Provider writes the compiled model artifacts to the cache_dir/cache_key directory. These files contain the binaries and metadata required to run the model on the NPU.

Understanding the Compiled Model

When you compile a model, the entire directory structure created in cache_dir/cache_key constitutes your compiled model. All files within this directory are required for runtime inference.

What gets created:

  • Binary files optimized for NPU execution

  • Model metadata and configuration

  • Runtime execution graphs

  • Compilation logs and summary reports

Directory Structure

After compilation, your cache_dir contains the following structure:

Directory Structure Format (Default):

cache_dir/
  cache_key/
    vaiml_par_0/
      # Compiled model files
    final-vaiml-pass-summary.txt  # Compilation summary report
  [additional compilation artifacts]
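Before copying anything to the board, it can help to confirm that the layout shown above actually exists. The following is a minimal sketch, not part of the Vitis AI tooling: the function name missing_entries is hypothetical, and the list of expected entries is taken only from the listing above, so a model with several partitions may contain more directories than it checks.

```python
from pathlib import Path
from typing import List

# Entries taken from the directory listing above. A model with several
# partitions may contain additional partition directories (assumption).
EXPECTED_ENTRIES = ("vaiml_par_0", "final-vaiml-pass-summary.txt")


def missing_entries(cache_dir: str, cache_key: str) -> List[str]:
    """Return the expected entries that are absent from cache_dir/cache_key."""
    model_dir = Path(cache_dir) / cache_key
    if not model_dir.is_dir():
        # Nothing was compiled here, so every expected entry is missing.
        return list(EXPECTED_ENTRIES)
    return [name for name in EXPECTED_ENTRIES if not (model_dir / name).exists()]
```

An empty list from missing_entries("cache_dir", "cache_key") means both entries from the listing are present.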

Model Storage Formats

The compiler supports two storage formats for the compiled model:

Directory Structure Format (Default)

The compiled model is stored in cache_dir/cache_key as a directory hierarchy containing the following:

  • vaiml_par_0/ - Partition subdirectory containing NPU-specific binaries and runtime files

  • final-vaiml-pass-summary.txt - Compilation summary with operator and GOPs offload statistics

  • Model metadata and intermediate compilation artifacts (useful for debugging)

What to copy to the board: Copy the entire cache_key directory to your target board, maintaining the directory structure.

When to use:

  • During development and debugging

  • When you need to inspect compilation artifacts

  • When you want detailed logs and reports
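Because the whole cache_key directory has to reach the board with its structure intact, one option is to pack it into a single archive before transfer. The sketch below uses only the Python standard library; the function name archive_compiled_model and the output location are hypothetical, and how you move the archive to the board is up to your setup.

```python
import shutil
from pathlib import Path


def archive_compiled_model(cache_dir: str, cache_key: str, out_dir: str) -> Path:
    """Pack cache_dir/cache_key into <out_dir>/<cache_key>.tar.gz, keeping its layout."""
    model_dir = Path(cache_dir) / cache_key
    if not model_dir.is_dir():
        raise FileNotFoundError(f"compiled model directory not found: {model_dir}")
    out_path = Path(out_dir).resolve()
    out_path.mkdir(parents=True, exist_ok=True)
    # root_dir/base_dir make the archive contain a top-level cache_key/ folder,
    # so unpacking on the board recreates the same directory structure.
    archive = shutil.make_archive(
        base_name=str(out_path / cache_key),
        format="gztar",
        root_dir=str(Path(cache_dir).resolve()),
        base_dir=cache_key,
    )
    return Path(archive)
```

Unpacking the archive on the board yields a cache_key/ directory with the same contents as on the host.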

Single-File Format (.rai)

The compiled model is packaged as a single .rai file.

What to copy to the board: Copy only the single .rai file to your target board.

When to use:

  • For production deployment

  • When transferring models to target hardware

  • For large models where single-file management is easier

  • When you need faster loading times
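For the single-file case, a small helper can pick out the .rai file and place it in a staging folder for transfer. This is a sketch under assumptions: the name stage_rai_file is hypothetical, and since this section does not state where the .rai file is written, the helper simply searches a directory you pass in and refuses to guess when it finds none or more than one.

```python
import shutil
from pathlib import Path


def stage_rai_file(search_root: str, staging_dir: str) -> Path:
    """Copy the single .rai file found under search_root into staging_dir."""
    candidates = sorted(p for p in Path(search_root).rglob("*.rai") if p.is_file())
    if len(candidates) != 1:
        raise ValueError(
            f"expected exactly one .rai file under {search_root}, found {len(candidates)}"
        )
    target_dir = Path(staging_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    # copy2 keeps file metadata such as the modification time.
    return Path(shutil.copy2(candidates[0], target_dir / candidates[0].name))
```

Keep the staging folder outside the searched directory so that repeated runs do not find the staged copy as a second candidate.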

Compilation Summary Report

After compilation, you can review detailed compilation results in the summary report.

Viewing the Compilation Summary

To view compilation details, display the contents of the summary file:

cat cache_dir/cache_key/final-vaiml-pass-summary.txt

Understanding the Summary Report

The compilation summary shows how much of your model was offloaded to the NPU and why any part of it falls back to the CPU.

Example Summary Report:

--------- Final Summary of VAIML Pass ----------
OS: Linux X64
VAIP commit: 744227ab2a0fddec1eccdfe04ca222afd339f53f
Model: /path/to/models/resnet18.a1_in1k.onnx
Model signature: 41d764d4ef1d716a260bc7b2b4e07ff1
Device: ve2
Model data type: float32
Device data type: bfloat16
Number of operators in the model: 49
GOPs of the model: 3.64388
Number of operators supported by VAIML: 49 (100.000%)
GOPs supported by VAIML: 3.644 (100.000%)
Number of subgraphs supported by VAIML: 1
Number of operators offloaded by VAIML: 49 (100.000%)
GOPs offloaded by VAIML: 3.644 (100.000%)
Number of subgraphs offloaded by VAIML: 1
Number of subgraphs with compilation errors (fall back to CPU): 0
Number of subgraphs below 20% GOPs threshold (fall back to CPU): 0
Number of subgraphs above max number of subgraphs allowed(7): 0 (fall back to CPU)
Stats for offloaded subgraphs
Subgraph vaiml_par_0 stats:
    Type: npu
    Operators: 49 (100.000%)
    GOPs : 3.644 (100.000%)  OPs: 3,643,881,552
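If you want to check these values in a script rather than by eye, the report can be read as plain "Key: value" lines. The sketch below is not an official parser: parse_summary is a hypothetical name, it relies only on the layout of the example report above, and it deliberately skips the indented per-subgraph lines.

```python
from typing import Dict


def parse_summary(text: str) -> Dict[str, str]:
    """Parse the top-level "Key: value" lines of a summary report into a dict."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        # Indented lines belong to per-subgraph stats; this sketch skips them.
        if not line.strip() or line[0].isspace():
            continue
        # Split at the first colon only, so values that contain colons survive.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields
```

For example, passing the text of cache_dir/cache_key/final-vaiml-pass-summary.txt to parse_summary and reading the "Device data type" key returns the string shown on that line of the report.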

Key Metrics Explained

Metric                          Description
Model signature                 Unique hash identifying your model
Device                          Target NPU device (for example, ve2-xc2ve3858)
Model data type                 Original model precision (for example, float32)
Device data type                NPU execution precision (for example, bfloat16)
Number of operators             Total number of operators in your model
GOPs                            Giga operations (billions of operations); a measure of computational complexity
Operators supported by VAIML    Count and percentage of operators that can run on the NPU
GOPs supported by VAIML         Amount and percentage of computation that can run on the NPU
Operators offloaded by VAIML    Operators actually running on the NPU (after optimization)
GOPs offloaded by VAIML         Computation actually running on the NPU (after optimization)
Number of subgraphs             Model partitions created for NPU execution
Subgraph type                   Execution target: npu or cpu
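Several metrics are printed as a count followed by a percentage, such as "49 (100.000%)". In the example reports the operator percentages match the count divided by the model's total operator count (49 of 49 gives 100.000, and 35 of 49 gives 71.429). The sketch below splits such a value and reproduces that arithmetic; both helper names are hypothetical, and the assumption that the tool computes the percentage exactly this way is not confirmed by this section.

```python
import re
from typing import Tuple

# Matches values such as "49 (100.000%)" or "3.644 (100.000%)".
_COUNT_PERCENT = re.compile(r"^\s*([\d.,]+)\s*\(\s*([\d.]+)%\s*\)")


def split_count_percent(value: str) -> Tuple[float, float]:
    """Split a 'count (percent%)' value into its two numbers."""
    match = _COUNT_PERCENT.match(value)
    if match is None:
        raise ValueError(f"not a 'count (percent%)' value: {value!r}")
    count = float(match.group(1).replace(",", ""))
    return count, float(match.group(2))


def percent_of(part: float, total: float) -> str:
    """Format part/total as a percentage with three decimals, like the report."""
    return f"{part / total * 100:.3f}"
```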

Interpreting Compilation Results

Optimal Compilation (100% NPU offload):

Number of operators offloaded by VAIML: 49 (100.000%)
GOPs offloaded by VAIML: 3.644 (100.000%)
Number of subgraphs offloaded by VAIML: 1

This indicates all operations run on the NPU for maximum performance.

Partial NPU Offload:

Number of operators offloaded by VAIML: 35 (71.429%)
GOPs offloaded by VAIML: 2.500 (68.627%)
Number of subgraphs offloaded by VAIML: 2
Number of subgraphs with compilation errors (fall back to CPU): 1

This indicates some operations fall back to CPU execution. Check the reasons:

  • Compilation errors: The subgraph failed to compile for the NPU

  • Below 20% GOPs threshold: The subgraph is too small, so CPU execution is more efficient

  • Above max subgraphs (7): There are too many partitions, so the excess falls back to the CPU
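The three fallback reasons above can be pulled out of the summary text automatically, for instance as a check in a build script. This is a sketch that assumes the report wording matches the examples in this section: the marker substrings are copied from those example lines, the function names are hypothetical, and a different release may phrase the lines differently.

```python
import re
from typing import Dict, Optional

# Substrings copied from the example report lines (assumed stable wording).
FALLBACK_MARKERS = {
    "compilation errors": "with compilation errors",
    "below GOPs threshold": "GOPs threshold",
    "above max subgraphs": "above max number of subgraphs",
}


def fallback_counts(summary_text: str) -> Dict[str, int]:
    """Count the subgraphs that fall back to the CPU, grouped by reason."""
    counts = {reason: 0 for reason in FALLBACK_MARKERS}
    for line in summary_text.splitlines():
        for reason, marker in FALLBACK_MARKERS.items():
            if marker in line:
                # The count is the first integer after the colon.
                found = re.match(r"\s*(\d+)", line.rpartition(":")[2])
                if found:
                    counts[reason] = int(found.group(1))
    return counts


def offloaded_operator_percent(summary_text: str) -> Optional[float]:
    """Return the operator offload percentage, or None if the line is absent."""
    found = re.search(
        r"Number of operators offloaded by VAIML:\s*\d+\s*\(([\d.]+)%\)",
        summary_text,
    )
    return float(found.group(1)) if found else None
```

A result of 100.0 from offloaded_operator_percent together with all-zero fallback counts corresponds to the optimal case described above; any non-zero count names the reason to investigate first.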

For details on deploying the model on the board, see ONNX Runtime Python Inference.