CPU Partition Compilation

Overview

CPU partition support is an advanced AMD Vitis™ AI compiler feature that enables heterogeneous execution of ONNX models across NPU and CPU hardware using Vitis AI compiler-supported CPU operators. When enabled, the compiler automatically partitions computational graphs into NPU-executable and CPU-executable subgraphs, generates connectivity metadata for cross-partition data flow, and produces unified compilation artifacts.

This feature enables models to use a specific set of CPU operators provided by the Vitis AI compiler, allowing deployment with VART ML Runtime alone without requiring ONNX Runtime. The feature is implemented through a series of compiler passes that work together to identify NPU-incompatible operations, route them to CPU execution using compiler-supported operators, and maintain tensor handoffs between hardware execution contexts.

Note

In this release, VART ML Runtime supports CPU partition execution. ONNX Runtime does not support CPU partition execution.

Choosing a Compilation Configuration

Two compilation configurations are available. Choose between them based on the target runtime and your deployment requirements.

CPU Partition Compilation

This configuration enables the CPU partition passes (vaiml_cpu_partition, vaiml_connectivity, vaiml_create_cache). Use it when all of the following apply:

  • The model might contain operators that cannot run on the NPU.

  • The target runtime is VART ML Runtime.

  • The deployment requires heterogeneous NPU and CPU execution.

This configuration produces artifacts that support heterogeneous NPU and CPU execution using compiler-supported CPU operators.

Standard Compilation

This configuration compiles the model without the CPU partition passes. Use this configuration when targeting either of the following runtimes:

  • ONNX Runtime: The model might contain operators that cannot run on the NPU. ONNX Runtime provides its own broad CPU operator support and handles any NPU-incompatible operators independently of the Vitis AI compiler.

  • VART ML Runtime: All operators in the model must be supported on the NPU. VART ML Runtime does not provide fallback CPU operator support in this configuration, so any NPU-incompatible operators cause an error.

Configuring the Compiler Passes

CPU partition support is enabled through the vitisai_config.json compiler configuration file. The configuration must include five compiler passes: init, vaiml_partition, vaiml_cpu_partition, vaiml_connectivity, and vaiml_create_cache. The last three are the CPU partition passes that standard compilation omits.

Define each pass, with its plugin and any parameters, in the passes array, and list the pass names in execution order in the pass list of the target entry, as in the following example. For detailed information on configuration file parameters, see AMD Vitis™ AI EP Configuration File. For compilation commands and procedures, see Model Compilation.

{
    "passes": [
        {
            "name": "init",
            "plugin": "vaip-pass_init"
        },
        {
            "name": "vaiml_partition",
            "plugin": "vaip-pass_vaiml_partition",
            "vaiml_config": {
                "device": "ve2-xc2ve3858",
                "optimize_level": 3,
                "keep_outputs": true
            }
        },
        {
            "name": "vaiml_cpu_partition",
            "plugin": "vaip-pass_vaiml_cpu"
        },
        {
            "name": "vaiml_connectivity",
            "plugin": "vaip-pass_vaiml_connectivity"
        },
        {
            "name": "vaiml_create_cache",
            "plugin": "vaip-pass_vaiml_create_cache"
        }
    ],
    "target": "VAIML",
    "targets": [
        {
            "name": "VAIML",
            "pass": [
                "init",
                "vaiml_partition",
                "vaiml_cpu_partition",
                "vaiml_connectivity",
                "vaiml_create_cache"
            ]
        }
    ]
}
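
If your build scripts generate this file instead of maintaining it by hand, a minimal Python sketch that reproduces the example above could look like the following. The helper names build_cpu_partition_config and write_config are hypothetical and are not part of the Vitis AI tooling; only the pass names, plugins, and vaiml_config keys come from the example.

```python
import json
from pathlib import Path

# (pass name, plugin) pairs copied from the example configuration above.
# The list order is also the execution order used in the target's "pass" list.
CPU_PARTITION_PASSES = [
    ("init", "vaip-pass_init"),
    ("vaiml_partition", "vaip-pass_vaiml_partition"),
    ("vaiml_cpu_partition", "vaip-pass_vaiml_cpu"),
    ("vaiml_connectivity", "vaip-pass_vaiml_connectivity"),
    ("vaiml_create_cache", "vaip-pass_vaiml_create_cache"),
]


def build_cpu_partition_config(device="ve2-xc2ve3858", optimize_level=3, keep_outputs=True):
    """Hypothetical helper: return the CPU partition configuration as a dict."""
    passes = []
    for name, plugin in CPU_PARTITION_PASSES:
        entry = {"name": name, "plugin": plugin}
        if name == "vaiml_partition":
            # Compilation options belong to the vaiml_partition pass only.
            entry["vaiml_config"] = {
                "device": device,
                "optimize_level": optimize_level,
                "keep_outputs": keep_outputs,
            }
        passes.append(entry)
    return {
        "passes": passes,
        "target": "VAIML",
        "targets": [{"name": "VAIML", "pass": [name for name, _ in CPU_PARTITION_PASSES]}],
    }


def write_config(config, path="vitisai_config.json"):
    """Write the configuration as indented JSON and return the path written."""
    out = Path(path)
    out.write_text(json.dumps(config, indent=4))
    return out
```

Deriving both the passes array and the target's pass list from one Python list is a design choice that keeps the two sections from drifting apart.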

Compiler Pass Details

The order of passes in the configuration is critical. Each pass builds on the previous one:

  1. init: Initializes the compiler environment and validates the model.

  2. vaiml_partition: Identifies NPU-compatible subgraphs.

  3. vaiml_cpu_partition: Assigns the operators that cannot run on the NPU to the CPU.

  4. vaiml_connectivity: Stitches NPU and CPU tensor connections.

  5. vaiml_create_cache: Compiles the NPU subgraphs and packages all subgraphs into a cached archive.

The vaiml_cpu_partition pass must run after vaiml_partition so that it captures only the operators that vaiml_partition did not place on the NPU.
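
Because ordering mistakes are easy to make when editing the file by hand, you might add a pre-flight check to your own tooling. The following sketch is hypothetical and is not a Vitis AI API: it checks a target's pass list against the order described above and confirms that init comes first. It tolerates other passes between the required ones, which is an assumption rather than documented behavior.

```python
# Required relative order of the passes, as described in this section.
REQUIRED_ORDER = [
    "init",
    "vaiml_partition",
    "vaiml_cpu_partition",
    "vaiml_connectivity",
    "vaiml_create_cache",
]


def check_pass_order(pass_list):
    """Return a list of problems in a target's "pass" list (empty if none)."""
    problems = []
    missing = [name for name in REQUIRED_ORDER if name not in pass_list]
    if missing:
        problems.append(f"missing passes: {', '.join(missing)}")
        return problems

    # Positions of the required passes; the order holds iff every adjacent pair increases.
    positions = [pass_list.index(name) for name in REQUIRED_ORDER]
    for earlier, later, i, j in zip(REQUIRED_ORDER, REQUIRED_ORDER[1:], positions, positions[1:]):
        if i > j:
            problems.append(f"{earlier} must run before {later}")

    # init prepares the environment, so nothing may run before it.
    if positions[0] != 0:
        problems.append("init must be the first pass")
    return problems
```

For example, swapping vaiml_partition and vaiml_cpu_partition in the target's pass list makes the function report "vaiml_partition must run before vaiml_cpu_partition".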

The init pass prepares the environment and model before any partitioning or compilation work begins. It sets up the Vitis AI compiler/runtime context, loads and validates the input ONNX model, detects available NPU hardware, loads the NPU-supported operator registry, checks for an existing compilation cache, and parses the compiler configuration options.

Subsequent passes require knowledge of target hardware capabilities, supported operators, and a validated model structure. Without init, the compiler cannot determine what hardware it is targeting or which operators are NPU-compatible. This pass must be the first in the execution order.

The vaiml_partition pass partitions the model for NPU execution by identifying the subgraphs that can execute on the NPU. The vaiml_config section within this pass contains model-specific compilation options such as device, optimize_level, and keep_outputs.

The vaiml_cpu_partition pass handles operators that cannot run on the NPU. After vaiml_partition identifies which subgraphs go to the NPU, this pass assigns the remaining operators to the CPU.

Not all ONNX operators are NPU-compatible. Examples include ArgMax, NonMaxSuppression, and various dynamic operators. This pass routes those operators to CPU execution, provided that they have compiler-supported CPU implementations, so that the full model can still run end to end.

Runtimes that support CPU partition execution can run certain operators on the CPU. In this release, VART ML Runtime uses the CPU operator implementations that the Vitis AI compiler provides for a specific set of operators. The complete list of compiler-supported CPU operators that VART ML Runtime can use is documented in CPU Operators Supported by VART ML Runtime.
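
To make the division of labor between vaiml_partition and vaiml_cpu_partition concrete, the following toy sketch partitions a model reduced to a linear chain of operator types. NPU_OPS and CPU_OPS are hypothetical stand-ins for the compiler's operator registries, and the real passes work on arbitrary ONNX graphs, so treat this only as an illustration of the two-phase idea: NPU placement first, then CPU placement for what remains.

```python
# Hypothetical operator sets for illustration only; the authoritative lists are in
# "Supported Operators" and "CPU Operators Supported by VART ML Runtime".
NPU_OPS = {"Conv", "Relu", "MaxPool", "Add"}
CPU_OPS = {"ArgMax", "NonMaxSuppression", "Add"}


def partition_chain(op_types):
    """Split a linear chain of operator types into (device, [node indices]) subgraphs."""
    # Phase 1 (analogue of vaiml_partition): claim every NPU-compatible operator.
    devices = ["NPU" if op in NPU_OPS else None for op in op_types]

    # Phase 2 (analogue of vaiml_cpu_partition): place only the leftovers on the CPU.
    for i, op in enumerate(op_types):
        if devices[i] is None:
            if op not in CPU_OPS:
                raise ValueError(f"operator {op!r} at index {i} has no NPU or CPU support")
            devices[i] = "CPU"

    # Merge consecutive operators on the same device into one subgraph.
    subgraphs = []
    for i, device in enumerate(devices):
        if subgraphs and subgraphs[-1][0] == device:
            subgraphs[-1][1].append(i)
        else:
            subgraphs.append((device, [i]))
    return subgraphs


print(partition_chain(["Conv", "Relu", "ArgMax", "Add"]))
# prints [('NPU', [0, 1]), ('CPU', [2]), ('NPU', [3])]
```

Add appears in both hypothetical sets. Because the NPU phase runs first, it stays on the NPU, which mirrors why vaiml_cpu_partition must run after vaiml_partition.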

The vaiml_connectivity pass resolves data flow connections between NPU and CPU subgraphs. After partitioning, the model consists of multiple subgraphs, some on the NPU and some on the CPU. This pass stitches them together by managing tensor handoffs, input/output boundaries, and data routing between the NPU and CPU subgraphs. The connectivity metadata it produces also records the top-level inputs and outputs of the entire model.

When a model is split across NPU and CPU, the runtime needs to know exactly how tensors flow from one partition to another. Without this connectivity information, the split graph would be disconnected and non-functional.
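
The following sketch illustrates what connectivity information lets a runtime do: run partitions in execution order and hand tensors from one partition to the next by name. The Partition record, its fields, and the tensor names are hypothetical and do not reflect the actual format of connectivity_metadata.json; plain Python callables stand in for compiled NPU and CPU subgraphs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Partition:
    """Hypothetical stand-in for one NPU or CPU subgraph."""
    name: str
    device: str
    inputs: List[str]
    outputs: List[str]
    fn: Callable[..., tuple]


def run_model(partitions, model_inputs: Dict[str, object], model_outputs: List[str]):
    # Tensor store keyed by tensor name; it starts with the top-level model inputs.
    tensors = dict(model_inputs)
    for part in partitions:  # assumed to be listed in execution order
        missing = [t for t in part.inputs if t not in tensors]
        if missing:
            raise KeyError(f"{part.name}: unresolved inputs {missing}")
        results = part.fn(*(tensors[t] for t in part.inputs))
        # Hand each produced tensor off to later partitions by name.
        tensors.update(zip(part.outputs, results))
    return {name: tensors[name] for name in model_outputs}


partitions = [
    Partition("npu_0", "NPU", ["x"], ["scores"], lambda x: ([v * 2 for v in x],)),
    Partition("cpu_0", "CPU", ["scores"], ["best"], lambda s: (s.index(max(s)),)),
]
print(run_model(partitions, {"x": [1, 5, 3]}, ["best"]))
# prints {'best': 1}
```

If a partition's input was never produced, the sketch fails with a KeyError, which corresponds to the "disconnected and non-functional" graph described above.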

The vaiml_create_cache pass compiles and caches the NPU subgraphs. It takes the finalized NPU partitions and compiles them into a binary cache artifact that VART ML Runtime loads directly at inference time. The output is a single-file archive (rai file) that contains all subgraphs, both NPU and CPU, along with the necessary metadata.

CPU Operator Support

There is an important distinction between CPU operators supported by the Vitis AI compiler and general CPU operator execution. The two approaches differ in scope, runtime dependency, and deployment suitability.

Compiler-Supported CPU Operators (CPU Partition Compilation)

The compiler’s CPU partition feature routes operators to CPU implementations provided by the Vitis AI compiler. In this release, VART ML Runtime uses these CPU operator implementations for a specific set of operators, including NonMaxSuppression, ArgMax, and Add, among others. The complete list of compiler-supported CPU operators that VART ML Runtime can use is documented in CPU Operators Supported by VART ML Runtime.

When using VART ML Runtime with CPU partition support:

  • NPU-compatible operators execute on the NPU.

  • NPU-incompatible operators that appear in the CPU operator list as being provided by the Vitis AI compiler execute on the CPU.

  • NPU-incompatible operators that do not appear in the CPU operator list as being provided by the Vitis AI compiler cannot be executed.

This approach provides lightweight deployment without requiring ONNX Runtime as a dependency, making it particularly suitable for embedded systems with constrained resources.

General CPU Execution (Standard Compilation)

Standard compilation does not include the CPU partition passes and supports two runtimes with different operator coverage requirements.

  • ONNX Runtime: Provides broad CPU execution support for virtually all standard ONNX operators, covering hundreds of operators across the ONNX specification. Any NPU-incompatible operators are automatically executed on the CPU by ONNX Runtime, independently of the Vitis AI compiler. This makes standard compilation with ONNX Runtime suitable for models that contain NPU-incompatible operators outside the compiler-supported CPU operator list.

  • VART ML Runtime: Requires all operators in the model to be supported on the NPU. No CPU fallback is available in this configuration. Any NPU-incompatible operators cause an error at runtime.

For ONNX Runtime deployment, use standard compilation without the CPU partition passes. Models compiled with the CPU partition passes produce artifacts that ONNX Runtime does not support in this release.

Choosing the Right Configuration

The choice of compilation configuration and runtime depends on your model’s operator requirements. Use the following table to determine the appropriate configuration for your deployment scenario.

| Scenario | Compilation Configuration | Runtime |
|---|---|---|
| Model contains NPU-incompatible operators, all of which are listed in the CPU operator list as provided by the Vitis AI compiler | CPU Partition Compilation | VART ML Runtime |
| Model contains at least one NPU-incompatible operator that is not listed in the CPU operator list as provided by the Vitis AI compiler | Standard Compilation | ONNX Runtime |
| Model fully offloads to the NPU | Standard Compilation or CPU Partition Compilation | Standard Compilation: VART ML Runtime or ONNX Runtime; CPU Partition Compilation: VART ML Runtime only |

To determine whether your model is compatible with CPU partition compilation, first identify its NPU-incompatible operators by comparing the model’s operators against the NPU operator support list documented in Supported Operators. Then verify that each of those operators appears in the CPU operator list as provided by the Vitis AI compiler.
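
The procedure above, combined with the table, amounts to a small set comparison. The following sketch encodes the table's recommendations; recommended_configurations is a hypothetical helper, and npu_ops and vitis_cpu_ops are inputs you would transcribe from Supported Operators and CPU Operators Supported by VART ML Runtime (no machine-readable form of those lists is assumed). If the onnx Python package is installed, {node.op_type for node in onnx.load(path).graph.node} is one way to collect model_ops, although it does not descend into the subgraphs of control-flow operators such as If or Loop.

```python
def recommended_configurations(model_ops, npu_ops, vitis_cpu_ops):
    """Return (configuration, runtime) pairs recommended by the table above."""
    npu_incompatible = set(model_ops) - set(npu_ops)

    if not npu_incompatible:
        # Model fully offloads to the NPU: every combination in the table applies.
        return [
            ("Standard Compilation", "VART ML Runtime"),
            ("Standard Compilation", "ONNX Runtime"),
            ("CPU Partition Compilation", "VART ML Runtime"),
        ]

    if npu_incompatible <= set(vitis_cpu_ops):
        # Every NPU-incompatible operator has a compiler-provided CPU implementation.
        return [("CPU Partition Compilation", "VART ML Runtime")]

    # At least one NPU-incompatible operator is outside the compiler's CPU list.
    return [("Standard Compilation", "ONNX Runtime")]
```

Because the function works on plain sets of operator type names, it has no dependency on the Vitis AI tools.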

Compilation Output

When CPU partition support is enabled, the compiler generates the following output files in the directory specified by the cache_dir provider option.

  • rai file: A single-file archive containing all subgraphs (both NPU and CPU) and the metadata required for runtime execution. This file is loaded directly by VART ML Runtime at inference time.

  • connectivity_metadata.json: Describes the execution order, partition interconnection information, and top-level model inputs and outputs. This file is used by the runtime to coordinate data flow between NPU and CPU subgraphs.
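
After compilation, a quick check that both artifacts were produced can save you from a confusing runtime error. The following sketch searches cache_dir recursively, because the directory layout under cache_dir is not described here, and it assumes that the rai archive uses a .rai file extension. It parses connectivity_metadata.json only to confirm that it is valid JSON and makes no assumption about its schema. The function name is hypothetical.

```python
import json
from pathlib import Path


def find_cpu_partition_artifacts(cache_dir):
    """Return (rai_path, metadata_dict) for the first artifacts found under cache_dir."""
    root = Path(cache_dir)
    # Assumption: the archive uses a ".rai" extension; the layout below cache_dir may vary.
    rai_files = sorted(root.rglob("*.rai"))
    metadata_files = sorted(root.rglob("connectivity_metadata.json"))
    if not rai_files:
        raise FileNotFoundError(f"no .rai archive found under {root}")
    if not metadata_files:
        raise FileNotFoundError(f"no connectivity_metadata.json found under {root}")
    # Parse only to confirm the file is valid JSON; its schema is not assumed.
    metadata = json.loads(metadata_files[0].read_text())
    return rai_files[0], metadata
```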

Limitations and Constraints

The CPU partition feature requires three additional passes beyond standard compilation: vaiml_cpu_partition, vaiml_connectivity, and vaiml_create_cache. These passes work together as a unit. Omitting any one of them prevents proper CPU partition functionality.

All pass names must appear in both the passes array (where they are defined with their plugins and parameters) and the targets section (where they are referenced by name in the execution order). The compiler validates this consistency and reports configuration errors if passes are defined but not referenced, or vice versa.
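
The two rules above can be mirrored in a hypothetical pre-flight check that you run before invoking the compiler. This is a sketch of the rules as described in this section, not the compiler's own validator, whose exact messages and any additional checks are not documented here. check_config_file and check_config_consistency are illustrative names.

```python
import json
from pathlib import Path

# The three passes that must be present together for CPU partition compilation.
CPU_PARTITION_PASSES = {"vaiml_cpu_partition", "vaiml_connectivity", "vaiml_create_cache"}


def check_config_consistency(config):
    """Return problems found in a parsed vitisai_config.json dict (empty if none)."""
    problems = []
    defined = {entry["name"] for entry in config.get("passes", [])}
    targets = {t["name"]: t.get("pass", []) for t in config.get("targets", [])}
    referenced = {name for pass_list in targets.values() for name in pass_list}

    # Rule: every defined pass is referenced by a target, and vice versa.
    for name in sorted(defined - referenced):
        problems.append(f"pass {name!r} is defined but never referenced")
    for name in sorted(referenced - defined):
        problems.append(f"pass {name!r} is referenced but not defined")

    # Rule: the active target must run all three CPU partition passes.
    active = config.get("target")
    if active not in targets:
        problems.append(f"target {active!r} not found in targets")
    else:
        missing = CPU_PARTITION_PASSES - set(targets[active])
        if missing:
            problems.append("missing CPU partition passes: " + ", ".join(sorted(missing)))
    return problems


def check_config_file(path="vitisai_config.json"):
    """Load a configuration file from disk and check it."""
    return check_config_consistency(json.loads(Path(path).read_text()))
```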

CPU partition compilation is limited to the set of CPU operators provided by the Vitis AI compiler. Models that contain NPU-incompatible operators outside this set cannot be compiled with CPU partition support and must use standard compilation with ONNX Runtime instead. Refer to CPU Operators Supported by VART ML Runtime for the complete list of supported CPU operators.

Models compiled with CPU partition passes produce artifacts that are specific to runtimes that support the partitioning scheme. In this release:

  • VART ML Runtime supports CPU partition compilation artifacts.

  • ONNX Runtime does not support CPU partition compilation artifacts.

Attempting to load a model compiled with the CPU partition passes into an unsupported runtime results in an error.

Additional Resources