CPU Partition Compilation#
Overview#
CPU partition support is an advanced AMD Vitis™ AI compiler feature that enables heterogeneous execution of ONNX models across NPU and CPU hardware using Vitis AI compiler-supported CPU operators. When enabled, the compiler automatically partitions computational graphs into NPU-executable and CPU-executable subgraphs, generates connectivity metadata for cross-partition data flow, and produces unified compilation artifacts.
This feature enables models to use a specific set of CPU operators provided by the Vitis AI compiler, allowing deployment with VART ML Runtime alone without requiring ONNX Runtime. The feature is implemented through a series of compiler passes that work together to identify NPU-incompatible operations, route them to CPU execution using compiler-supported operators, and maintain tensor handoffs between hardware execution contexts.
Note
In this release, VART ML Runtime supports CPU partition execution. ONNX Runtime does not support CPU partition execution.
Choosing a Compilation Configuration#
There are two compilation configurations available depending on the target runtime and deployment requirements.
CPU Partition Compilation#
This configuration enables the CPU partition passes (vaiml_cpu_partition, vaiml_connectivity, and vaiml_create_cache). Use this configuration when both of the following apply:
The model might contain operators that cannot run on the NPU.
The target runtime is VART ML Runtime.
This configuration produces artifacts that support heterogeneous NPU and CPU execution using compiler-supported CPU operators.
Standard Compilation#
This configuration compiles the model without the CPU partition passes. Use this configuration when targeting either of the following runtimes:
ONNX Runtime: The model might contain operators that cannot run on the NPU. ONNX Runtime provides its own broad CPU operator support and handles any NPU-incompatible operators independently of the Vitis AI compiler.
VART ML Runtime: All operators in the model must be supported on the NPU. VART ML Runtime does not provide fallback CPU operator support in this configuration, so any NPU-incompatible operators cause an error.
Configuring the Compiler Passes#
CPU partition support is enabled through the vitisai_config.json compiler
configuration file. The feature requires the following compiler passes: init,
vaiml_partition, vaiml_cpu_partition, vaiml_connectivity, and
vaiml_create_cache. The last three are specific to CPU partition compilation.
The compiler passes must be configured in the passes array with specific
ordering requirements. For detailed information on configuration file
parameters, see AMD Vitis™ AI EP Configuration File. For compilation
commands and procedures, see Model Compilation.
{
  "passes": [
    {
      "name": "init",
      "plugin": "vaip-pass_init"
    },
    {
      "name": "vaiml_partition",
      "plugin": "vaip-pass_vaiml_partition",
      "vaiml_config": {
        "device": "ve2-xc2ve3858",
        "optimize_level": 3,
        "keep_outputs": true
      }
    },
    {
      "name": "vaiml_cpu_partition",
      "plugin": "vaip-pass_vaiml_cpu"
    },
    {
      "name": "vaiml_connectivity",
      "plugin": "vaip-pass_vaiml_connectivity"
    },
    {
      "name": "vaiml_create_cache",
      "plugin": "vaip-pass_vaiml_create_cache"
    }
  ],
  "target": "VAIML",
  "targets": [
    {
      "name": "VAIML",
      "pass": [
        "init",
        "vaiml_partition",
        "vaiml_cpu_partition",
        "vaiml_connectivity",
        "vaiml_create_cache"
      ]
    }
  ]
}
Compiler Pass Details#
The order of passes in the configuration is critical. Each pass builds on the previous one:
init: Initializes the compiler environment and validates the model.
vaiml_partition: Identifies NPU-compatible subgraphs.
vaiml_cpu_partition: Assigns unsupported operators to CPU.
vaiml_connectivity: Stitches NPU and CPU tensor connections.
vaiml_create_cache: Compiles the NPU subgraph and caches the binary.
The vaiml_cpu_partition pass must come after vaiml_partition so that it
assigns to the CPU only the operators that vaiml_partition could not place on
the NPU.
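The ordering rule can be checked mechanically before a compile is started. The following is a minimal sketch in Python, assuming the configuration has already been loaded into a dictionary with the shape shown in the example above. The helper name check_pass_order is hypothetical and is not part of the Vitis AI tooling.

```python
# Hypothetical helper, not part of the Vitis AI tooling: checks that the
# passes listed for a target follow the order described in this section.
REQUIRED_ORDER = [
    "init",
    "vaiml_partition",
    "vaiml_cpu_partition",
    "vaiml_connectivity",
    "vaiml_create_cache",
]


def check_pass_order(config, target_name="VAIML"):
    """Return a list of problems found in the target's pass order."""
    targets = {t["name"]: t for t in config.get("targets", [])}
    if target_name not in targets:
        return [f"target {target_name!r} is not defined"]
    order = targets[target_name].get("pass", [])
    problems = [f"missing pass: {name}" for name in REQUIRED_ORDER if name not in order]
    # Compare the relative positions of the required passes that are present.
    present = [name for name in order if name in REQUIRED_ORDER]
    expected = [name for name in REQUIRED_ORDER if name in order]
    if present != expected:
        problems.append(f"passes out of order: {present}")
    return problems


good = {"targets": [{"name": "VAIML", "pass": list(REQUIRED_ORDER)}]}
swapped = {"targets": [{"name": "VAIML", "pass": [
    "init", "vaiml_cpu_partition", "vaiml_partition",
    "vaiml_connectivity", "vaiml_create_cache"]}]}

print(check_pass_order(good))     # no problems reported
print(check_pass_order(swapped))  # reports the swapped passes
```

In practice the dictionary would come from json.load on vitisai_config.json. The check looks only at the targets section, because that is where the execution order is declared.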
The init pass prepares the environment and model before any partitioning
or compilation work begins. It sets up the Vitis AI compiler/runtime context,
loads and validates the input ONNX model, detects available NPU hardware,
loads the NPU-supported operator registry, checks for existing compilation
cache, and parses compiler configuration options.
Subsequent passes require knowledge of target hardware capabilities, supported
operators, and a validated model structure. Without init, the compiler
cannot determine what hardware it is targeting or which operators are
NPU-compatible. This pass must be the first in the execution order.
The vaiml_partition pass partitions the model for NPU execution by
identifying the subgraphs that can run on the NPU. The vaiml_config section
within this pass contains model-specific options such as device,
optimize_level, and keep_outputs.
The vaiml_cpu_partition pass handles operators that cannot run on the NPU.
After the main vaiml_partition identifies which subgraphs go to the NPU,
this pass assigns the remaining unsupported operators to the CPU.
Not all ONNX operators are NPU-compatible. Examples include ArgMax,
NonMaxSuppression, and other dynamic operators. This
pass ensures those operators are correctly routed to CPU execution so the
full model can still run end-to-end.
Runtimes that support CPU partition execution can execute certain operators on the CPU. In this release, VART ML Runtime uses CPU operator implementations for a specific set of operators provided by Vitis AI compiler. The complete list of compiler-supported CPU operators that VART ML Runtime can use is documented in CPU Operators Supported by VART ML Runtime.
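The division of work between vaiml_partition and vaiml_cpu_partition can be pictured with a toy model. The sketch below is illustrative only: it treats the model as a flat, topologically ordered list of nodes and groups consecutive nodes by device. The NPU operator set is made up for the example, and the compiler's actual partitioning rules are not described in this document.

```python
# Toy illustration only: the real compiler works on the ONNX graph and its
# partitioning rules are not described here. The operator set is made up.
NPU_SUPPORTED = {"Conv", "Relu", "MaxPool"}  # hypothetical NPU operator set


def partition(nodes):
    """Group consecutive nodes (in topological order) by target device."""
    partitions = []
    for name, op_type in nodes:
        device = "NPU" if op_type in NPU_SUPPORTED else "CPU"
        if partitions and partitions[-1]["device"] == device:
            # Same device as the previous node: extend the current partition.
            partitions[-1]["nodes"].append(name)
        else:
            partitions.append({"device": device, "nodes": [name]})
    return partitions


model = [
    ("conv1", "Conv"),
    ("relu1", "Relu"),
    ("nms", "NonMaxSuppression"),
    ("argmax", "ArgMax"),
]
for part in partition(model):
    print(part["device"], part["nodes"])
```

For this input the sketch yields one NPU partition holding conv1 and relu1, followed by one CPU partition holding nms and argmax.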
The vaiml_connectivity pass resolves data flow connections between NPU
and CPU subgraphs. After partitioning, there are multiple subgraphs (some on
NPU, some on CPU). This pass stitches them together by managing tensor
handoffs, input/output boundaries, and data routing between the NPU and CPU
partitions. It also records the top-level inputs and outputs for the entire
model.
When a model is split across NPU and CPU, the runtime needs to know exactly how tensors flow from one partition to another. Without this connectivity information, the split graph would be disconnected and non-functional.
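The kind of information this pass has to derive can be sketched as follows. The data model is hypothetical: each partition simply lists the tensors it consumes and produces. It is not the schema of connectivity_metadata.json, which this document does not specify.

```python
# Hypothetical data model: each partition lists the tensors it consumes and
# produces. This is not the schema of connectivity_metadata.json.
def build_connectivity(partitions):
    """Derive cross-partition edges and the top-level model inputs/outputs."""
    producer = {}
    for part in partitions:
        for tensor in part["outputs"]:
            producer[tensor] = part["name"]

    edges, model_inputs, consumed = [], [], set()
    for part in partitions:
        for tensor in part["inputs"]:
            consumed.add(tensor)
            if tensor in producer:
                # Tensor handed off from one partition to another.
                edges.append((producer[tensor], part["name"], tensor))
            elif tensor not in model_inputs:
                # No partition produces it, so it is a top-level model input.
                model_inputs.append(tensor)
    # Tensors that no partition consumes are treated as top-level outputs.
    model_outputs = [t for t in producer if t not in consumed]
    return {"edges": edges, "inputs": model_inputs, "outputs": model_outputs}


parts = [
    {"name": "npu_0", "device": "NPU", "inputs": ["image"],
     "outputs": ["boxes", "scores"]},
    {"name": "cpu_0", "device": "CPU", "inputs": ["boxes", "scores"],
     "outputs": ["selected"]},
]
info = build_connectivity(parts)
print(info["edges"])
print(info["inputs"], info["outputs"])
```

Here the two edges from npu_0 to cpu_0 are the tensor handoffs, while image and selected are the top-level input and output of the whole model.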
The vaiml_create_cache pass compiles and caches the NPU subgraphs. It
takes the finalized NPU partition and compiles it into a binary cache artifact
that can be loaded directly by the NPU at inference time. The output is a
single file archive (rai file) that contains all subgraphs, both NPU and
CPU, along with the necessary metadata.
CPU Operator Support#
There is an important distinction between CPU operators supported by Vitis AI compiler and general CPU operator execution. The two approaches differ in scope, runtime dependency, and deployment suitability.
Compiler-Supported CPU Operators (CPU Partition Compilation)#
The compiler’s CPU partition feature routes operators to CPU implementations provided by the Vitis AI compiler. In this release, VART ML Runtime uses these CPU operator implementations for a specific set of operators, such as NonMaxSuppression, ArgMax, and Add. The complete list of compiler-supported CPU operators that VART ML Runtime can use is documented in CPU Operators Supported by VART ML Runtime.
When using VART ML Runtime with CPU partition support:
NPU-compatible operators execute on the NPU.
NPU-incompatible operators that appear in the CPU operator list as being provided by Vitis AI compiler execute on the CPU.
NPU-incompatible operators that do not appear in the CPU operator list as being provided by Vitis AI compiler cannot be executed.
This approach provides lightweight deployment without requiring ONNX Runtime as a dependency, making it particularly suitable for embedded systems with constrained resources.
General CPU Execution (Standard Compilation)#
Standard compilation does not include the CPU partition passes and supports two runtimes with different operator coverage requirements.
ONNX Runtime: Provides broad CPU execution support for virtually all standard ONNX operators, covering hundreds of operators across the ONNX specification. Any NPU-incompatible operators are automatically executed on the CPU by ONNX Runtime, independently of the Vitis AI compiler. This makes standard compilation with ONNX Runtime suitable for models that contain NPU-incompatible operators outside the compiler-supported CPU operator list.
VART ML Runtime: Requires all operators in the model to be supported on the NPU. No CPU fallback is available in this configuration. Any NPU-incompatible operators cause an error at runtime.
For ONNX Runtime deployment, use standard compilation without the CPU partition passes. Models compiled with the CPU partition passes produce artifacts that ONNX Runtime does not support in this release.
Choosing the Right Configuration#
The choice of compilation configuration and runtime depends on your model’s operator requirements. Use the following table to determine the appropriate configuration for your deployment scenario.
| Scenario | Compilation Configuration | Runtime |
|---|---|---|
| Model contains NPU-incompatible operators that are listed in the CPU operator list as being provided by Vitis AI compiler | CPU Partition Compilation | VART ML Runtime |
| Model contains NPU-incompatible operators that are not listed in the CPU operator list as being provided by Vitis AI compiler | Standard Compilation | ONNX Runtime |
| Model fully offloads to NPU | Standard Compilation or CPU Partition Compilation | VART ML Runtime or ONNX Runtime |
To determine if your model is compatible with CPU partition compilation, verify that any NPU-incompatible operators in your model appear in the CPU operator list as being provided by Vitis AI compiler. You can identify NPU-incompatible operators by comparing your model’s operators against the NPU operator support list documented in Supported Operators.
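The decision table can be expressed as a small function. This is a sketch under stated assumptions: the two operator sets are placeholders and must be replaced with the real lists from Supported Operators and CPU Operators Supported by VART ML Runtime. The function name choose_configuration is hypothetical.

```python
# Sketch of the decision table. Both operator sets are placeholders; take the
# real lists from the Supported Operators and CPU operator documentation.
def choose_configuration(model_ops, npu_ops, cpu_ops):
    """Return (compilation configuration, runtime) for a set of operator types."""
    not_on_npu = set(model_ops) - set(npu_ops)
    if not not_on_npu:
        # Fully offloaded: either configuration and either runtime works.
        return ("Standard Compilation or CPU Partition Compilation",
                "VART ML Runtime or ONNX Runtime")
    if not_on_npu <= set(cpu_ops):
        # Every NPU-incompatible operator has a compiler-supported CPU version.
        return ("CPU Partition Compilation", "VART ML Runtime")
    return ("Standard Compilation", "ONNX Runtime")


npu_ops = {"Conv", "Relu"}                 # placeholder
cpu_ops = {"ArgMax", "NonMaxSuppression"}  # placeholder
print(choose_configuration(["Conv", "Relu", "ArgMax"], npu_ops, cpu_ops))
print(choose_configuration(["Conv", "Einsum"], npu_ops, cpu_ops))
```

With these placeholder sets, the first model maps to CPU partition compilation with VART ML Runtime, and the second maps to standard compilation with ONNX Runtime because Einsum is in neither set.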
Compilation Output#
When CPU partition support is enabled, the compiler generates the following
output files in the directory specified by the cache_dir provider option.
rai file: A single file archive containing all subgraphs (both NPU and CPU) and the metadata required for runtime execution. This file is loaded directly by VART ML Runtime at inference time.
connectivity_metadata.json: Describes the execution order, partition interconnection information, and top-level model inputs and outputs. This file is used by the runtime to coordinate data flow between NPU and CPU subgraphs.
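After compilation, a quick check of the cache directory can confirm that both outputs were produced. The sketch below rests on two assumptions that this document does not confirm: that the archive uses a .rai file extension, and that both files sit directly in the cache_dir directory. The helper name check_cache_dir is hypothetical.

```python
from pathlib import Path
import tempfile


def check_cache_dir(cache_dir):
    """Report which of the expected CPU partition outputs are present."""
    cache = Path(cache_dir)
    # Assumption: the archive carries a .rai extension; its base name is not
    # specified in this document, so any *.rai file is accepted.
    rai_files = sorted(p.name for p in cache.glob("*.rai"))
    return {
        "rai_files": rai_files,
        "has_connectivity_metadata": (cache / "connectivity_metadata.json").is_file(),
    }


# Demonstration against a temporary directory with stand-in files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "model.rai").write_bytes(b"")
    Path(tmp, "connectivity_metadata.json").write_text("{}")
    report = check_cache_dir(tmp)
    print(report)
```

The file name model.rai is a stand-in created for the demonstration. Point the function at the real cache_dir to inspect an actual build.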
Limitations and Constraints#
The CPU partition feature requires three additional passes beyond standard
compilation: vaiml_cpu_partition, vaiml_connectivity, and
vaiml_create_cache. These passes work together as a unit. Omitting any
one of them prevents proper CPU partition functionality.
All pass names must appear in both the passes array (where they are
defined with their plugins and parameters) and the targets section (where
they are referenced by name in the execution order). The compiler validates
this consistency and reports configuration errors if passes are defined
but not referenced, or vice versa.
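The same consistency rule can be checked before invoking the compiler. The following is a minimal sketch, assuming the configuration has been loaded into a dictionary with the passes and targets keys shown earlier. The function check_pass_consistency is hypothetical, and the compiler's own validation remains authoritative.

```python
# Hypothetical pre-flight check mirroring the consistency rule above; the
# compiler performs its own validation.
def check_pass_consistency(config):
    """List passes that are referenced but undefined, or defined but unused."""
    defined = {p["name"] for p in config.get("passes", [])}
    errors = []
    for target in config.get("targets", []):
        referenced = set(target.get("pass", []))
        for name in sorted(referenced - defined):
            errors.append(f"{target['name']}: pass {name!r} is referenced but not defined")
        for name in sorted(defined - referenced):
            errors.append(f"{target['name']}: pass {name!r} is defined but not referenced")
    return errors


config = {
    "passes": [
        {"name": "init", "plugin": "vaip-pass_init"},
        {"name": "vaiml_connectivity", "plugin": "vaip-pass_vaiml_connectivity"},
    ],
    "targets": [{"name": "VAIML", "pass": ["init", "vaiml_create_cache"]}],
}
for error in check_pass_consistency(config):
    print(error)
```

The deliberately broken configuration above triggers both kinds of mismatch: vaiml_create_cache is referenced without a definition, and vaiml_connectivity is defined without being referenced.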
CPU partition compilation is limited to the set of CPU operators provided by the Vitis AI compiler. Models that contain NPU-incompatible operators outside this set cannot be compiled with CPU partition support and must use standard compilation with ONNX Runtime instead. Refer to CPU Operators Supported by VART ML Runtime for the complete list of supported CPU operators.
Models compiled with CPU partition passes produce artifacts that are specific to runtimes that support the partitioning scheme. In this release:
VART ML Runtime supports CPU partition compilation artifacts.
ONNX Runtime does not support CPU partition compilation artifacts.
Attempting to load a CPU partition compiled model with an unsupported runtime results in an error.
Additional Resources#
Vitis AI Configuration Reference: AMD Vitis™ AI EP Configuration File
Model Compilation Guide: Model Compilation
CPU Operators List: CPU Operators Supported by VART ML Runtime
NPU Operator Support List: Supported Operators