Supported Operators#
The following table lists the ONNX operators supported in AMD Vitis™ AI 6.2. Support is shown per quantization type: FP32 (automatically converted to BF16 within the compiler), FP16, and INT8. A “Y” indicates that Vitis AI provides broad coverage for that operator with that quantization type in CNN models. Some specific configurations of the operator might still not be fully supported.
| Operator | FP32 | FP16 | INT8 |
|---|---|---|---|
| Abs | Y | Y | Y |
| Add | Y | Y | Y |
| And | Y | Y | Y |
| Attention | Y | N/A | N/A |
| AveragePool | Y | Y | Y |
| BatchNormalization | Y | Y | Y |
| BitwiseAnd | Y | Y | Y |
| BitwiseNot | Y | Y | Y |
| BitwiseOr | Y | Y | Y |
| BitwiseXor | Y | Y | Y |
| Cast | Y | Y | Y |
| Ceil | Y | Y | Y |
| Clip | Y | Y | Y |
| Concat | Y | Y | Y |
| ConstantOfShape | Y | Y | Y |
| ConvTranspose | Y | N/A | Y |
| Cos | Y | Y | Y |
| CumSum | Y | Y | N/A |
| DepthToSpace | Y | Y | Y |
| Div | Y | Y | Y |
| Einsum | Y | Y | Y |
| Elu | Y | Y | Y |
| Equal | Y | Y | Y |
| Erf | Y | Y | Y |
| Exp | Y | Y | Y |
| Expand | Y | Y | Y |
| Flatten | Y | Y | Y |
| Floor | Y | Y | Y |
| Gather | Y | Y | Y |
| Gelu | Y | Y | Y |
| GlobalAveragePool | Y | Y | Y |
| Greater | Y | Y | Y |
| GreaterOrEqual | Y | Y | Y |
| GridSample | Y | Y | N/A |
| HardSigmoid | Y | Y | Y |
| HardSwish | Y | Y | Y |
| Identity | Y | Y | Y |
| InstanceNormalization | Y | Y | Y |
| LayerNormalization | Y | Y | Y |
| LeakyRelu | Y | Y | Y |
| Less | Y | Y | Y |
| LessOrEqual | Y | Y | Y |
| Log | Y | Y | Y |
| LogSoftmax | Y | Y | Y |
| Max | Y | Y | Y |
| MaxPool | Y | Y | Y |
| Min | Y | Y | Y |
| Mish | Y | Y | Y |
| Mul | Y | Y | Y |
| Neg | Y | Y | Y |
| Not | Y | Y | Y |
| Or | Y | Y | Y |
| PRelu | Y | Y | N/A |
| Pad | Y | Y | Y |
| Pow | Y | Y | Y |
| QuickGelu | Y | Y | Y |
| RMSLayerNormalization | Y | Y | Y |
| Range | Y | Y | Y |
| Reciprocal | Y | Y | Y |
| ReduceL1 | Y | Y | Y |
| ReduceL2 | Y | Y | N/A |
| ReduceLogSum | Y | Y | Y |
| ReduceLogSumExp | Y | Y | Y |
| ReduceMax | Y | Y | Y |
| ReduceMean | Y | Y | Y |
| ReduceMin | Y | Y | Y |
| ReduceSum | Y | Y | Y |
| ReduceSumSquare | Y | Y | Y |
| Relu | Y | Y | Y |
| Reshape | Y | Y | Y |
| Resize | Y | Y | Y |
| Round | Y | Y | Y |
| Scaler | Y | Y | Y |
| ScatterND | Y | Y | Y |
| Selu | Y | Y | N/A |
| Shape | Y | Y | Y |
| SiLU | Y | Y | Y |
| Sigmoid | Y | Y | Y |
| Sign | Y | Y | Y |
| Sin | Y | Y | Y |
| Slice | Y | Y | Y |
| Softmax | Y | Y | Y |
| Softplus | Y | Y | N/A |
| SpaceToDepth | Y | Y | Y |
| Split | Y | Y | Y |
| Sqrt | Y | Y | Y |
| Squeeze | Y | Y | Y |
| Sub | Y | Y | Y |
| Sum | Y | Y | Y |
| Tanh | Y | Y | Y |
| ThresholdedRelu | Y | Y | N/A |
| Tile | Y | Y | Y |
| TopK | Y | Y | N/A |
| Transpose | Y | Y | Y |
| Unsqueeze | Y | Y | Y |
| Upsample | Y | Y | Y |
| Xor | Y | Y | Y |
Note
N/A: Not applicable (the operator is not supported in ONNX)
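As a quick way to use the table, the sketch below flags operators in a model that are marked N/A for a chosen column. It is only an illustration: the `NOT_APPLICABLE` dictionary is transcribed by hand from the table above, the function name `ops_marked_na` and the `model_ops` list are hypothetical, and a "Y" entry still does not guarantee that every configuration of the operator is supported.

```python
# Operators marked N/A in the table above, keyed by column name.
# Every other operator listed in the table is marked "Y" in all three columns.
NOT_APPLICABLE = {
    "FP32": set(),
    "FP16": {"Attention", "ConvTranspose"},
    "INT8": {
        "Attention", "CumSum", "GridSample", "PRelu", "ReduceL2",
        "Selu", "Softplus", "ThresholdedRelu", "TopK",
    },
}


def ops_marked_na(op_types, dtype):
    """Return the sorted operator names that the table marks N/A for dtype."""
    return sorted(set(op_types) & NOT_APPLICABLE[dtype])


# Hypothetical list of operator types. With the onnx package installed, it
# could be built as [node.op_type for node in onnx.load(path).graph.node].
model_ops = ["Relu", "TopK", "Softmax", "ConvTranspose"]

print(ops_marked_na(model_ops, "INT8"))  # ['TopK']
print(ops_marked_na(model_ops, "FP16"))  # ['ConvTranspose']
```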
Operator Shape Support Matrix#
The following sections give details on the supported shapes and configurations for a subset of the operators listed above.
AveragePool#
Implements average pooling over a 2D input.
Important
The following values describe the hardware kernel’s internal subvolume dimensions operating in L1 memory. They represent base shapes, not upper or lower limits on the kernel shapes accepted by the Vitis AI compiler. Building from these base values, the compiler might add support for a broader range of shapes through automatic padding, tiling, and other transforms applied at compile time.
The Vitis AI compiler automatically handles:
- Tiling of large tensors across L3 → L2 → L1 memory
- Padding of dimensions to meet hardware alignment
- Decomposition and recomposition of operators
- Layer fusion
Supported Features:
Data Types: bf16, fp16, int8
ksize_h, ksize_w: Pooling kernel height and width (1 to 20, step size 1)
stride_h, stride_w: Kernel strides for height and width (1 to 20, step size 1)
Concat#
Implements the concatenation operator.
Supported Features:
Data Types: bf16, fp16, int8/uint8
concat_axis: Specifies the axis along which concatenation is performed (allowed: 0 to 3, step size 1)
Conv2D (FP32, FP16)#
Provides a 2D convolution operator in FP32 (automatically converted to BF16 within the compiler) and FP16 formats.
Important
The following values describe the hardware kernel’s internal subvolume dimensions operating in L1 memory. They represent base shapes, not upper or lower limits on the kernel shapes accepted by the Vitis AI compiler. Building from these base values, the compiler might add support for a broader range of shapes through automatic padding, tiling, and other transforms applied at compile time.
The Vitis AI compiler automatically handles:
- Tiling of large tensors across L3 → L2 → L1 memory
- Padding of dimensions to meet hardware alignment
- Decomposition and recomposition of operators
- Layer fusion
Supported Features:
Data Types: Input, weights, and output must be in fp32 (bf16) or fp16 format.
Kernel Size:
Width: 1, 2, 3, 4, 5, 7, 15, 16, 128
Height: 1, 2, 3, 4, 5, 7, 9, 15
If kernel height is 3, kernel width must be either 3 or 15.
If kernel width is 15, kernel height must be 3.
If kernel height is 4, kernel width must be 16.
1xN or Nx1 kernels can be automatically converted to NxN if N is a supported width.
Dilation: Dilation is supported.
Groups: Groups are supported.
Stride:
Height: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
Width: 1, 2, 4
If stride width is 4, stride height must also be 4 (and vice versa).
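The kernel and stride rules above can be written down as a small checker. This is a sketch for reading the rules, not part of the Vitis AI tooling: the function name `conv2d_fp_violations` is hypothetical, it does not model the automatic 1xN/Nx1 to NxN conversion, and because the listed values are base shapes rather than limits, a reported violation does not necessarily mean the compiler rejects the layer.

```python
KERNEL_WIDTHS = {1, 2, 3, 4, 5, 7, 15, 16, 128}
KERNEL_HEIGHTS = {1, 2, 3, 4, 5, 7, 9, 15}
STRIDE_HEIGHTS = set(range(1, 16))  # 1 to 15
STRIDE_WIDTHS = {1, 2, 4}


def conv2d_fp_violations(kh, kw, sh, sw):
    """List which of the rules above a kernel/stride combination breaks."""
    problems = []
    if kh not in KERNEL_HEIGHTS:
        problems.append(f"kernel height {kh} is not a listed value")
    if kw not in KERNEL_WIDTHS:
        problems.append(f"kernel width {kw} is not a listed value")
    if kh == 3 and kw not in (3, 15):
        problems.append("kernel height 3 requires kernel width 3 or 15")
    if kw == 15 and kh != 3:
        problems.append("kernel width 15 requires kernel height 3")
    if kh == 4 and kw != 16:
        problems.append("kernel height 4 requires kernel width 16")
    if sh not in STRIDE_HEIGHTS:
        problems.append(f"stride height {sh} is not a listed value")
    if sw not in STRIDE_WIDTHS:
        problems.append(f"stride width {sw} is not a listed value")
    if (sw == 4) != (sh == 4):
        problems.append("stride 4 must be used for both height and width")
    return problems


print(conv2d_fp_violations(3, 3, 1, 1))  # []
print(conv2d_fp_violations(4, 4, 1, 1))  # one violation: height 4 needs width 16
print(conv2d_fp_violations(3, 3, 4, 2))  # one violation: stride 4 pairing
```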
Conv2D (INT8)#
Provides a 2D convolution operator in INT8 format, supporting various activation functions.
Important
The following values describe the hardware kernel’s internal subvolume dimensions operating in L1 memory. They represent base shapes, not upper or lower limits on the kernel shapes accepted by the Vitis AI compiler. Building from these base values, the compiler might add support for a broader range of shapes through automatic padding, tiling, and other transforms applied at compile time.
The Vitis AI compiler automatically handles:
- Tiling of large tensors across L3 → L2 → L1 memory
- Padding of dimensions to meet hardware alignment
- Decomposition and recomposition of operators
- Layer fusion
Supported Features:
Data Types: Input, weights, and output must be in int8 format.
Kernel Size:
Width: 1 to 16 (step size 1)
Height: 1 to 16 (step size 1)
Dilation: Dilation is supported.
Groups: Groups are supported.
Stride:
Height: 1 to 16 (step size 1)
Width: 1 or 2
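For INT8 the listed values are plain ranges, so a check against them is short. The helper below is a hypothetical sketch (the name `conv2d_int8_in_base_range` is not a Vitis AI API) and only compares a layer against the base values listed above.

```python
def conv2d_int8_in_base_range(kh, kw, sh, sw):
    """True if kernel and stride values fall inside the ranges listed above."""
    return (
        1 <= kh <= 16        # kernel height: 1 to 16
        and 1 <= kw <= 16    # kernel width: 1 to 16
        and 1 <= sh <= 16    # stride height: 1 to 16
        and sw in (1, 2)     # stride width: 1 or 2
    )


print(conv2d_int8_in_base_range(3, 3, 1, 1))    # True
print(conv2d_int8_in_base_range(17, 17, 1, 1))  # False: kernel larger than 16
print(conv2d_int8_in_base_range(3, 3, 4, 4))    # False: stride width 4
```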
DepthToSpace#
Rearranges data from the depth (channel) dimension into spatial blocks.
Input Tensor: (N, C, H, W) -> Output Tensor: (N, C//(B*B), H*B, W*B)
Important
The following values describe the hardware kernel’s internal subvolume dimensions operating in L1 memory. They represent base shapes, not upper or lower limits on the kernel shapes accepted by the Vitis AI compiler. Building from these base values, the compiler might add support for a broader range of shapes through automatic padding, tiling, and other transforms applied at compile time.
The Vitis AI compiler automatically handles:
- Tiling of large tensors across L3 → L2 → L1 memory
- Padding of dimensions to meet hardware alignment
- Decomposition and recomposition of operators
- Layer fusion
Supported Features:
Data Types: fp32 (bf16), fp16, int8, uint8
Mode: DCR, CRD
Block Size: 2, 3, 4, 8
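The shape mapping above can be computed directly. The sketch below is illustrative only: the function name `depth_to_space_shape` is hypothetical, and it covers just the output shape, not the element ordering that distinguishes the DCR and CRD modes.

```python
SUPPORTED_BLOCK_SIZES = (2, 3, 4, 8)


def depth_to_space_shape(n, c, h, w, block):
    """Return (N, C // (B*B), H*B, W*B) for a supported block size B."""
    if block not in SUPPORTED_BLOCK_SIZES:
        raise ValueError(f"block size {block} is not one of {SUPPORTED_BLOCK_SIZES}")
    if c % (block * block) != 0:
        raise ValueError("channel count must be divisible by block * block")
    return (n, c // (block * block), h * block, w * block)


print(depth_to_space_shape(1, 64, 32, 32, 2))  # (1, 16, 64, 64)
```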
Gemm (FP32 (BF16), FP16)#
General Matrix Multiplication operator supporting matrix product, bias addition, and optional transposition of weights.
Important
The following values describe the hardware kernel’s internal subvolume dimensions operating in L1 memory. They represent base shapes, not upper or lower limits on the kernel shapes accepted by the Vitis AI compiler. Building from these base values, the compiler might add support for a broader range of shapes through automatic padding, tiling, and other transforms applied at compile time.
The Vitis AI compiler automatically handles:
- Tiling of large tensors across L3 → L2 → L1 memory
- Padding of dimensions to meet hardware alignment
- Decomposition and recomposition of operators
- Layer fusion
Supported Features:
Data Types: fp32 (bf16), fp16
Computation:
C(M, N) = A(M, K) x B(K, N) + bias
Ranges:
M: 8 to 96 (step 8)
K: 64 to 128 (step 16); extended K: 1 to 65535 (step 1, by iteration)
N: 16 to 96 (step 16)
Bias Handling:
When used, bias is padded to a multiple of 32 and is concatenated with the weights: Data layout: {bias_padded, weight}
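To make the M, K, and N ranges concrete, the sketch below tests whether a GEMM size matches the listed base values exactly. The helper names are hypothetical, and a size that does not match is not necessarily unsupported, since the compiler can pad and tile larger problems down to these base shapes.

```python
def in_stepped_range(value, low, high, step):
    """True if value lies in [low, high] and sits on the step grid from low."""
    return low <= value <= high and (value - low) % step == 0


def gemm_fp_matches_base(m, k, n):
    """Check M, K, N against the FP32 (BF16) / FP16 base ranges listed above."""
    return (
        in_stepped_range(m, 8, 96, 8)
        and in_stepped_range(k, 64, 128, 16)
        and in_stepped_range(n, 16, 96, 16)
    )


def gemm_k_in_extended_range(k):
    """Extended K range, which the kernel reaches by iteration."""
    return 1 <= k <= 65535


print(gemm_fp_matches_base(64, 128, 64))  # True
print(gemm_fp_matches_base(10, 64, 16))   # False: M is not a multiple of 8
print(gemm_k_in_extended_range(1000))     # True
```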
Gemm (INT8)#
General Matrix Multiplication operator supporting matrix product, bias addition, and optional transposition of weights.
Important
The following values describe the hardware kernel’s internal subvolume dimensions operating in L1 memory. They represent base shapes, not upper or lower limits on the kernel shapes accepted by the Vitis AI compiler. Building from these base values, the compiler might add support for a broader range of shapes through automatic padding, tiling, and other transforms applied at compile time.
The Vitis AI compiler automatically handles:
- Tiling of large tensors across L3 → L2 → L1 memory
- Padding of dimensions to meet hardware alignment
- Decomposition and recomposition of operators
- Layer fusion
Supported Features:
Data Types: int8
Computation:
C(M, N) = A(M, K) x B(K, N) + bias
Ranges:
M: 16 to 240 (step 16)
K: 40 to 248 (step 8); extended K: 1 to 65535 (step 1, by iteration)
N: 16 to 240 (step 16)
Bias Handling:
When used, bias is padded to a multiple of 32 and is concatenated with the weights: Data layout: {bias_padded, weight}
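The bias layout can be pictured with a few lines of Python. This is only a sketch under assumptions the text does not confirm: the padding is taken to be counted in elements, the fill value is taken to be zero, and the weights are represented as an already flattened list. The names `pad_to_multiple` and `pack_bias_and_weights` are hypothetical.

```python
def pad_to_multiple(values, multiple=32, fill=0):
    """Extend values with fill so its length is a multiple of `multiple`."""
    remainder = len(values) % multiple
    if remainder == 0:
        return list(values)
    return list(values) + [fill] * (multiple - remainder)


def pack_bias_and_weights(bias, weights_flat):
    """Build the {bias_padded, weight} layout: padded bias first, then weights."""
    return pad_to_multiple(bias) + list(weights_flat)


packed = pack_bias_and_weights([1] * 16, [2] * 8)
print(len(packed))    # 40: 32 padded bias entries followed by 8 weights
print(packed[14:18])  # [1, 1, 0, 0]: bias ends at index 15, padding follows
```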
GlobalAveragePool#
Implements 2D global average pooling.
Important
The following values describe the hardware kernel’s internal subvolume dimensions operating in L1 memory. They represent base shapes, not upper or lower limits on the kernel shapes accepted by the Vitis AI compiler. Building from these base values, the compiler might add support for a broader range of shapes through automatic padding, tiling, and other transforms applied at compile time.
The Vitis AI compiler automatically handles:
- Tiling of large tensors across L3 → L2 → L1 memory
- Padding of dimensions to meet hardware alignment
- Decomposition and recomposition of operators
- Layer fusion
Supported Features:
Data Types: fp32 (bf16), fp16, int8
stride: 1 to 20, step size 1
kernel size: 1 to 20, step size 1
MaxPool#
Implements 2D max pooling.
Important
The following values describe the hardware kernel’s internal subvolume dimensions operating in L1 memory. They represent base shapes, not upper or lower limits on the kernel shapes accepted by the Vitis AI compiler. Building from these base values, the compiler might add support for a broader range of shapes through automatic padding, tiling, and other transforms applied at compile time.
The Vitis AI compiler automatically handles:
- Tiling of large tensors across L3 → L2 → L1 memory
- Padding of dimensions to meet hardware alignment
- Decomposition and recomposition of operators
- Layer fusion
Supported Features:
Data Types: fp32 (bf16), fp16, int8
stride: 1 to 20, step size 1
kernel size: 1 to 20, step size 1
Pad#
Implements 2D padding. The operator fills the edges of a tensor with the specified padding value. Supported modes: TOP, BOT, LEFT, RIGHT, BOTRIGHT, TOPLEFT.
Important
The following values describe the hardware kernel’s internal subvolume dimensions operating in L1 memory. They represent base shapes, not upper or lower limits on the kernel shapes accepted by the Vitis AI compiler. Building from these base values, the compiler might add support for a broader range of shapes through automatic padding, tiling, and other transforms applied at compile time.
The Vitis AI compiler automatically handles:
- Tiling of large tensors across L3 → L2 → L1 memory
- Padding of dimensions to meet hardware alignment
- Decomposition and recomposition of operators
- Layer fusion
Supported Features#
Data Types: bf16, fp16, int8, uint8
pad_left, pad_right, pad_top, pad_bot: Positions padded on each edge (0 to 255, step size 1)
Supported Padding Modes#
| Mode | Supported | Description |
|---|---|---|
| | Yes | Pads with a constant value (for example, zero padding or a specified fill value) |
| | Yes | Pads by reflecting the input tensor values at the border |
| | No | Not supported |
| | No | Not supported |
Resize#
Implements tensor resizing with configurable interpolation.
Important
The following values describe the hardware kernel’s internal subvolume dimensions operating in L1 memory. They represent base shapes, not upper or lower limits on the kernel shapes accepted by the Vitis AI compiler. Building from these base values, the compiler might add support for a broader range of shapes through automatic padding, tiling, and other transforms applied at compile time.
The Vitis AI compiler automatically handles:
- Tiling of large tensors across L3 → L2 → L1 memory
- Padding of dimensions to meet hardware alignment
- Decomposition and recomposition of operators
- Layer fusion
Supported Features:
Data Types: fp32 (bf16), fp16, int8
Interpolation modes:
mode: 0 (nearest), 1 (linear)
nearest_mode: 0 (floor), 1 (round_prefer_ceil), 2 (round_prefer_floor; default)
Scale Factor: The scale factor must be less than 32
Slice#
Implements slicing of a tensor.
Important
The following values describe the hardware kernel’s internal subvolume dimensions operating in L1 memory. They represent base shapes, not upper or lower limits on the kernel shapes accepted by the Vitis AI compiler. Building from these base values, the compiler might add support for a broader range of shapes through automatic padding, tiling, and other transforms applied at compile time.
The Vitis AI compiler automatically handles:
- Tiling of large tensors across L3 → L2 → L1 memory
- Padding of dimensions to meet hardware alignment
- Decomposition and recomposition of operators
- Layer fusion
Supported Features:
Data Types: fp32 (bf16), fp16, int8
Slice indices:
height_start, width_start, depth_start, height_end, width_end, depth_end (0 to 2048, step size 1)
height_step, width_step, depth_step (1 to 2048, step size 1)
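As an illustration of how the start, end, and step values combine, the sketch below computes the number of elements a slice selects along one dimension. It assumes the end index is exclusive, as in the ONNX Slice operator, and only handles the non-negative indices and positive steps listed above. The function name `slice_extent` is hypothetical.

```python
def slice_extent(start, end, step):
    """Number of elements selected by start:end:step along one dimension."""
    if not (0 <= start <= 2048 and 0 <= end <= 2048):
        raise ValueError("start and end must be in 0..2048")
    if not (1 <= step <= 2048):
        raise ValueError("step must be in 1..2048")
    # Ceiling division of (end - start) by step, clamped at zero.
    return max(0, -(-(end - start) // step))


print(slice_extent(0, 224, 2))  # 112
print(slice_extent(1, 10, 4))   # 3 (indices 1, 5, 9)
```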
Split#
Implements tensor splitting along a specified axis.
Supported Features:
Data Types: fp32 (bf16), fp16, int8, uint8
split_axis: Axis to split on (allowed values: 0 to 2, step size 1)
CPU Because#
Supported operators might still fall back to CPU for specific instances. The compiler displays a diagnostic message explaining why a specific layer was rejected and mapped to CPU. Two common reasons are:
Unsupported configuration: The operator is supported in general, but the specific configuration of that operator in the model is not. For example, a Conv2D operator with a kernel size of 17x17 or a large stride falls back to CPU because it does not meet the supported kernel size and stride configurations for Conv2D. The diagnostic message looks like this:
YAML constraints failed [MLLIB]: [0: 'Conv2d': 'while specializing value:
'config.stride_w' configuration value is set to
'Allowed(data_type=uint8, values=[1, 2], default_value={})' by the library,
but parameterization is trying to override with '4', which is incompatible'
Memory limitations: The operator is supported, but the required memory is not available on the NPU for the specific instance. The compiler cannot tile the tensor so that the input activations, output activations, and weights all fit simultaneously in the L1 data memory. In such cases, the operator is executed on the CPU, and the compiler emits a message like this:
Unwrapping layer with 'Conv2DBf16' kernel(s) that cannot be implemented
due to 'Insufficient L1 buffer space for IFM, OFM and WTS [WTS_FM_NOT_FIT_L1MEM]'
These operators are listed in the “CPU Because” table in the AI Analyzer Partitioning Summary page, along with the reason for CPU fallback (Partitioning).
Note
The Vitis AI compiler does not create a log file for the messages it displays. To save them, redirect the compiler output, both stdout and stderr, to a file using the following command:
python compile.py 2>&1 | tee vaicompiler.log
This command still displays the messages in the terminal while also saving them to vaicompiler.log for later reference.
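Once the output is saved, the fallback reasons can be pulled out of the log with a short script. The sketch below is an assumption-laden example rather than a supported tool: the two regular expressions are derived only from the sample messages shown above, so other message variants will not be matched, and the function name `summarize_fallbacks` is hypothetical.

```python
import re

# Patterns modelled on the two example messages shown above.
YAML_FAILURE = re.compile(
    r"YAML constraints failed \[MLLIB\]: \[\d+: '(?P<op>[^']+)'"
)
L1_FAILURE = re.compile(
    r"Unwrapping layer with '(?P<kernel>[^']+)' kernel\(s\) that cannot be "
    r"implemented\s+due to '(?P<reason>[^\[']+?)\s*\[(?P<code>\w+)\]'"
)


def summarize_fallbacks(log_text):
    """Return (name, reason) pairs for each fallback message found."""
    found = []
    for match in YAML_FAILURE.finditer(log_text):
        found.append((match.group("op"), "unsupported configuration"))
    for match in L1_FAILURE.finditer(log_text):
        found.append((match.group("kernel"), match.group("code")))
    return found


# Sample text copied from the messages above. For a real run, read the file
# instead: log_text = open("vaicompiler.log").read()
sample = (
    "YAML constraints failed [MLLIB]: [0: 'Conv2d': 'while specializing value:\n"
    "'config.stride_w' configuration value is set to\n"
    "'Allowed(data_type=uint8, values=[1, 2], default_value={})' by the library,\n"
    "but parameterization is trying to override with '4', which is incompatible'\n"
    "Unwrapping layer with 'Conv2DBf16' kernel(s) that cannot be implemented\n"
    "due to 'Insufficient L1 buffer space for IFM, OFM and WTS [WTS_FM_NOT_FIT_L1MEM]'\n"
)

for name, reason in summarize_fallbacks(sample):
    print(f"{name}: {reason}")
```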