Converting Float32 Models to FP16

AMD Quark supports converting models from float32 to float16. The Vitis AI compiler can then compile FP16 models for execution on the NPU. FP16 reduces model size and memory bandwidth requirements. Most models experience minimal accuracy degradation with FP16; significant degradation usually indicates an export or conversion problem rather than an inherent FP16 limitation.

When to use FP16:

Accuracy-critical applications where BF16 shows excessive accuracy loss on the NPU
Precision-sensitive architectures (depth estimation, regression tasks)
Models requiring high mantissa precision
When you need better accuracy than BF16 but cannot tolerate INT8 quantization complexity
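The mantissa-precision point can be illustrated with a small sketch. FP16 stores 10 explicit mantissa bits, while BF16 stores 7, so FP16 resolves finer steps between nearby values. The helper names below are hypothetical and use only the Python standard library; they are not part of AMD Quark or the Vitis AI toolchain.

```python
import struct


def round_to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_to_bf16(x):
    """Round to bfloat16 by keeping the top 16 bits of the float32 encoding.

    Uses round-to-nearest-even. This sketch has no special handling for
    NaN or infinity.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # round to nearest, ties to even
    bits &= 0xFFFF0000                   # drop the low 16 bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Illustrative values only; compare the rounding error of each format.
for value in (1.001, 0.1, 1234.567):
    fp16 = round_to_fp16(value)
    bf16 = round_to_bf16(value)
    print(f"{value}: fp16 error={abs(fp16 - value):.3g}, "
          f"bf16 error={abs(bf16 - value):.3g}")
```

Note that FP16 has a narrower exponent range than BF16 (its largest finite value is 65504), so this comparison only speaks to precision, not range.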

Convert a float32 model to FP16 (FP16 I/O)

Run the following command inside the Docker container:

python3 -m quark.onnx.tools.convert_fp32_to_fp16 --input $FLOAT_32_ONNX_MODEL_PATH --output $FLOAT_16_ONNX_MODEL_PATH

Note

The resulting model uses float16 for both inputs and outputs. During inference, provide FP16 tensors and read FP16 outputs.
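As a minimal sketch of what providing FP16 tensors can look like in a NumPy-based pipeline, the helpers below cast float32 arrays to float16 before inference and cast float16 results back afterwards. The function names, the input name "input", and the tensor shape are hypothetical and chosen only for illustration.

```python
import numpy as np


def to_fp16_feeds(feeds):
    """Cast float32 arrays to float16; leave other dtypes (e.g. int64) untouched."""
    return {
        name: arr.astype(np.float16) if arr.dtype == np.float32 else arr
        for name, arr in feeds.items()
    }


def to_fp32_outputs(outputs):
    """Cast float16 outputs back to float32 for downstream code that expects FP32."""
    return [
        out.astype(np.float32) if out.dtype == np.float16 else out
        for out in outputs
    ]


# Hypothetical input name and shape, for illustration only.
feeds = {"input": np.random.rand(1, 3, 224, 224).astype(np.float32)}
fp16_feeds = to_fp16_feeds(feeds)
print(fp16_feeds["input"].dtype)
```

With ONNX Runtime, the cast dictionary would typically be passed as the feed argument of `session.run(None, fp16_feeds)`, and the returned list could be passed through `to_fp32_outputs`.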

For compatibility with existing inference pipelines that expect FP32 inputs and outputs, you can convert the model internals to FP16 while keeping the I/O interface in FP32.

Convert to FP16 but keep float32 inputs and outputs (FP32 I/O)

To convert a float32 model to float16 while keeping the input and output interface in float32, run the following command inside the Docker container:

python3 -m quark.onnx.tools.convert_fp32_to_fp16 --input $FLOAT_32_ONNX_MODEL_PATH --output $FLOAT_16_ONNX_MODEL_PATH --keep_io_types

If the input model is larger than 2 GB, add the --save_as_external_data flag so that the weights are stored outside the .onnx file:

python3 -m quark.onnx.tools.convert_fp32_to_fp16 --input $FLOAT_32_ONNX_MODEL_PATH --output $FLOAT_16_ONNX_MODEL_PATH --save_as_external_data
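The choice of flags can be scripted. The following is a sketch only: `build_convert_command` is a hypothetical helper, the 2 GB threshold is interpreted here as 2 * 1024**3 bytes, and it assumes that --keep_io_types and --save_as_external_data can be combined in one invocation, which this page does not confirm. The file names are placeholders.

```python
import os

# Assumed interpretation of the "2 GB" threshold mentioned above.
TWO_GB = 2 * 1024**3


def build_convert_command(input_path, output_path,
                          keep_io_types=False, model_size_bytes=None):
    """Build the argv list for the FP32-to-FP16 conversion tool.

    If model_size_bytes is not given, the size is read from disk.
    """
    if model_size_bytes is None:
        model_size_bytes = os.path.getsize(input_path)

    cmd = [
        "python3", "-m", "quark.onnx.tools.convert_fp32_to_fp16",
        "--input", input_path,
        "--output", output_path,
    ]
    if keep_io_types:
        cmd.append("--keep_io_types")
    if model_size_bytes > TWO_GB:
        cmd.append("--save_as_external_data")
    return cmd


# Placeholder paths and an explicit size, so no file needs to exist.
print(" ".join(build_convert_command(
    "model_fp32.onnx", "model_fp16.onnx",
    keep_io_types=True, model_size_bytes=100 * 1024**2)))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` inside the Docker container.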

For complete FP16 conversion documentation, see the AMD Quark ONNX Tools Guide: https://quark.docs.amd.com/latest/onnx/tools.html

Validating FP16 Conversion

After converting your model to FP16, validate it on CPU or GPU using ONNX Runtime before attempting NPU deployment. Compare the FP16 model accuracy against your FP32 baseline. FP16 should show minimal accuracy degradation for most models. Significant accuracy loss usually indicates an export or conversion problem rather than an inherent FP16 limitation.

For complete validation methodology, see Model Accuracy Validation Methodology.
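A quick numerical comparison between FP32 and FP16 outputs might look like the sketch below. The metrics (maximum absolute error and cosine similarity) and the function names are illustrative choices, not the methodology from the referenced guide; task-level accuracy on a real dataset remains the meaningful check. `run_model` uses the standard ONNX Runtime `InferenceSession` API with the CPU execution provider and imports `onnxruntime` lazily, so the metric helper works without it.

```python
import numpy as np


def compare_outputs(reference, candidate):
    """Return simple agreement metrics between two output tensors."""
    ref = np.asarray(reference, dtype=np.float64).ravel()
    cand = np.asarray(candidate, dtype=np.float64).ravel()
    max_abs_error = float(np.max(np.abs(ref - cand)))
    denom = np.linalg.norm(ref) * np.linalg.norm(cand)
    cosine = float(np.dot(ref, cand) / denom) if denom > 0 else 1.0
    return {"max_abs_error": max_abs_error, "cosine_similarity": cosine}


def run_model(model_path, feeds):
    """Run an ONNX model on CPU and return all of its outputs."""
    import onnxruntime as ort  # lazy import; only needed for real inference
    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    return session.run(None, feeds)


# Stand-in data: simulate FP16 rounding of an FP32 output tensor.
fp32_output = np.linspace(-1.0, 1.0, 101, dtype=np.float32)
fp16_output = fp32_output.astype(np.float16)
print(compare_outputs(fp32_output, fp16_output))
```

In practice, `fp32_output` and `fp16_output` would come from calling `run_model` on the original and converted models with the same inputs (cast to float16 for a model converted with FP16 I/O).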

Once the FP16 model passes validation, compile it for NPU deployment using the Vitis AI compiler.

FP16 Model Compilation

Full FP16 ONNX model compilation uses the same Vitis AI compiler configuration as float32 models. The following example shows the compiler JSON configuration:

{
    "passes": [
        {
            "name": "init",
            "plugin": "vaip-pass_init"
        },
        {
            "name": "vaiml_partition",
            "plugin": "vaip-pass_vaiml_partition",
            "vaiml_config": {
                "keep_outputs": true,
                "device": "ve2-xc2ve3858",
                "optimize_level": 2,
                "logging_level": "info"
            }
        }
    ],
    "target": "VAIML",
    "targets": [
        {
            "name": "VAIML",
            "pass": [
                "init",
                "vaiml_partition"
            ]
        }
    ]
}

For complete compilation setup and additional configuration options, see Model Compilation.
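Before handing a configuration file to the compiler, a quick structural check can catch typos in pass names. The sketch below is a hypothetical helper, not part of the Vitis AI tooling, and the consistency rules it applies (the selected target must be listed under "targets", and every pass a target references must be defined under "passes") are assumptions inferred from the layout of the example above rather than documented compiler behavior.

```python
import json

# Condensed copy of the example configuration shown above.
CONFIG_TEXT = """
{
    "passes": [
        {"name": "init", "plugin": "vaip-pass_init"},
        {
            "name": "vaiml_partition",
            "plugin": "vaip-pass_vaiml_partition",
            "vaiml_config": {
                "keep_outputs": true,
                "device": "ve2-xc2ve3858",
                "optimize_level": 2,
                "logging_level": "info"
            }
        }
    ],
    "target": "VAIML",
    "targets": [
        {"name": "VAIML", "pass": ["init", "vaiml_partition"]}
    ]
}
"""


def check_config(config):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    defined_passes = {p["name"] for p in config["passes"]}
    target_names = {t["name"] for t in config["targets"]}

    if config["target"] not in target_names:
        problems.append(f"target '{config['target']}' is not listed in 'targets'")

    for target in config["targets"]:
        for pass_name in target["pass"]:
            if pass_name not in defined_passes:
                problems.append(
                    f"target '{target['name']}' references undefined pass '{pass_name}'")
    return problems


config = json.loads(CONFIG_TEXT)
print(check_config(config) or "no structural problems found")
```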
