ONNX Runtime C++ APIs

ONNX Runtime is a high-performance engine for running deep learning models, supporting both inference and training. It provides C and C++ APIs for integrating precompiled models into your application workflows. The C++ application workflow closely mirrors the Python one, which makes it easier to move between the two languages.

By compiling models with AMD Vitis™ AI, you can leverage ONNX Runtime’s C++ APIs to deploy those models on hardware and perform inference efficiently. The following example demonstrates a typical inference workflow using ONNX Runtime C++ APIs with the Vitis AI execution provider.

For further details about ONNX Runtime C++ APIs, refer to the official documentation: https://onnxruntime.ai/docs/api/c/c_cpp_api.html

Example: Inference Workflow with Vitis AI

#include <onnxruntime_cxx_api.h>
#include <cstdint>
#include <fstream>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

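// Read raw float32 data and check that the file size matches the expected shape
bool load_raw_float(const std::string& filename, std::vector<float>& data, const std::vector<int64_t>& shape) {
    std::ifstream f(filename, std::ios::binary);
    if (!f) return false;
    f.seekg(0, std::ios::end);
    const std::streamoff size = f.tellg();
    if (size < 0) return false;
    f.seekg(0, std::ios::beg);

    // Expected element count is the product of the shape dimensions
    size_t expected = 1;
    for (int64_t dim : shape) expected *= static_cast<size_t>(dim);
    if (static_cast<size_t>(size) != expected * sizeof(float)) return false;

    data.resize(expected);
    f.read(reinterpret_cast<char*>(data.data()), size);
    return static_cast<bool>(f);  // false if the read was short or failed
}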

bool save_raw_float(const std::string& filename, const std::vector<float>& data) {
    std::ofstream f(filename, std::ios::binary);
    if (!f) return false;
    f.write(reinterpret_cast<const char*>(data.data()), data.size() * sizeof(float));
    return true;
}

int main(int argc, char* argv[]) {
    std::cout << "Usage: " << argv[0] << std::endl;

    const char* model_path = "/etc/vai/models/resnet50_int8/resnet50_int8.onnx";
    std::string input_file = "/etc/vai/models/resnet50_int8/data/ifm_input_fp32_1x3x224x224.bin";
    std::string output_file = "output_fp32_1x1000.bin";

    // Env + session
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "Default");
    Ort::SessionOptions session_options;

    // Set VitisAI-specific options
    std::unordered_map<std::string, std::string> options;
    options["config_file"] = "/etc/vai/models/resnet50_int8/vitisai_config.json";  // Config file
    options["cacheDir"] = "/etc/vai/models";  // Cache dir used to compile the model, should match compilation result.
    options["cacheKey"] = "resnet50_int8"; // Cache key used to compile the model, should match compilation result.
    options["target"] = "VAIML"; // Target Platform

    // ORT Session with VitisAIExecutionProvider
    session_options.AppendExecutionProvider("VitisAI", options);
    Ort::Session session(env, model_path, session_options);

    // Load input (bin/raw assumed)
    std::vector<float> input_data;
    std::vector<int64_t> input_shape = {1, 3, 224, 224};  // Set manually to match the model input (NCHW)
    if (!load_raw_float(input_file, input_data, input_shape)) {
        std::cerr << "Failed to load input file" << std::endl;
        return 1;
    }

    // Create tensor
    Ort::MemoryInfo mem_info = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
    Ort::Value input_tensor = Ort::Value::CreateTensor<float>(
        mem_info, input_data.data(), input_data.size(),
        input_shape.data(), input_shape.size()
    );

    // Names
    Ort::AllocatorWithDefaultOptions allocator;
    auto input_name = session.GetInputNameAllocated(0, allocator);
    const char* input_names[] = { input_name.get() };

    size_t num_outputs = session.GetOutputCount();
    std::vector<const char*> output_names;
    std::vector<Ort::AllocatedStringPtr> output_name_ptrs;
    for (size_t i = 0; i < num_outputs; i++) {
        output_name_ptrs.emplace_back(session.GetOutputNameAllocated(i, allocator));
        output_names.push_back(output_name_ptrs.back().get());
    }

    // Run inference
    auto output_tensors = session.Run(
        Ort::RunOptions{nullptr},
        input_names, // const char* const*
        &input_tensor, // const Ort::Value*
        1, // number of inputs
        output_names.data(), // const char* const*
        num_outputs  // number of outputs
    );

    // Save first output (as raw binary)
    float* out_data = output_tensors[0].GetTensorMutableData<float>();
    size_t out_size = output_tensors[0].GetTensorTypeAndShapeInfo().GetElementCount();
    std::vector<float> output_vec(out_data, out_data + out_size);

    if (!save_raw_float(output_file, output_vec)) {
        std::cerr << "Failed to save output" << std::endl;
        return 1;
    }

    std::cout << "Saved to " << output_file << std::endl;
    return 0;
}
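
Beyond saving the raw output, it is often useful to check which classes scored highest. The following is a minimal sketch, assuming the first output holds one score per class (as the 1x1000 output file name suggests). The helper top_k is hypothetical and is not part of ONNX Runtime; it uses only the C++ standard library.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not an ONNX Runtime API): return the k largest
// scores as (class index, score) pairs, ordered from highest to lowest.
std::vector<std::pair<std::size_t, float>> top_k(const std::vector<float>& scores,
                                                 std::size_t k) {
    std::vector<std::pair<std::size_t, float>> indexed;
    indexed.reserve(scores.size());
    for (std::size_t i = 0; i < scores.size(); ++i) {
        indexed.emplace_back(i, scores[i]);
    }

    // Clamp k so that asking for more entries than exist is safe.
    k = std::min(k, indexed.size());

    // Only the first k positions need to be sorted.
    std::partial_sort(indexed.begin(),
                      indexed.begin() + static_cast<std::ptrdiff_t>(k),
                      indexed.end(),
                      [](const auto& a, const auto& b) { return a.second > b.second; });
    indexed.resize(k);
    return indexed;
}
```

In the example above, calling top_k(output_vec, 5) after the output vector is built would give the five highest-scoring class indices. Mapping indices to label names depends on the label file that accompanies the model.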

Note:

  • Include the header file onnxruntime_cxx_api.h, which is located in the $SDKTARGETSYSROOT/usr/include/onnxruntime/core/session directory.

  • Ensure that the cacheDir and cacheKey options match the results from your Vitis AI model compilation.
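
A mismatch between these options and the compilation result can be caught early with a simple check before the session is created. The sketch below assumes that the compiled artifacts live in a directory named after cacheKey inside cacheDir, which is consistent with the paths used in the example but is not confirmed here. The helper compiled_cache_exists is hypothetical and requires C++17 for std::filesystem.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

// Hypothetical helper (not an ONNX Runtime or Vitis AI API).
// Assumption: compiled artifacts are stored under <cacheDir>/<cacheKey>.
bool compiled_cache_exists(const std::string& cache_dir, const std::string& cache_key) {
    namespace fs = std::filesystem;
    std::error_code ec;  // non-throwing overload: errors simply yield false
    return fs::is_directory(fs::path(cache_dir) / cache_key, ec);
}
```

With the values from the example, compiled_cache_exists("/etc/vai/models", "resnet50_int8") could be called before creating the session, with a clear error message printed if it returns false.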

C++ Host Application Compilation and Linking

To compile and link the C++ application, use the following steps:

1. Set up the SDK

Refer to Install Sysroot to install the sysroot and set up the cross-compilation environment.

2. Compile the Application

Navigate to your working directory and compile the application with the SDK's g++ cross-compiler, invoked through the $CXX variable. The following command runs the cross-compiler for the ARM64 architecture (aarch64-amd-linux-g++), targeting a Xilinx platform based on Cortex-A78 cores, and compiles the source file input.cpp into the object file input.o.

$CXX -I$SDKTARGETSYSROOT/usr/include -I$SDKTARGETSYSROOT/usr/include/onnxruntime/core/session  -O2 -pipe -g -feliminate-unused-debug-types  -o input.o -c ./input.cpp

Some important flags used here are:

  • -I…: Add specified directories to the header file search path.