ONNX Runtime C++ APIs#
ONNX Runtime is a high-performance engine for running deep learning models, supporting both inference and training. It provides C and C++ APIs for integrating precompiled models into your application. The C++ application workflow closely mirrors the Python one, which makes it easier to move between the two languages.
After compiling a model with AMD Vitis™ AI, you can use ONNX Runtime’s C++ APIs to deploy it on hardware and run inference. The following example demonstrates a typical inference workflow using the ONNX Runtime C++ APIs with the Vitis AI execution provider.
For further details about ONNX Runtime C++ APIs, refer to the official documentation: https://onnxruntime.ai/docs/api/c/c_cpp_api.html
Example: Inference Workflow with Vitis AI#
#include <onnxruntime_cxx_api.h>
#include <cstdint>
#include <fstream>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>
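// Read a raw FP32 file and check that its size matches the given shape
bool load_raw_float(const std::string& filename, std::vector<float>& data, const std::vector<int64_t>& shape) {
    std::ifstream f(filename, std::ios::binary);
    if (!f) return false;
    f.seekg(0, std::ios::end);
    const size_t size = static_cast<size_t>(f.tellg());
    f.seekg(0, std::ios::beg);
    // The file must hold exactly one float per tensor element
    size_t expected = 1;
    for (int64_t dim : shape) expected *= static_cast<size_t>(dim);
    if (size != expected * sizeof(float)) return false;
    data.resize(expected);
    f.read(reinterpret_cast<char*>(data.data()), size);
    return static_cast<bool>(f);
}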
// Write a float vector to a file as raw FP32 data
bool save_raw_float(const std::string& filename, const std::vector<float>& data) {
    std::ofstream f(filename, std::ios::binary);
    if (!f) return false;
    f.write(reinterpret_cast<const char*>(data.data()), data.size() * sizeof(float));
    return static_cast<bool>(f);
}
int main(int argc, char* argv[]) {
    std::cout << "Usage: " << argv[0] << std::endl;
    const char* model_path = "/etc/vai/models/resnet50_int8/resnet50_int8.onnx";
    std::string input_file = "/etc/vai/models/resnet50_int8/data/ifm_input_fp32_1x3x224x224.bin";
    std::string output_file = "output_fp32_1x1000.bin";
    // Env + session
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "Default");
    Ort::SessionOptions session_options;
    // Set VitisAI-specific options
    std::unordered_map<std::string, std::string> options;
    options["config_file"] = "/etc/vai/models/resnet50_int8/vitisai_config.json"; // Config file
    options["cacheDir"] = "/etc/vai/models"; // Cache directory used when compiling the model; must match the compilation result.
    options["cacheKey"] = "resnet50_int8"; // Cache key used when compiling the model; must match the compilation result.
    options["target"] = "VAIML"; // Target platform
    // ORT session with the Vitis AI execution provider
    session_options.AppendExecutionProvider("VitisAI", options);
    Ort::Session session(env, model_path, session_options);
    // Load input (raw FP32 binary assumed)
    std::vector<float> input_data;
    std::vector<int64_t> input_shape = {1, 3, 224, 224}; // Set manually; must match the model's input shape
    if (!load_raw_float(input_file, input_data, input_shape)) {
        std::cerr << "Failed to load input file" << std::endl;
        return 1;
    }
    // Create tensor (wraps input_data without copying it)
    Ort::MemoryInfo mem_info = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
    Ort::Value input_tensor = Ort::Value::CreateTensor<float>(
        mem_info, input_data.data(), input_data.size(),
        input_shape.data(), input_shape.size()
    );
    // Input and output names
    Ort::AllocatorWithDefaultOptions allocator;
    auto input_name = session.GetInputNameAllocated(0, allocator);
    const char* input_names[] = { input_name.get() };
    size_t num_outputs = session.GetOutputCount();
    std::vector<const char*> output_names;
    std::vector<Ort::AllocatedStringPtr> output_name_ptrs; // Keeps the name strings alive
    for (size_t i = 0; i < num_outputs; i++) {
        output_name_ptrs.emplace_back(session.GetOutputNameAllocated(i, allocator));
        output_names.push_back(output_name_ptrs.back().get());
    }
    // Run inference
    auto output_tensors = session.Run(
        Ort::RunOptions{nullptr},
        input_names,         // const char* const*
        &input_tensor,       // const Ort::Value*
        1,                   // number of inputs
        output_names.data(), // const char* const*
        num_outputs          // number of outputs
    );
    // Save first output (as raw binary)
    float* out_data = output_tensors[0].GetTensorMutableData<float>();
    size_t out_size = output_tensors[0].GetTensorTypeAndShapeInfo().GetElementCount();
    std::vector<float> output_vec(out_data, out_data + out_size);
    if (!save_raw_float(output_file, output_vec)) {
        std::cerr << "Failed to save output" << std::endl;
        return 1;
    }
    std::cout << "Saved to " << output_file << std::endl;
    return 0;
}
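The inference example reads its input from a raw FP32 file whose size has to correspond to the 1x3x224x224 input shape. For a quick smoke test without real image data, a file of the right size could be generated along the lines of the following sketch. The helper names element_count and write_random_input are hypothetical, and the value range [0, 1) is an assumption; random data only exercises the pipeline and does not produce meaningful predictions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <random>
#include <string>
#include <vector>

// Number of elements in a tensor of the given shape (hypothetical helper).
std::size_t element_count(const std::vector<int64_t>& shape) {
    std::size_t count = 1;
    for (int64_t dim : shape) {
        assert(dim > 0 && "dynamic or empty dimensions are not handled in this sketch");
        count *= static_cast<std::size_t>(dim);
    }
    return count;
}

// Write uniformly distributed floats in [0, 1) as raw FP32 (hypothetical helper).
// The value range is an assumption; real inputs need the model's preprocessing.
bool write_random_input(const std::string& filename,
                        const std::vector<int64_t>& shape,
                        unsigned seed = 42) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    std::vector<float> data(element_count(shape));
    for (float& v : data) v = dist(rng);

    std::ofstream f(filename, std::ios::binary);
    if (!f) return false;
    f.write(reinterpret_cast<const char*>(data.data()),
            static_cast<std::streamsize>(data.size() * sizeof(float)));
    return static_cast<bool>(f);
}
```

Calling write_random_input("input.bin", {1, 3, 224, 224}) writes 150528 floats, that is, a 602112-byte file that can be passed to the example in place of the shipped input.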
Note:
The inference example includes the header file onnxruntime_cxx_api.h, which is located in the $SDKTARGETSYSROOT/usr/include/onnxruntime/core/session directory.
Ensure that the cacheDir and cacheKey options match the results from your Vitis AI model compilation.
C++ Host Application Compilation and Linking#
To compile and link the C++ application, use the following steps:
1. Set Up the SDK#
Refer to Install Sysroot to install the sysroot and set up the cross-compilation environment.
2. Compile the Application#
Navigate to your working directory and compile the application with the cross-compiler referenced by $CXX. The command below invokes the ARM64 cross-compiler (aarch64-amd-linux-g++), which targets a Xilinx platform based on Cortex-A78 cores, and compiles the source file input.cpp into the object file input.o.
$CXX -I$SDKTARGETSYSROOT/usr/include -I$SDKTARGETSYSROOT/usr/include/onnxruntime/core/session -O2 -pipe -g -feliminate-unused-debug-types -o input.o -c ./input.cpp
Some important flags used here are:
-I…: Adds the specified directories to the header file search path.
-c: Compiles the source file into an object file without linking.
3. Link the Application#
Next, link the object file (input.o) into an executable named model-app.elf. The command below links against the ONNX Runtime library (-lonnxruntime) and sets both the link-time library search path and the runtime library path.
$CXX -O2 -pipe -g -feliminate-unused-debug-types -Wl,-O1 -Wl,--hash-style=gnu -Wl,--as-needed -Wl,-z,relro,-z,now -rdynamic "input.o" -o model-app.elf -L$SDKTARGETSYSROOT/usr/lib -Wl,-rpath,$SDKTARGETSYSROOT/usr/lib -lonnxruntime
Some important flags used here are:
-L…: Adds the specified directory to the library search path used at link time.
-Wl,-rpath,…: Sets the runtime library path in the executable, so it knows where to find shared libraries.
-lonnxruntime: Links against the ONNX Runtime library.
The example above uses the pre-built ResNet-50 model included in the hardware image. Following this process generates the binaries required for your C++ application, which can then be deployed and executed directly on the target hardware.
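After the application has run on the target, the output file can be inspected offline. The sketch below reads a raw FP32 file and returns the indices of the highest scores. It assumes the file holds one score per class, as the name output_fp32_1x1000.bin suggests; mapping indices to class labels is not covered, and read_raw_float and top_k_indices are hypothetical helper names.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <numeric>
#include <string>
#include <vector>

// Read a whole raw FP32 file into a vector (empty on failure).
std::vector<float> read_raw_float(const std::string& filename) {
    std::ifstream f(filename, std::ios::binary | std::ios::ate);
    if (!f) return {};
    const std::streamsize bytes = f.tellg();
    if (bytes <= 0) return {};
    std::vector<float> data(static_cast<std::size_t>(bytes) / sizeof(float));
    f.seekg(0, std::ios::beg);
    f.read(reinterpret_cast<char*>(data.data()),
           static_cast<std::streamsize>(data.size() * sizeof(float)));
    if (!f) return {};
    return data;
}

// Indices of the k largest scores, highest first (hypothetical helper).
std::vector<std::size_t> top_k_indices(const std::vector<float>& scores, std::size_t k) {
    std::vector<std::size_t> idx(scores.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    k = std::min(k, idx.size());
    // Sort only the first k positions by descending score.
    std::partial_sort(idx.begin(), idx.begin() + static_cast<std::ptrdiff_t>(k), idx.end(),
                      [&scores](std::size_t a, std::size_t b) { return scores[a] > scores[b]; });
    idx.resize(k);
    return idx;
}
```

For example, top_k_indices(read_raw_float("output_fp32_1x1000.bin"), 5) yields up to five class indices in descending score order. partial_sort is used because only the first few positions need to be ordered.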
Note
For additional ONNX Runtime C++ reference applications, see CPP VART Examples.