Configuring NoC Connectivity for Model Deployments#
Overview#
Vitis AI inference deployments on Versal AI Edge Series Gen2 devices require a carefully
trained Network-on-Chip (NoC) configuration to meet the bandwidth and
latency demands of AIE workloads. However, because the actual
AI model artifact is loaded dynamically at runtime, the Vitis v++
linker cannot derive NoC configuration from the model itself at build time.
The gmio_train utility bridges this gap by generating a lightweight,
parameterized Neural Processing Unit (NPU) artifact – referred to as a
dummy graph – that exposes the Global Memory IO (GMIO) interface
placement and per-channel bandwidth requirements that the NoC compiler
needs during the link stage.
This section describes how to use gmio_train to produce and customize
this dummy NPU artifact, how to compile and link it into a Xilinx Shell
Archive (XSA) using aiecompiler and v++, and how to align the
trained NoC Quality of Service (QoS) profile with the real data-flow
requirements of the target system.
Introduction#
The Vitis v++ linker relies on an AIE-side artifact to expose GMIO
endpoints and per-channel bandwidth requests before it can invoke the NoC
compiler and derive a valid NoC configuration. In a typical development
flow, the real compiled AI model would serve this purpose. However, because
Vitis AI compiled models are loaded and configured dynamically at runtime – and may
change between runs – tying NoC training to any specific model artifact
introduces fragility and inflexibility into the build process.
gmio_train solves this problem by decoupling NoC configuration from any
particular model. It generates a stand-in ADF graph that carries only the
GMIO placement and bandwidth information required for NoC training, without
encoding any model-specific compute logic. This dummy graph is used
exclusively at build time to guide the NoC compiler; at runtime, it is
transparently replaced by the actual compiled AI model artifact running on
the AIE Array.
By default, gmio_train produces a uniform layout in which every column
on the AIE Array exposes two input and two output GMIO interfaces, each
configured with a default bandwidth of 500 MB/s. This default configuration
is well-suited for platform bring-up and early-stage development. For
production deployments, the generated graph can be customized to reflect
the actual column range and per-interface bandwidth profile of the target
model, ensuring that the trained NoC configuration accurately mirrors
real-world traffic patterns.
The sections that follow describe the default invocation, customization options, compilation steps, and recommended practices for both single-model and multi-model system configurations.
Why Use gmio_train?#
Configuring the NoC in a Versal device requires the
v++ linker to have access to an AI Engine (AIE)-side artifact at build
time – one that exposes the GMIO endpoints and
per-channel bandwidth requests that the NoC compiler uses to derive a valid
NoC configuration. In practice, however, the actual compiled AI model
artifact is not always available at link time, and even when it is, it may
change between runs or be loaded dynamically at runtime. This creates a
fundamental tension between the static requirements of the build system and
the dynamic nature of AI model deployment.
gmio_train resolves this by generating a dummy Neural
Processing Unit (NPU) artifact – a lightweight stand-in Adaptive Data Flow
(ADF) graph – that carries only the GMIO placement and bandwidth
information required for NoC training. It does not encode any model-specific
compute logic and is never executed at runtime. Its sole purpose is to give
the NoC compiler a well-defined, realistic set of Quality of Service (QoS)
parameters to train against during the v++ link stage.
The following points summarize the key reasons to use gmio_train:
- The v++ link stage requires an AIE-side artifact.
The NoC compiler cannot derive a NoC configuration in the absence of an artifact that exposes GMIO endpoints and per-channel bandwidth requests. Without such an artifact, NoC training has nothing to work against, and the link stage will fail or produce an undertrained NoC configuration that may not meet the bandwidth and latency requirements of the target workload.
- Vitis AI models are loaded and configured dynamically at runtime.
The real Vitis AI model artifact may change from one run to another, or may not be available at all during the platform build stage. Tying NoC training to a specific model artifact would require rebuilding the platform every time the model changes.
gmio_train decouples NoC configuration from any specific model by producing, at build time, a parameterized stand-in graph that carries only the GMIO placement and bandwidth information needed for NoC training.
- It eliminates boilerplate and reduces authoring effort.
Although a dummy ADF graph can be written by hand, doing so requires familiarity with the ADF graph API and involves significant boilerplate code. gmio_train automates this process and produces a parameterized, easy-to-customize template that can be adapted to the target column range and bandwidth profile with minimal effort. Users who prefer full control over the graph definition are still free to author their own dummy graph; gmio_train can serve as a convenient and well-structured starting point in either case.
Note
The dummy NPU graph produced by gmio_train is used exclusively at
build time to guide NoC configuration. It is not executed at runtime and
does not affect the behavior of the actual AI model artifact that runs
on the AIE Array.
Note
Before running any of the commands shown in this document
(gmio_train, aiecompiler, v++, …), make sure the Vitis
environment is set up as described in
Build the Reference Design.
The default invocation is:
gmio_train -s --part xc2ve3858-ssva2112-2MP-e-S -o training-libadf.a
This default command generates a design in which every column on the AI
Engine Array exposes two input and two output GMIO interfaces, and where the
default read bandwidth and write bandwidth are both set to 500 MB/s.
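As a rough sanity check on what this default layout asks of the NoC, the aggregate request per direction can be worked out from the figures above. The following is a minimal sketch in plain C++; the helper name is hypothetical and is not part of gmio_train, and the 36-column figure used in the usage note is the column count this document gives for the xc2ve3858 device.

```cpp
#include <cassert>

// Hypothetical helper (not part of gmio_train): total bandwidth, in MB/s,
// requested in one direction by a uniform layout in which every column
// exposes the same number of GMIO interfaces at the same bandwidth.
long aggregateBandwidthMBps(int numCols, int gmioPerCol, int bwPerGmioMBps)
{
    // All three inputs must be positive for the product to be meaningful.
    assert(numCols > 0 && gmioPerCol > 0 && bwPerGmioMBps > 0);
    return static_cast<long>(numCols) * gmioPerCol * bwPerGmioMBps;
}
```

For a 36-column array with the defaults, aggregateBandwidthMBps(36, 2, 500) evaluates to 36000 MB/s of requested read bandwidth, with the same amount requested again for writes.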
Using a User-Defined Dummy NPU Artifact#
Although gmio_train is the recommended approach for generating the
dummy Neural Processing Unit (NPU) artifact, it is not the only option.
Users who require precise control over Global Memory IO (GMIO) interface
placement, bandwidth assignments, or graph topology are free to author
their own dummy Adaptive Data Flow (ADF) graph and pass it directly to
the v++ linker in place of the gmio_train-generated artifact.
A user-defined dummy graph must satisfy the same structural requirements
as the gmio_train-generated graph: it must expose the GMIO endpoints
and per-channel bandwidth requests that the NoC compiler
needs in order to derive a valid NoC configuration at link time. Beyond
that constraint, the graph definition is entirely under the user’s control.
When authoring a custom dummy graph, consider the following:
- GMIO placement must reflect the intended model layout.
The GMIO interface locations defined in the dummy graph directly influence the NoC configuration that the compiler derives. If the placement does not reflect the column range and interface locations of the actual AI model, the resulting NoC configuration may be suboptimal or incompatible with the runtime workload.
- Bandwidth values must be non-zero.
As described in later sections, bandwidth values of 0 are not supported in either the GMIO create() calls or the Vitis NoC Quality of Service (QoS) connectivity options. All per-interface bandwidth values must be set to a positive integer. For paths that are unused or minimally active, a small non-zero value in the range of 1 to 5 MB/s is recommended to keep the configuration valid while minimizing impact on NoC resource allocation.
- gmio_train can serve as a starting point.
Even when a fully custom graph is required, gmio_train can accelerate development by generating a well-structured, parameterized template that is straightforward to modify. Starting from the gmio_train-generated graph.h and adapting it to the target layout is generally faster and less error-prone than writing a dummy graph from scratch, particularly for users who are less familiar with the ADF graph API.
Note
Regardless of whether the dummy NPU artifact is produced by
gmio_train or authored manually, it is used exclusively at build
time to guide NoC configuration during the v++ link stage. It is
not executed at runtime and does not affect the behavior of the actual
compiled AI model artifact that runs on the AIE Array.
Customizing the Generated Graph#
Before customizing the dummy Neural Processing Unit (NPU) graph generated
by gmio_train, the user should have a working understanding of the
target model’s deployment characteristics. Specifically, the following
information should be known or estimated before making any modifications
to the generated graph.h:
- Which columns on the AIE Array the actual AI model will occupy
- The total number of columns that will be used
- The location of each input and output interface within those columns
- The bandwidth requirement of each interface – at minimum, the relative magnitudes of the input and output bandwidths across columns should be known, even if exact values are not yet available
With this information in hand, the graph generated by gmio_train can
be modified to accurately reflect the target model’s column range and
traffic profile. The following subsections describe how to configure the
column range and per-interface bandwidth values.
Configuring the Column Range#
By default, gmio_train generates a design in which every column on the
AIE Array is included. To restrict the generated graph to the columns that
the actual AI model will occupy, use the --start-col and --num-col
options to specify the starting column index and the total number of
columns, respectively.
For example, the following command generates a design that starts at column 0 and occupies 8 columns:
gmio_train -s --part xc2ve3858-ssva2112-2MP-e-S \
--ns instance_0 --start-col 0 --num-col 8 \
-o training-libadf.a
The generated tmp/aie/graph.h will reflect this column range and
produce Global Memory IO (GMIO) interface entries only for the specified
columns. The default graph structure produced by this command is shown
below:
class Shimlock : public adf::graph
{
public:
    adf::kernel gmioKernSet[NMU_COL_COUNT * STACK_DEPTH];
    adf::input_gmio gmioIn[NMU_COL_COUNT * STACK_DEPTH];
    adf::output_gmio gmioOut[NMU_COL_COUNT * STACK_DEPTH];
    Shimlock()
    {
        for (int c = 0; c < NMU_COL_COUNT; c++) {
            for (int i = 0; i < STACK_DEPTH; i++) {
                gmioIn[STACK_DEPTH * c + i] = adf::input_gmio::create(
                    "c" + std::to_string(nmuCols.at(c)) + "r" + std::to_string(i),
                    256, 500);
                gmioOut[STACK_DEPTH * c + i] = adf::output_gmio::create(
                    "c" + std::to_string(nmuCols.at(c)) + "w" + std::to_string(i),
                    256, 500);
                gmioKernSet[STACK_DEPTH * c + i] = adf::kernel::create(loop);
                adf::location<adf::kernel>(gmioKernSet[STACK_DEPTH * c + i]) =
                    adf::tile(nmuCols.at(c), i);
                adf::location<adf::GMIO>(gmioIn[STACK_DEPTH * c + i]) =
                    adf::shim(nmuCols.at(c));
                adf::location<adf::GMIO>(gmioOut[STACK_DEPTH * c + i]) =
                    adf::shim(nmuCols.at(c));
                adf::source(gmioKernSet[STACK_DEPTH * c + i]) = "./loop.cpp";
                adf::connect(gmioIn[STACK_DEPTH * c + i].out[0],
                             gmioKernSet[STACK_DEPTH * c + i].in[0]);
                adf::connect(gmioKernSet[STACK_DEPTH * c + i].out[0],
                             gmioOut[STACK_DEPTH * c + i].in[0]);
                adf::runtime<adf::ratio>(gmioKernSet[STACK_DEPTH * c + i]) = 1.0;
            }
        }
    }
};
In this default structure, all GMIO interfaces are assigned a uniform
bandwidth of 500 MB/s for both input and output. This is appropriate
for bring-up and early-stage development, but should be replaced with
per-interface values that reflect the actual traffic profile of the target
model before moving to production.
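When editing the generated graph it helps to know which flat array slot corresponds to which logical GMIO name. The sketch below reproduces the indexing and naming pattern of the listing above in plain C++, without the ADF headers. It assumes that the column list holds consecutive column indices beginning at the --start-col value; that is an assumption about the generated code, not something confirmed here. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Flat array index for column slot c and stack slot i, matching the
// STACK_DEPTH * c + i expression used in the generated graph.
int flatIndex(int c, int i, int stackDepth)
{
    assert(c >= 0 && i >= 0 && i < stackDepth);
    return stackDepth * c + i;
}

// Logical GMIO names in flat-index order. dir is 'r' for input (read)
// interfaces and 'w' for output (write) interfaces.
// Assumption: columns are consecutive, starting at startCol.
std::vector<std::string> gmioNames(int startCol, int numCols, int stackDepth, char dir)
{
    std::vector<std::string> names;
    names.reserve(static_cast<std::size_t>(numCols) * stackDepth);
    for (int c = 0; c < numCols; ++c) {
        for (int i = 0; i < stackDepth; ++i) {
            names.push_back("c" + std::to_string(startCol + c) + dir + std::to_string(i));
        }
    }
    return names;
}
```

Under these assumptions, gmioNames(8, 2, 2, 'r') yields c8r0, c8r1, c9r0, c9r1, and flatIndex(1, 0, 2) is 2, the slot that holds c9r0.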
Configuring Per-Interface Bandwidth Values#
To assign per-interface bandwidth values, declare two integer arrays –
one for input bandwidths and one for output bandwidths – and reference
them in the adf::input_gmio::create() and
adf::output_gmio::create() calls, replacing the uniform 500
default. The following example illustrates this pattern:
class Shimlock : public adf::graph
{
public:
    adf::kernel gmioKernSet[NMU_COL_COUNT * STACK_DEPTH];
    adf::input_gmio gmioIn[NMU_COL_COUNT * STACK_DEPTH];
    adf::output_gmio gmioOut[NMU_COL_COUNT * STACK_DEPTH];
    int bwIn[NMU_COL_COUNT * STACK_DEPTH] = {READ_0, READ_1, READ_2, ...};
    int bwOut[NMU_COL_COUNT * STACK_DEPTH] = {WRITE_0, WRITE_1, WRITE_2, ...};
    Shimlock()
    {
        for (int c = 0; c < NMU_COL_COUNT; c++) {
            for (int i = 0; i < STACK_DEPTH; i++) {
                gmioIn[STACK_DEPTH * c + i] = adf::input_gmio::create(
                    "c" + std::to_string(nmuCols.at(c)) + "r" + std::to_string(i),
                    256, bwIn[STACK_DEPTH * c + i]);
                gmioOut[STACK_DEPTH * c + i] = adf::output_gmio::create(
                    "c" + std::to_string(nmuCols.at(c)) + "w" + std::to_string(i),
                    256, bwOut[STACK_DEPTH * c + i]);
                gmioKernSet[STACK_DEPTH * c + i] = adf::kernel::create(loop);
                adf::location<adf::kernel>(gmioKernSet[STACK_DEPTH * c + i]) =
                    adf::tile(nmuCols.at(c), i);
                adf::location<adf::GMIO>(gmioIn[STACK_DEPTH * c + i]) =
                    adf::shim(nmuCols.at(c));
                adf::location<adf::GMIO>(gmioOut[STACK_DEPTH * c + i]) =
                    adf::shim(nmuCols.at(c));
                adf::source(gmioKernSet[STACK_DEPTH * c + i]) = "./loop.cpp";
                adf::connect(gmioIn[STACK_DEPTH * c + i].out[0],
                             gmioKernSet[STACK_DEPTH * c + i].in[0]);
                adf::connect(gmioKernSet[STACK_DEPTH * c + i].out[0],
                             gmioOut[STACK_DEPTH * c + i].in[0]);
                adf::runtime<adf::ratio>(gmioKernSet[STACK_DEPTH * c + i]) = 1.0;
            }
        }
    }
};
Note
Replace READ_0, READ_1, READ_2, and WRITE_0,
WRITE_1, WRITE_2 with the actual per-interface bandwidth
values, in MB/s, that reflect the traffic profile of the target
model. At minimum, the relative magnitudes of the input and output
bandwidths across columns should be preserved, even if exact values
are not yet available.
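When only relative magnitudes are known, one possible way to obtain concrete numbers is to scale a list of relative weights so that the busiest interface receives a chosen peak value. The following is a sketch of that idea in plain C++, not a tool-provided feature. The function name, the weights, and the choice of peak value are all illustrative assumptions. Entries that would round to 0 are raised to 1 MB/s, because a bandwidth of 0 is not accepted.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: scale relative weights so that the largest weight
// maps to peakMBps, rounding to whole MB/s and clamping every entry to a
// floor of 1 so that no interface ends up with a bandwidth of 0.
std::vector<int> scaleBandwidth(const std::vector<double>& weights, int peakMBps)
{
    assert(peakMBps >= 1);
    double maxW = 0.0;
    for (double w : weights) {
        maxW = std::max(maxW, w);
    }
    std::vector<int> out;
    out.reserve(weights.size());
    for (double w : weights) {
        // If every weight is zero or negative, fall back to the floor value.
        int bw = (maxW > 0.0)
                     ? static_cast<int>(std::lround(w / maxW * peakMBps))
                     : 1;
        out.push_back(std::max(bw, 1));
    }
    return out;
}
```

With weights {4, 2, 1, 0} and a peak of 500, the result is {500, 250, 125, 1}. Values obtained this way could then be copied, in flat-index order, into the bwIn and bwOut initializers.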
Bandwidth Value Constraints#
When assigning per-interface bandwidth values, the following constraints must be observed:
- Bandwidth values of 0 are not supported.
A value of 0 is not valid in either the GMIO create() calls or the Vitis NoC Quality of Service (QoS) connectivity options. This behavior differs from Vivado NoC configuration, where a value of 0 can be used to indicate unused paths. In the Vitis flow, attempting to set:
read_bw = 0
write_bw = 0
can result in linker errors similar to the following:
ERROR: [CFGEN 83-2253] Malformed --connectivity.noc.read_bw switch argument
- Use small non-zero values for unused or minimally active paths.
For interfaces that are unused or carry minimal traffic, assign a small positive bandwidth value in the range of 1 to 5 MB/s. This keeps the configuration valid while minimizing the impact on NoC resource allocation.
- Preserve relative bandwidth magnitudes across columns.
Even when exact per-interface bandwidth values are not known, the relative magnitudes of the input and output bandwidths across columns should be preserved as accurately as possible. The NoC compiler uses these values to derive a Quality of Service (QoS) profile that reflects the real data-flow requirements of the target model. A configuration in which all interfaces are assigned the same uniform bandwidth may result in a suboptimal NoC configuration that does not meet the latency or throughput requirements of the target workload.
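Because a single zero entry only surfaces later as a linker error, it can be convenient to check a bandwidth table before pasting it into graph.h. The following plain C++ sketch is one way to do that; the struct and function names are hypothetical. It reports each non-positive entry together with the column slot and stack slot derived from its flat index.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One offending table entry, located by column slot and stack slot.
struct BadEntry {
    int colSlot;
    int stackSlot;
    int value;
};

// Hypothetical pre-check: list every entry that is not a positive integer.
// The table is assumed to be in flat-index order (stackDepth * c + i).
std::vector<BadEntry> findNonPositive(const std::vector<int>& bwMBps, int stackDepth)
{
    assert(stackDepth > 0);
    std::vector<BadEntry> bad;
    for (std::size_t k = 0; k < bwMBps.size(); ++k) {
        if (bwMBps[k] <= 0) {
            const int idx = static_cast<int>(k);
            bad.push_back({idx / stackDepth, idx % stackDepth, bwMBps[k]});
        }
    }
    return bad;
}
```

For the table {500, 0, 3, 250} with a stack depth of 2, the check returns a single entry for column slot 0, stack slot 1. Replacing that value with a small positive number in the 1 to 5 MB/s range makes the table valid.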
Note
The bandwidth values assigned in the dummy graph are used exclusively
at build time to guide NoC training during the v++ link stage.
They do not directly control the runtime behavior of the actual AI
model artifact. However, they do influence the NoC configuration that
is baked into the platform, which in turn affects the bandwidth and
latency characteristics available to the runtime workload. For this
reason, it is important to assign bandwidth values that are as
representative of the real model’s traffic profile as possible.
Compiling the Customized Graph#
After editing tmp/aie/graph.h to reflect the target column range and
per-interface bandwidth values, the customized graph must be recompiled
to produce an updated training-libadf.a archive. The files generated
by gmio_train can be reused directly for this purpose, without
regenerating the full gmio_train output.
To recompile the customized graph, run aiecompiler with the
configuration file and include path generated by gmio_train:
aiecompiler --config tmp/aie/Work/aie_hw.cfg --include=tmp/aie
This command produces an updated training-libadf.a archive that
reflects the customized GMIO placement and bandwidth values. Once
compiled, pass the archive to the v++ linker to produce the linked
Xilinx Shell Archive (XSA):
v++ -l training-libadf.a ... -o <design>_link.xsa
The v++ linker will invoke the NoC compiler using the GMIO placement
and bandwidth information encoded in training-libadf.a to derive a
NoC configuration that reflects the target model’s column range and
traffic profile.
Note
Before running aiecompiler or v++, ensure that the Vitis
environment is set up as described in Build the Reference Design. Attempting to run
either tool without the correct environment configuration will result
in errors or an incomplete build.
Recommended Practice#
The following practices are recommended when configuring the dummy Neural Processing Unit (NPU) graph for NoC training. Adhering to these guidelines will help ensure that the trained NoC configuration is both valid and representative of the real model’s data-flow requirements.
- Use small non-zero values for unused or minimally active paths.
Bandwidth values of 0 are not supported in the Vitis NoC Quality of Service (QoS) connectivity options. For Global Memory IO (GMIO) interfaces that are unused or carry minimal traffic, assign a small positive bandwidth value in the range of 1 to 5 MB/s. This keeps the NoC configuration valid while minimizing the impact on NoC resource allocation for those paths.
- Reflect the actual column range of the target model.
Use the --start-col and --num-col options to restrict the generated graph to the columns that the actual AI model will occupy. Including columns that the model does not use will cause the NoC compiler to allocate resources for paths that will never carry traffic at runtime, potentially degrading the NoC configuration quality for the paths that matter.
- Preserve relative bandwidth magnitudes across columns.
Even when exact per-interface bandwidth values are not yet known, the relative magnitudes of the input and output bandwidths across columns should be preserved as accurately as possible. The NoC compiler uses these values to derive a QoS profile that reflects the real data-flow requirements of the target model. A uniform bandwidth assignment across all interfaces may result in a suboptimal NoC configuration that does not meet the latency or throughput requirements of the target workload.
- Recompile after every graph modification.
Any change to tmp/aie/graph.h – whether to the column range, interface placement, or bandwidth values – must be followed by a recompile using aiecompiler and a re-link using v++ to ensure that the updated NoC configuration is reflected in the output Xilinx Shell Archive (XSA). Changes to graph.h that are not recompiled and re-linked will have no effect on the trained NoC configuration.
- Use gmio_train defaults during early bring-up.
When per-model column knowledge is not yet available – for example, during early platform bring-up or initial integration testing – the default gmio_train invocation with uniform 500 MB/s bandwidth across all columns is a suitable and safe starting point. Refine the configuration once the actual model’s column usage and traffic profile have been characterized.
Performance-Oriented Configuration#
The default gmio_train graph, which assigns a uniform bandwidth of
500 MB/s to every column on the AIE Array, is designed for
platform bring-up and generic NoC training. It is not intended for
performance tuning. Once the actual machine learning (ML) workload has
been characterized and its column usage and traffic profile are known,
the dummy graph should be regenerated and customized to reflect the real
model’s data-flow requirements.
The following steps describe the recommended approach for performance-oriented NoC configuration:
- Step 1: Match the column range to the actual model.
Use the --start-col and --num-col options to restrict the generated graph to the columns that the model actually occupies. Columns that the model does not use should be excluded from the dummy graph to avoid allocating NoC resources for unused paths. For example, if the model occupies columns 0 through 7:
gmio_train -s --part xc2ve3858-ssva2112-2MP-e-S \
--ns instance_0 --start-col 0 --num-col 8 \
-o training-libadf.a
- Step 2: Replace uniform bandwidth values with per-column estimates.
Replace the uniform 500 MB/s default with per-column READ_x and WRITE_x values that approximate the model’s traffic pattern. At minimum, the relative magnitudes of the input and output bandwidths across columns should be preserved, even if exact values are not yet available. Refer to the Configuring Per-Interface Bandwidth Values section for implementation details.
- Step 3: Recompile and re-link.
After customizing tmp/aie/graph.h, recompile the graph using aiecompiler and re-link using v++ to produce an updated XSA that reflects the performance-oriented NoC configuration:
aiecompiler --config tmp/aie/Work/aie_hw.cfg --include=tmp/aie
v++ -l training-libadf.a ... -o <design>_link.xsa
- Step 4: Complement with v++ connectivity options.
For finer control over NoC behavior, supplement the dummy graph with v++ --connectivity options such as sp=, noc.read_bw=, and noc.write_bw= to control how kernels, AIE interfaces, and memory resources are interconnected and assigned QoS parameters. Refer to the Further NoC Control Beyond gmio_train section for details.
Note
When per-model column knowledge is not yet available, the plain
gmio_train defaults remain a suitable starting point. Performance-oriented
configuration should be deferred until the actual model’s
column usage and traffic profile have been characterized through
profiling or simulation.
Per-Instance NoC Customization#
In deployments where multiple model instances run concurrently on the AIE Array, the dummy NoC training graph must reflect the full multi-instance floorplan to ensure that the trained NoC configuration meets the bandwidth and latency requirements of all active instances. Multiple model instances can be deployed on the AIE Array through two distinct mechanisms, each operating at a different stage of the development flow:
| Mechanism | When | Option | Use Case |
|---|---|---|---|
| Data parallelism | Compile time | | Maximize throughput for concurrent requests by replicating the model multiple times across the device at build time |
| Multi-tenancy | Runtime | | Dynamically place multiple model instances across AIE Array columns at runtime using per-runner placement options |
Regardless of which mechanism is used, the NoC training graph should be configured to reflect the column range and per-interface bandwidth profile of each model instance. The following guidance applies to both mechanisms.
- Generate one dummy graph per model instance.
Invoke gmio_train separately for each model instance, using the --start-col and --num-col options to restrict each graph to the columns that the instance will occupy, and the --ns option to assign a unique namespace to each graph to avoid symbol conflicts during linking. The total number of columns occupied by each compiled model instance is determined by the following relationship:
\[\text{occupied\_columns} = \text{dp\_size} \times \text{tp\_size} \times 4\]
For example, for a deployment where each instance occupies 8 columns, generate one dummy graph per instance in different working directories as follows:
gmio_train -s --part xc2ve3858-ssva2112-2MP-e-S \
--ns instance_0 --start-col 0 --num-col 8 \
-o training-libadf-instance-0.a
gmio_train -s --part xc2ve3858-ssva2112-2MP-e-S \
--ns instance_1 --start-col 8 --num-col 8 \
-o training-libadf-instance-1.a
gmio_train -s --part xc2ve3858-ssva2112-2MP-e-S \
--ns instance_2 --start-col 16 --num-col 8 \
-o training-libadf-instance-2.a
gmio_train -s --part xc2ve3858-ssva2112-2MP-e-S \
--ns instance_3 --start-col 24 --num-col 8 \
-o training-libadf-instance-3.a
- Assign per-instance bandwidth values.
For each generated graph.h, replace the uniform bandwidth defaults with READ_x and WRITE_x values that reflect the traffic profile of the corresponding model instance. This ensures that the NoC compiler trains against a realistic, per-instance Quality of Service (QoS) profile that mirrors the final multi-instance floorplan.
- Compile each instance into its own archive.
In each working directory, compile each customized graph.h into its own libadf.a archive using aiecompiler:
aiecompiler --config tmp/aie/Work/aie_hw.cfg \
--include=tmp/aie
aiecompiler --config tmp/aie/Work/aie_hw.cfg \
--include=tmp/aie
aiecompiler --config tmp/aie/Work/aie_hw.cfg \
--include=tmp/aie
aiecompiler --config tmp/aie/Work/aie_hw.cfg \
--include=tmp/aie
- Link all instance archives together with v++.
Pass all per-instance archives to the v++ linker in a single link invocation. The linker will combine the Global Memory IO (GMIO) placement and bandwidth information from all archives and invoke the NoC compiler to derive a unified NoC configuration that reflects the full multi-instance floorplan:
v++ -l training-libadf-instance-0.a \
training-libadf-instance-1.a \
training-libadf-instance-2.a \
training-libadf-instance-3.a \
... -o <design>_link.xsa
Note
When using per-instance NoC customization, observe the following constraints regardless of whether compile-time or runtime placement is used:
- Column ranges must not overlap. The --start-col ranges assigned to each model instance must be non-overlapping. On a 24-column device (ve2-xc2ve3558), the sum of all occupied column ranges must not exceed 24 columns. On a 36-column device (ve2-xc2ve3858), the sum must not exceed 36 columns.
- Column ranges must represent disjoint partitions. The column ranges assigned across all model instances must together form a set of disjoint partitions – that is, every occupied column must belong to exactly one instance, with no gaps or overlaps between instance boundaries. Partial or fragmented column assignments that leave unassigned columns between instances are not supported.
- One instance must include column 0. At least one model instance must be assigned a column range that begins at column 0 (--start-col 0). The NoC compiler requires that the column space is anchored at column 0; a configuration in which no instance starts at column 0 is invalid and will produce an incorrect or incomplete NoC configuration.
- Namespaces must be unique. The --ns option must be used to assign a unique namespace to each generated graph to avoid symbol conflicts during the v++ link stage.
- Bandwidth values must be non-zero. For unused or minimally active paths, assign a small positive value in the range of 1 to 5 MB/s. Bandwidth values of 0 are not supported in the Vitis NoC QoS connectivity options.
- Runtime placement must be consistent. When using multi-tenancy runtime placement, VART does not validate overlapping or incompatible placements across runners. The application is responsible for ensuring that spatial layouts do not overlap and that temporal sharing groups use matching dp_size and tp_size values. Refer to the VART Multi-Tenancy documentation for full details on runner placement options and application responsibilities.
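The column and namespace constraints above can be checked mechanically before any gmio_train command is run. The sketch below is plain C++ and purely illustrative. The Instance struct, the function names, and the error strings are hypothetical, and the check covers only the rules stated in this section: unique namespaces, a range anchored at column 0, no overlaps, no gaps, and a total that fits the device.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// One planned model instance: namespace, first column, column count.
struct Instance {
    std::string ns;
    int startCol;
    int numCols;
};

// Column count per instance, following occupied_columns = dp_size * tp_size * 4.
int occupiedColumns(int dpSize, int tpSize)
{
    assert(dpSize > 0 && tpSize > 0);
    return dpSize * tpSize * 4;
}

// Returns an empty string when the layout satisfies the constraints listed
// above, otherwise a short description of the first violation found.
std::string checkLayout(std::vector<Instance> inst, int deviceCols)
{
    if (inst.empty()) return "no instances";
    std::set<std::string> seen;
    for (const Instance& x : inst) {
        if (x.numCols <= 0 || x.startCol < 0) return "invalid range for " + x.ns;
        if (!seen.insert(x.ns).second) return "duplicate namespace " + x.ns;
    }
    // Walk the instances in column order and require each one to begin
    // exactly where the previous one ended.
    std::sort(inst.begin(), inst.end(),
              [](const Instance& a, const Instance& b) { return a.startCol < b.startCol; });
    if (inst.front().startCol != 0) return "no instance starts at column 0";
    int next = 0;
    for (const Instance& x : inst) {
        if (x.startCol < next) return "overlap at " + x.ns;
        if (x.startCol > next) return "gap before " + x.ns;
        next = x.startCol + x.numCols;
    }
    if (next > deviceCols) return "layout exceeds device columns";
    return "";
}
```

The four-instance example from this section (start columns 0, 8, 16, and 24, eight columns each) passes the check for a 36-column device and is reported as exceeding the device when checked against 24 columns. occupiedColumns(1, 2) gives the 8 columns per instance used in that example.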
Further NoC Control Beyond gmio_train#
The dummy graph produced by gmio_train describes only the AIE-side GMIO interfaces and their associated
bandwidth requests. While this information is sufficient for the
NoC compiler to derive a baseline NoC configuration,
the actual NoC behavior in the final system is shaped by additional
Vitis and Vivado mechanisms that can be combined with gmio_train
to achieve finer control over NoC resource allocation and Quality of
Service (QoS) parameters.
The following mechanisms are available for extending NoC control beyond
what gmio_train provides:
Vitis Linker Connectivity Options#
The v++ linker exposes a set of --connectivity options that
allow users to control how kernels, AIE interfaces, and memory resources
are interconnected and assigned QoS parameters. These options complement
the GMIO placement and bandwidth information provided by the
gmio_train dummy graph and can be used to fine-tune the NoC
configuration without modifying the graph itself.
The most commonly used connectivity options for NoC control are:
| Option | Description |
|---|---|
| sp= | Specifies the memory resource to which a kernel port or AIE interface is connected. Used to control the data path between compute resources and memory. |
| noc.read_bw= | Specifies the read bandwidth, in MB/s, for a given NoC path. Overrides the bandwidth value derived from the dummy graph for that path. |
| noc.write_bw= | Specifies the write bandwidth, in MB/s, for a given NoC path. Overrides the bandwidth value derived from the dummy graph for that path. |
For full details on all available connectivity options, refer to:
UG1702 - Vitis Reference Guide (Connectivity Options): https://docs.amd.com/r/en-US/ug1702-vitis-accelerated-reference/connectivity-Options
Post-Link Customization in Vivado#
For advanced use cases where the Vitis linker connectivity options do not provide sufficient control, the linked design can be exported to Vivado and the NoC settings tuned directly in the Vivado environment. The customized design can then be carried back into the Vitis flow for final integration and deployment.
Post-link customization in Vivado is recommended in the following scenarios:
- The NoC configuration derived by the v++ linker does not meet the bandwidth or latency requirements of the target workload, and the required adjustments cannot be expressed through v++ connectivity options alone.
- The target system has complex NoC topology requirements that are more naturally expressed in the Vivado NoC configuration environment than through the v++ connectivity option syntax.
- Fine-grained control over individual NoC path parameters – such as traffic class, QoS priority, or arbitration settings – is required beyond what the Vitis flow exposes.
Note
Post-link customization in Vivado is an advanced workflow and
requires familiarity with both the Vivado NoC configuration
environment and the Vitis platform development flow. Users who are
new to Versal NoC configuration are encouraged to exhaust the
gmio_train and v++ connectivity options before resorting
to post-link Vivado customization.
For a comprehensive description of the Versal NoC architecture, configuration parameters, and QoS tuning, refer to:
PG313 - Versal Adaptive SoC Programmable Network on Chip and Integrated Memory Controller LogiCORE IP Product Guide: https://docs.amd.com/r/en-US/pg313-network-on-chip