Tom Hackett — Cadence
Today's always-on, cloud-connected electronic products create value by delivering massive amounts of data on time to serve a wide variety of applications. Whether streaming video, transacting e-commerce, or processing a voice over IP (VoIP) call, moving data in bulk (bandwidth) and delivering it on time (latency) are major system priorities.
System-on-chip (SoC) architects strive to create architectures that deliver optimum bandwidth and latency for these and countless other applications. Meeting bandwidth and latency requirements is a priority for mobile and wired products, and for consumer and business applications alike. It is a central, perhaps the central, SoC design challenge. It requires the attention of the whole SoC development team: design engineers and verification engineers as well as architects. It also requires a new method of analysis called performance analysis. Creating a performance analysis solution requires a combination of design and verification tools along with specialized intellectual property (IP). Five requirements must be satisfied to implement an effective performance analysis solution. Those requirements are the five keys for optimizing SoC latency and bandwidth.
First, let's examine the three major forces driving SoC architectures today: custom interface standards, multi-core processors, and advanced SoC interconnect IP.
"Custom interface standards" refers to the recent proliferation of specialized variants of common interface protocols. In order to optimize products for various market segments, common interface standards such as PCI Express, USB, and DDR memory protocols have spun off multiple customized variations. This article describes the phenomenon.
Multi-core processors are now in virtually every electronic product. Multi-core designs deliver a fundamental architectural advantage to enable higher performance with lower power. Whether it's the dual-core chip in the iPhone 5s, or 16-core server-grade SoCs, multi-core design is everywhere.
The third element, advanced SoC interconnect IP, is often overlooked but absolutely essential to realizing bandwidth and latency goals. While simple busses were previously used to connect IP blocks in an SoC, multi-core designs require advanced IP components to connect blocks. Sometimes called fabrics, networks-on-chip, or simply "interconnects", this class of IP helps to tune the operation of multi-core SoCs.
As with any sophisticated technology, there are multiple challenges to successfully implementing advanced interconnects. This article describes the challenges entailed in performing pre-silicon functional verification of multi-core mobile SoCs. Similar issues exist with server designs.
While functional verification is essential, it does not address bandwidth and latency measurement. To verify that bandwidth and latency goals are met, a new type of verification approach, performance verification, is required. To understand performance verification, consider the differing bandwidth and latency demands of various components within an SoC as depicted in the figure below.
Figure 1. Differing bandwidth and latency requirements of various SoC components
In this example, some components have a maximum latency requirement, others require a guaranteed minimum bandwidth, and some have a combination of both. The best architectural solution for any given SoC is the one that connects the components with an interconnect IP configuration that meets the bandwidth and latency requirements of all the components. This may involve a hierarchical interconnect arrangement as shown in Figure 2.
Figure 2. Example SoC with hierarchical advanced interconnect
In this example, the SoC is segmented into multiple domains, each served by a local interconnect. This helps to segment traffic, but contention will still commonly arise, as shown in the example where the CPU, display subsystem, and a peripheral all try to access main memory simultaneously.
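To make the cost of that contention concrete, here is a minimal sketch of three masters hitting a single shared memory port at once. It is not drawn from any particular interconnect: the function name `serve_in_order`, the fixed 10-cycle service time, and the one-request-at-a-time memory model are all assumptions chosen for illustration.

```python
def serve_in_order(requests, service_cycles):
    """Model one shared memory port that serves a single request at a time.

    requests: list of (master, arrival_cycle) pairs, one request per master.
    Returns a dict mapping each master to its latency in cycles.
    Hypothetical model: fixed service time, first-come-first-served.
    """
    free_at = 0          # cycle at which the memory port next becomes idle
    latency = {}
    # sorted() is stable, so simultaneous arrivals keep their listed order
    for master, arrival in sorted(requests, key=lambda r: r[1]):
        start = max(arrival, free_at)      # wait if the port is still busy
        free_at = start + service_cycles
        latency[master] = free_at - arrival
    return latency


simultaneous = [("cpu", 0), ("display", 0), ("peripheral", 0)]
print(serve_in_order(simultaneous, service_cycles=10))
# {'cpu': 10, 'display': 20, 'peripheral': 30}
```

Even in this toy model, the last master in line sees three times the latency of the first, although each request takes the same time to service.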
Advanced interconnect IP provides a number of features to tune each port for unique bus widths, address maps, and clock speeds. In addition, mechanisms for adjusting bandwidth and latency, referred to as quality of service (QoS) controls, are used to tune the interconnect IP used in each domain.
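One simple form of QoS control is a per-master priority at the arbiter. The sketch below is a hedged illustration, not a description of any vendor's mechanism: `arbitrate`, the priority values, and the fixed service time are hypothetical, and each master is assumed to issue a single request.

```python
import heapq


def arbitrate(requests, service_cycles, priority):
    """Serve requests at one shared port, lowest priority value first.

    requests: list of (master, arrival_cycle), one request per master.
    priority: dict master -> int (hypothetical QoS setting; lower wins).
    Returns a dict mapping each master to its latency in cycles.
    """
    pending = sorted(requests, key=lambda r: r[1])
    ready = []                      # heap of (priority, arrival, master)
    now, i, latency = 0, 0, {}
    while i < len(pending) or ready:
        # Admit every request that has arrived by the current cycle
        while i < len(pending) and pending[i][1] <= now:
            master, arrival = pending[i]
            heapq.heappush(ready, (priority[master], arrival, master))
            i += 1
        if not ready:               # port idle: jump to the next arrival
            now = pending[i][1]
            continue
        _, arrival, master = heapq.heappop(ready)
        now += service_cycles       # non-preemptive, fixed service time
        latency[master] = now - arrival
    return latency


reqs = [("cpu", 0), ("display", 0), ("peripheral", 0)]
qos = {"display": 0, "cpu": 1, "peripheral": 2}
print(arbitrate(reqs, 10, qos))
# {'display': 10, 'cpu': 20, 'peripheral': 30}
```

Raising the display's priority cuts its latency, but only by pushing the delay onto the CPU. That trade-off is the reason QoS settings have to be evaluated across the whole system rather than one port at a time.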
While the QoS flexibility is needed, the end-to-end result of a local optimization can be difficult to foresee. That is where performance analysis comes into play. By definition, performance analysis requires comparisons of bandwidth and latency measurements taken from different SoC architectures. That means that two or more (typically several) SoC architectures must be modeled, simulations run, and performance measurements made. To enable this type of flow, an effective performance analysis solution must satisfy the following five requirements.
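The comparison step can be pictured as a table of measurements, one row per candidate architecture, checked against the design goals. The sketch below assumes the measurements have already been taken from simulation. The architecture names, metric names, and every number are invented for illustration.

```python
# Hypothetical measurements from three simulated candidate architectures
candidates = {
    "flat_interconnect": {"cpu_latency_ns": 140, "display_bw_mbps": 820},
    "two_level":         {"cpu_latency_ns": 90,  "display_bw_mbps": 790},
    "two_level_qos":     {"cpu_latency_ns": 95,  "display_bw_mbps": 860},
}

# Hypothetical goals: ("max", x) is an upper bound, ("min", x) a lower bound
goals = {
    "cpu_latency_ns":  ("max", 100),
    "display_bw_mbps": ("min", 800),
}


def violations(metrics):
    """Return the names of the goals that this set of measurements misses."""
    missed = []
    for name, (kind, limit) in goals.items():
        value = metrics[name]
        if (kind == "max" and value > limit) or (kind == "min" and value < limit):
            missed.append(name)
    return missed


report = {arch: violations(m) for arch, m in candidates.items()}
passing = [arch for arch, missed in report.items() if not missed]
print(passing)
# ['two_level_qos']
```

In this made-up data set, each of the first two candidates fixes one goal while missing the other, which is the kind of result that only shows up when several architectures are measured side by side.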
1) Cycle-accurate model
Performance analysis cannot be performed on an abstract representation of the SoC. While a transaction-level model (TLM) of an SoC is useful to support early software development, a cycle-accurate model is required for meaningful performance analysis. From a practical standpoint, a cycle-accurate model requires using the register-transfer level (RTL) design description.
2) Automatic RTL generation
RTL is always produced as part of the SoC development process. However, it is increasingly common for specific IP to be generated from configuration metadata. The interconnect is one example: most vendors generate the associated RTL, which enables quick iteration while assessing performance behavior.
3) Verification IP
Performance verification requires capturing transactions that cross the SoC interconnect components. Verification IP (VIP) monitors are needed to watch traffic at all the interconnect ports. Also, VIP can be used to generate shaped traffic in place of IP blocks to provide for faster simulation and a higher level of control over the simulation run, especially setting up traffic conflicts that the SoC will experience in the real world.
4) Testbench generation
A performance testbench is needed to drive functional simulation runs of the various SoC configurations. The testbench must be changed for each SoC RTL configuration, so a generator is required to speed up the process and make it practical to perform performance analysis.
5) In-depth analysis
Once a cycle-accurate RTL model and testbench have been generated, tests must be run against the design and thousands of transactions monitored by the VIP. At this point, bandwidth and latency measurements can be made and analyzed. A final key requirement of the performance analysis solution is to have in-depth analysis features. Graphical displays are required to visualize results and sophisticated filtering mechanisms are required to identify root causes.
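As a rough picture of the analysis step, the sketch below reduces a list of monitored transactions to per-master latency and bandwidth figures, then filters for the slow transactions that point toward a root cause. It does not reflect the data format of any actual VIP monitor: the `Txn` record, its fields, the 1 GHz clock, and the sample trace are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Txn:
    """One monitored transaction (hypothetical record layout)."""
    master: str
    start_cycle: int   # cycle the request was issued
    end_cycle: int     # cycle the response completed
    num_bytes: int


def summarize(txns, clock_hz):
    """Per-master average and worst latency (cycles) and bandwidth (bytes/s)."""
    by_master = defaultdict(list)
    for t in txns:
        by_master[t.master].append(t)
    summary = {}
    for master, group in by_master.items():
        latencies = [t.end_cycle - t.start_cycle for t in group]
        # Observation window: first issue to last completion, in cycles
        window = max(t.end_cycle for t in group) - min(t.start_cycle for t in group)
        summary[master] = {
            "avg_latency": sum(latencies) / len(latencies),
            "max_latency": max(latencies),
            "bandwidth_bps": sum(t.num_bytes for t in group) * clock_hz / window,
        }
    return summary


def slow_txns(txns, master, limit_cycles):
    """Filter: transactions from one master that exceeded a latency limit."""
    return [t for t in txns
            if t.master == master and t.end_cycle - t.start_cycle > limit_cycles]


# Tiny made-up trace; a real run would contain thousands of transactions
trace = [
    Txn("cpu", 0, 10, 64),
    Txn("display", 0, 20, 256),
    Txn("cpu", 20, 60, 64),
    Txn("display", 20, 40, 256),
]
stats = summarize(trace, clock_hz=1_000_000_000)
print(stats["cpu"]["max_latency"])        # 40
print(slow_txns(trace, "cpu", 30))
```

Here the filter isolates the second CPU access, which overlaps the second display burst in the trace. Narrowing thousands of transactions down to the few that collide in this way is what the filtering features are meant to do at scale.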
A performance analysis solution does exist that implements the above requirements. It starts with the Cadence broad line of VIP including the Interconnect Validator product that monitors traffic across advanced interconnects. Cadence also provides the Interconnect Workbench tool that provides both testbench generation and in-depth performance analysis. These products work in parallel with RTL generation tools from interconnect IP providers to provide a complete performance analysis solution.
© 2013 Cadence Design Systems, Inc. All rights reserved worldwide. Cadence and the Cadence logo are registered trademarks of Cadence Design Systems, Inc. in the United States and other countries. All other trademarks are the property of their respective owners.