The final PCI Express® (PCIe) 3.0 base specification was released in November 2010, doubling the throughput of the PCI Express 2.0 specification. The working groups took a few years to develop the 3.0 specification, but now that it is ratified, developers are rushing to incorporate the updated interface into their products to meet the performance demands of their customers. Data center products such as blades, networking, storage and servers all benefit from the speed increases provided by chips incorporating the PCI Express 3.0 interface. The obvious question asked by companies developing these products is: “How long will it take for the market to adopt PCI Express 3.0?” Synopsys has been developing IP for PCI Express since its inception in 2003 and has seen a steady increase in the number of design starts incorporating the PCI Express 3.0 interface, as shown in Figure 1 below. In 2010 there were three times as many PCI Express 3.0 design starts as in 2009. The majority of these designs started in the fourth quarter of 2010, which coincides with the final release of the 3.0 specification. This trend has continued in the first quarter of 2011. Companies are targeting product availability in the second half of 2011, and the long gestation of the final 3.0 specification has put added pressure on them to meet their market windows.
Figure 1: Number of design starts from 2003-2010
Consistent with the second-generation transition to PCI Express 2.0, the third-generation transition to PCI Express 3.0 doubles the total bandwidth of the prior generation. A comparison of the speeds for the three generations of PCI Express is shown in Table 1 below.
Table 1: Comparison of Total Available Bandwidth for PCI Express
Let’s look deeper into the changes in the PCI Express 3.0 specification to understand the journey designers will take to create these products for the enterprise market. Instead of doubling the signaling rate from 5.0 GT/s (gigatransfers per second) to 10 GT/s, PCI Express 3.0 uses an 8.0 GT/s signaling rate. Moving to 10 GT/s presented a number of challenges to the design community, such as:
Possibility of changes in requirements to existing PCB materials and connectors
Possible reduction in the supported channel length
Forced usage of Decision Feedback Equalization (DFE) over the existing Continuous Time Linear Equalization (CTLE) in the PHY, which is much harder to implement
To cope with these challenges, the decision to use a signaling rate of 8.0 GT/s was made early in the specification development process. However, an 8.0 GT/s signaling rate still required the PHY to incorporate DFE in order to meet the required channel length, a decision that took a while to gain acceptance. In the end, PCI Express 3.0 does achieve the goal of doubling the bandwidth, but using 8.0 GT/s only provides about 60% of the improvement. The remaining 40% improvement is the result of removing protocol overhead. Half of that remaining improvement (20%) came from removing the 8b/10b encoding used in PCI Express 1.1 and 2.0 and replacing it with a 128b/130b encoding scheme that relies on scrambling. Removal of the 8b/10b encoding also necessitated the removal of K-codes, which are special symbols defined by the protocol for link management. The K-codes are used to mark the boundaries of Transaction Layer Packets (TLPs) and Data Link Layer Packets (DLLPs) and to signal nullified TLPs. For example, the COM symbol, one of the K-codes, is used to reset the scrambler as well as to provide symbol alignment and de-skewing between multiple lanes. Removing the K-codes meant that the way control information is conveyed had to be reinvented and optimized. As a result, the K-codes were replaced with packet length counters whose values are carried in new entities called “tokens”. The removal of the K-codes and the optimization of the protocol provided the last 20% needed for PCI Express 3.0 to double the bandwidth of PCI Express 2.0. The end result is that PCI Express 3.0 is a very different protocol at the physical layer from PCI Express 1.1 and 2.0, yet both protocol definitions must be supported in the same interface, which makes the design more challenging.
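The signaling-rate and encoding parts of this arithmetic can be checked with a short calculation. The sketch below is illustrative only: the table and function names are made up for this example, and it models line encoding alone (8b/10b versus 128b/130b), not the additional framing savings the text attributes to replacing K-codes with tokens.

```python
# Per-lane, per-direction data rate after line-encoding overhead.
# Each entry: (signaling rate in GT/s, payload bits, bits on the line).
# The dictionary and function names are hypothetical, chosen for this sketch.
GENERATIONS = {
    "PCIe 1.1": (2.5, 8, 10),     # 8b/10b
    "PCIe 2.0": (5.0, 8, 10),     # 8b/10b
    "PCIe 3.0": (8.0, 128, 130),  # 128b/130b
}


def effective_gbps(rate_gt_s, payload_bits, line_bits):
    """Signaling rate scaled by the encoding efficiency."""
    return rate_gt_s * payload_bits / line_bits


rates = {name: effective_gbps(*params) for name, params in GENERATIONS.items()}
for name, gbps in rates.items():
    print(f"{name}: {gbps:.3f} Gb/s per lane, per direction")

signaling_gain = 8.0 / 5.0                              # from the faster line rate alone
total_gain = rates["PCIe 3.0"] / rates["PCIe 2.0"]      # line rate plus encoding change
print(f"Gain from signaling rate alone: {signaling_gain:.2f}x")
print(f"Gain including 128b/130b encoding: {total_gain:.2f}x")
```

Under these assumptions the signaling rate alone accounts for a 1.6x gain, and the encoding change brings the per-lane figure to just under 2x before any framing optimizations are counted.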
Supporting all generations of the protocol in a single interface is a challenge, but additional challenges exist when high-performance PCI Express 3.0 designs require 8 or 16 lanes. A design for a PCI Express 3.0 interface with 16 lanes that targets a clock frequency of 500 MHz instead of the 1 GHz “natural frequency” forces the designer to process more than one symbol per clock cycle, per lane. To explore this in more detail, the natural frequency for a PCI Express controller is the clock frequency when using the 8-bit (one symbol) PIPE interface. This is the symbol rate on the line for PCI Express 3.0, or for PCI Express 1.1/2.0 once the 8b/10b encoding overhead is removed. Table 2 lists the natural frequencies for the controllers:
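As a rough illustration of the definition above, the natural frequency can be derived from the signaling rate and the number of line bits that carry one 8-bit symbol. This is a sketch under stated assumptions: the names are hypothetical, a symbol is taken to occupy 10 line bits under 8b/10b and 8 line bits for PCI Express 3.0, and the 2-bit sync header of each 128b/130b block is ignored.

```python
# Natural frequency = one 8-bit symbol per lane per clock on the PIPE interface.
# Each entry: (signaling rate in GT/s, line bits occupied by one symbol).
# Names are hypothetical; the 128b/130b sync header is ignored in this sketch.
LINE_FORMAT = {
    "PCIe 1.1": (2.5, 10),  # 8b/10b: 10 line bits per symbol
    "PCIe 2.0": (5.0, 10),  # 8b/10b: 10 line bits per symbol
    "PCIe 3.0": (8.0, 8),   # symbols sent as 8 line bits
}


def natural_frequency_mhz(rate_gt_s, line_bits_per_symbol):
    """Symbol rate per lane in MHz."""
    return rate_gt_s * 1000 / line_bits_per_symbol


natural = {name: natural_frequency_mhz(*fmt) for name, fmt in LINE_FORMAT.items()}
for name, mhz in natural.items():
    print(f"{name}: {mhz:.0f} MHz")
```

With these inputs the calculation gives 250 MHz, 500 MHz and 1000 MHz, which matches the 1 GHz natural frequency quoted for PCI Express 3.0.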
Table 2: The Natural Frequencies for PCI Express Controllers
Using today’s technology, it is easy to run 16-lane PCI Express 1.1 or 2.0 controllers at the natural frequency, which means processing 16 symbols per clock cycle. Sixteen symbols is less than the minimum packet size, so at most one packet can start in any clock cycle and there is no issue. For a 16-lane PCI Express 3.0 controller, it is very difficult to run at the 1 GHz natural frequency due to logic complexity and technology limitations, so the design will need to process a multiple of sixteen symbols per clock cycle. Processing a multiple of 16 symbols in the same clock period pushes beyond the minimum packet size as defined by the PCI Express specification. For example, if the design runs at half the natural symbol frequency (500 MHz), then it needs to process two sets of 16 symbols per core clock (32 symbols), which means that the controller datapath has to be 256 bits wide (32 symbols * 8 bits/symbol). The minimum packet size as defined by the PCI Express specification is twenty bytes, which is less than the 32 symbols handled per core clock, so one packet can end and the next can begin within the same cycle. In this case, two Start of TLP (STP) tokens have to be processed in the same core clock cycle. Figure 2 shows an example of a 16-bit PIPE interface, so two symbols have been merged into one core clock cycle for each lane. On the right-hand side is the core clock and on the left-hand side is the symbol clock. It is clear that two STP tokens have to be processed in a single clock cycle, which presents special implementation challenges.
Figure 2: PCIe 3.0 with 16-bit PIPE having two STP tokens per core clock with minimum packets
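The datapath arithmetic behind this situation can be sketched as follows. The function name and return layout are hypothetical. The STP count is an upper bound that assumes back-to-back minimum-size (20-byte) packets, with the first one starting at the beginning of the clock window.

```python
import math

MIN_PACKET_BYTES = 20  # minimum packet size quoted in the text


def core_datapath(lanes, natural_mhz, core_mhz):
    """Return (symbols per core clock, datapath width in bits,
    maximum STP tokens that can begin in one core clock)."""
    if natural_mhz % core_mhz != 0:
        raise ValueError("core clock must divide the natural frequency evenly")
    symbols_per_lane = natural_mhz // core_mhz        # symbols merged per lane
    symbols_per_clock = lanes * symbols_per_lane
    width_bits = symbols_per_clock * 8                # 8 bits per symbol
    # Back-to-back minimum packets start every MIN_PACKET_BYTES symbols.
    max_stp = math.ceil(symbols_per_clock / MIN_PACKET_BYTES)
    return symbols_per_clock, width_bits, max_stp


# x16 PCIe 2.0 at its 500 MHz natural frequency: one packet start per clock.
print(core_datapath(16, 500, 500))
# x16 PCIe 3.0 at half of its 1 GHz natural frequency: two packet starts per clock.
print(core_datapath(16, 1000, 500))
```

The second call corresponds to the 256-bit case described above, where two STP tokens can land in the same core clock cycle.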
This could be solved by running two separate data paths through the core, duplicating all of the logic and then providing two packets at the same time to the application. This forces the designer to deal with issues in addressing and maintaining the proper ordering rules in the application logic. Another choice is to serialize the packets in a buffer before passing the packet to the upper layers. However, using a buffer to serialize the packets before the flow control buffers does not solve a possible overflow condition when 100% utilization is a target. If the buffer isn’t big enough, the design cannot just drop packets. There are other ways to solve this issue, but the designer needs to understand the traffic flow in the system to determine the right approach for the design.
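To see why a serializing buffer alone does not remove the overflow risk, consider a deliberately simplified model. It is a toy sketch and not a description of any real controller: it assumes a 32-byte-per-clock datapath, a link carrying only back-to-back 20-byte packets (100% utilization), and a hypothetical serializer that forwards exactly one packet per core clock.

```python
def serializer_backlog(clocks, bytes_per_clock=32, packet_bytes=20,
                       drain_per_clock=1):
    """Return the number of packets left waiting after `clocks` core clocks.

    Toy model: packets arrive back to back, and a packet enters the buffer
    in the clock where its last byte is received.
    """
    received_bytes = 0
    completed = 0   # packets fully received so far
    backlog = 0     # packets waiting in the serializing buffer
    for _ in range(clocks):
        received_bytes += bytes_per_clock
        newly_completed = received_bytes // packet_bytes - completed
        completed += newly_completed
        backlog += newly_completed
        backlog -= min(backlog, drain_per_clock)  # forward one packet upward
    return backlog


for n in (5, 10, 100):
    print(f"after {n} clocks: {serializer_backlog(n)} packets waiting")
```

In this model 1.6 packets arrive per clock on average while only one leaves, so the backlog grows without bound and any fixed-size buffer eventually fills. How close real traffic comes to this worst case depends on the system, which is why the traffic flow has to be understood before choosing an approach.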
Summary
PCI Express 3.0 is being adopted by many companies to benefit from the increase in bandwidth, but implementing PCI Express 3.0 requires the designer to understand the new protocol, the DFE interaction at the PIPE interface and the special situations that exist when implementing this high-speed interface. Because of the significant scope of changes required when developing a PCI Express 3.0 interface, it is important to carefully manage the size and complexity of the implementation to fully realize the bandwidth improvements of PCI Express 3.0.
DesignWare IP Solution for PCI Express
The Synopsys DesignWare® IP for PCI Express solution provides the port logic necessary to implement and verify high-performance designs using the PCIe interconnect standard. The complete, integrated solution is silicon-proven and includes a comprehensive suite of configurable digital controllers, high-speed mixed-signal PHY, and verification IP, all of which are compliant with the PCIe 3.0, 2.1 and 1.1 specifications. The DesignWare PCI Express solutions support the Single Root I/O Virtualization (SR-IOV) specification from the PCI Special Interest Group (PCI-SIG). By providing a complete solution from a single IP vendor, Synopsys reduces integration risk by helping to ensure that all the IP functions together seamlessly. Synopsys DesignWare IP for PCI Express provides designers with a high-performance IP solution that is extremely low in power consumption, area and latency.
As the market leader for PCI Express IP for the last four years (Gartner 2010), Synopsys continually delivers next-generation and innovative PCIe IP solutions to the market. With a strong focus on delivering high quality, the DesignWare IP for PCI Express has undergone extensive third-party interoperability testing, with products shipping in volume production. Using strict quality measures and backed by an expert technical support team, Synopsys enables designers to accelerate time-to-market and reduce integration risk for next-generation, PCIe-enabled desktop, mobile, consumer and communication systems-on-chip.
Scott Knowlton joined Synopsys in 1997 and is the Sr. Product Marketing Manager for the DesignWare PCI Express, PCI-X, PCI and SATA IP product families. Scott was previously responsible for the DesignWare AMBA and coreTools product lines. Prior to joining Synopsys, Scott worked in simulation, synthesis and mixed-signal solutions at Cadence Design Systems after several engineering and project management positions in ASIC development at Encore Computer, Intrinsix, and Raytheon. Scott earned his Bachelor of Science degree in Electrical Engineering from the University of Michigan.