There's a bandwidth explosion taking place in mobile communications. The rapid growth of mobile communications has driven up the number and diversity of wireless communications standards over the last 15 years. The industry has adopted GSM, W-CDMA, the CDMA2000 family, and the TD-SCDMA family of standards in China, as well as other parallel communications standards such as WiMAX.
The wireless players are now converging on one set of standards: LTE (long-term evolution). This unprecedented convergence will eventually lead to "LTE Advanced," a true 4G standard. But first we must complete the transition to LTE, which will provide significantly higher data rates, improved spectral efficiency using OFDM (orthogonal frequency-division multiplexing) and MIMO (multiple-input, multiple-output), scalable bandwidth (1.25 MHz to 20 MHz), support for both unpaired (TDD) and paired (FDD) channels, and an "all IP network" (AIPN) that will leverage the IP ecosystem by moving the higher network layers to IP.
The Global Mobile Suppliers Association (www.gsacom.com) reported in April 2010 that there are commitments for 64 LTE networks in 31 countries. It expects up to 22 LTE networks to be in service by the end of 2010, and 39 or more by the end of 2011. In total, 88 operators in 42 countries have committed to deploy LTE systems or are engaged in trials or other planning activities.
The semiconductor opportunity is tremendous and challenging. Performance requirements for LTE are 100 to 1000 times as demanding as those for current 3G networks. User Equipment (UE) devices are very high-volume, low-margin products that require the most power-, area- and cost-efficient designs. The algorithms for these devices are therefore chosen to optimize both compute and memory requirements. Code size is also an important design factor.
What Does an LTE Baseband Subsystem Look Like?
Figure 1 illustrates a functional block diagram of an LTE baseband subsystem. The top part of Figure 1 shows the core LTE signal processing chain for the receiver. An antenna is connected to the block marked "Receive RF," and in a MIMO configuration there are likely multiple antennas. The RF receiver feeds a front-end filtering block, which drives the OFDM demodulator. This is followed by complex channel estimation and MIMO detection, which combines the signals from the multiple antennas to improve data rate and fidelity in the receive channel. Once the signals are combined, a complex demodulation stage is followed by forward error correction (FEC), which produces a bit stream that is passed to the LTE MAC layer for higher-level protocol processing.
Figure 1. LTE Baseband Block Diagram
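The OFDM demodulation step in the receive chain of Figure 1 can be sketched in a few lines: strip the cyclic prefix, then transform back to subcarriers. This is a minimal illustration, not the ATLAS implementation. The toy sizes (an 8-point transform with a 2-sample prefix, where 20 MHz LTE uses a 2048-point FFT) and every function name here are hypothetical, and a naive DFT stands in for a real FFT. The last lines show why the prefix matters to channel estimation: a multipath channel shorter than the prefix turns into one complex gain per subcarrier.

```python
import cmath

def dft(x):
    """Naive O(N^2) DFT; a real receiver would use a radix-2/radix-4 FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse DFT with 1/N scaling."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def ofdm_modulate(subcarriers, cp_len):
    """Test-signal generator: IDFT, then copy the last cp_len samples to the front."""
    body = idft(subcarriers)
    return (body[-cp_len:] if cp_len else []) + body

def multipath(samples, taps):
    """Linear convolution with a short multipath channel, truncated to the input length."""
    return [sum(taps[i] * samples[n - i] for i in range(len(taps)) if n - i >= 0)
            for n in range(len(samples))]

def ofdm_demodulate(samples, fft_size, cp_len):
    """Receiver front end: discard the cyclic prefix, then DFT back to subcarriers."""
    if len(samples) != fft_size + cp_len:
        raise ValueError("expected exactly one OFDM symbol plus its cyclic prefix")
    return dft(samples[cp_len:])

# Toy sizes (hypothetical): an 8-point transform with a 2-sample cyclic prefix.
X = [1+1j, 1-1j, -1+1j, -1-1j, 1+1j, -1-1j, 1-1j, -1+1j]
h = [1.0, 0.5]                    # two-path channel, shorter than the prefix
Y = ofdm_demodulate(multipath(ofdm_modulate(X, 2), h), 8, 2)
H = dft(h + [0.0] * 6)            # channel frequency response per subcarrier
# With the prefix removed, each subcarrier sees a single complex gain: Y[k] == H[k] * X[k]
```

That per-subcarrier gain H[k] is what the channel estimation stage estimates and what MIMO detection then undoes.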
In the transmit direction, bit streams from the MAC layer go through channel coding, various frequency-domain transformations, output shaping, and then on to the transmit RF stage, the power amplifiers, and the antennas.
Two Big LTE Challenges
The LTE specifications present two major challenges. The first lies in the new LTE technologies and algorithms, such as OFDM and MIMO, which are complex and require significantly more computation than 3G standards. The second is the absolute level of computation required. As Figure 2 shows, as the industry has moved from GSM through UMTS and HSDPA to LTE, computation requirements have risen by four or five orders of magnitude, from around 10 MOPS (millions of operations per second) to between 100,000 and 1 million MOPS, to deliver the data rates of 100 Mbits/sec and beyond expected in LTE.
Figure 2. Cellular Radio Computation Levels (MOPS) versus Download Data Rates (Mbits/sec)
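The "four or five orders of magnitude" follows directly from the endpoints quoted above, taking roughly 10 MOPS for GSM and 100,000 to 1 million MOPS for LTE:

```latex
\log_{10}\frac{10^{5}\ \text{MOPS}}{10\ \text{MOPS}} = 4,
\qquad
\log_{10}\frac{10^{6}\ \text{MOPS}}{10\ \text{MOPS}} = 5
```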
Needed: New Levels of DSP Performance
General-purpose DSPs have become far too big and power-hungry for an efficient handset, and they are usually too expensive and draw too much power for basestations as well. Why? General-purpose DSPs are simply too general: they trade application performance for that generality. This inefficiency means that designers usually have to develop multiple RTL blocks to boost performance. Because RTL blocks are inherently non-programmable, they can't support new standards, and they are very time-consuming to design and verify.
Early attempts at developing software-defined radio (SDR) were disappointing because they relied on a single DSP that couldn't keep up with the demands. Offloading work to hard-coded RTL blocks defeated the whole purpose of SDR, since those blocks are not programmable.
Yet the entire radio can be defined with processors: specially designed processors, each optimized for the task at hand. New, specially tuned DSP cores are being developed just to meet the LTE challenge, because LTE is so complex that no single core can adequately perform all of the required functions.
A typical modern communications transceiver (transmitter or receiver) can be partitioned into three different compute domains. The signal domain performs operations on the complex- or real-valued data, including the FFT, filtering, synchronization, and matrix operations, and is typically closest to the RF/analog side of the system. The soft bit domain extends from the soft demapper module to the FEC decoding module, closest to the MAC side of the receiver chain. The bit domain usually sits on the transmit side, closest to the MAC, and includes operations such as CRC encoding, scrambling, FEC encoding and bit interleaving.
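The soft demapper is where the signal domain hands off to the soft bit domain: complex symbols go in, per-bit confidence values (log-likelihood ratios, LLRs) come out. A minimal sketch for QPSK follows, assuming the symbol has already been equalized to unit gain and the noise is AWGN with known variance. The function names are hypothetical; the bit-to-symbol mapping follows the QPSK table in 3GPP TS 36.211, and higher-order QAM would normally use a max-log approximation rather than this closed form.

```python
import math

def qpsk_map(b0, b1):
    """LTE QPSK mapping: on each axis, bit 0 -> +1/sqrt(2) and bit 1 -> -1/sqrt(2)."""
    s = 1 / math.sqrt(2)
    return complex(s * (1 - 2 * b0), s * (1 - 2 * b1))

def qpsk_llrs(y, noise_var):
    """Exact per-bit LLR = ln P(b=0 | y) / P(b=1 | y) for an equalized QPSK symbol
    in AWGN with total complex noise variance noise_var (noise_var / 2 per axis)."""
    scale = 2 * math.sqrt(2) / noise_var
    return [scale * y.real, scale * y.imag]

def hard_decide(llrs):
    """Sign of the LLR gives the bit; its magnitude is the confidence the decoder uses."""
    return [0 if llr >= 0 else 1 for llr in llrs]

llrs = qpsk_llrs(qpsk_map(0, 1), noise_var=0.5)   # roughly [4.0, -4.0]
bits = hard_decide(llrs)                           # [0, 1]
```

The FEC decoder consumes the soft values directly rather than hard bits, which is why the receiver carries LLRs all the way from the demapper to the decoder.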
The ATLAS Reference Architecture for LTE
Because the LTE challenge is so complex, Tensilica has developed the ATLAS reference architecture, starting with a mapping for an LTE UE Category 4 (CAT-4) system with one Tx antenna, two Rx antennas and 20 MHz bandwidth. Variants of the ATLAS system for 10 MHz and 5 MHz are also available. The ATLAS architecture specifically addresses processing from the front-end filters next to the RF up to the transport blocks delivered to the MAC. Figure 3 shows the ATLAS UE functional block diagram.
Figure 3. The ATLAS LTE UE Functional Block Diagram
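To put the 20, 10 and 5 MHz variants in perspective, the sketch below works out the front-end load each one implies. The 15 kHz subcarrier spacing, 12 subcarriers per resource block, and the 100/50/25 resource-block counts are standard LTE numerology; the FFT sizes are the ones commonly used in implementations (the standard does not mandate them). The function name is hypothetical and nothing here describes ATLAS internals.

```python
SUBCARRIER_SPACING_HZ = 15_000
SUBCARRIERS_PER_RB = 12

# Resource blocks per channel bandwidth, plus the FFT size commonly used for each.
NUMEROLOGY = {
    20: {"resource_blocks": 100, "fft_size": 2048},
    10: {"resource_blocks": 50, "fft_size": 1024},
    5: {"resource_blocks": 25, "fft_size": 512},
}

def front_end_load(bandwidth_mhz, rx_antennas=2):
    """Occupied subcarriers and the complex sample rate the receive chain must
    absorb, per antenna and summed over all receive antennas."""
    p = NUMEROLOGY[bandwidth_mhz]
    occupied = p["resource_blocks"] * SUBCARRIERS_PER_RB
    per_antenna = p["fft_size"] * SUBCARRIER_SPACING_HZ
    return {
        "occupied_subcarriers": occupied,
        "occupied_bandwidth_hz": occupied * SUBCARRIER_SPACING_HZ,
        "samples_per_s_per_antenna": per_antenna,
        "samples_per_s_total": per_antenna * rx_antennas,
    }

for bw in (20, 10, 5):
    print(bw, "MHz:", front_end_load(bw))
```

For the 20 MHz CAT-4 case this works out to 1,200 occupied subcarriers and 30.72 million complex samples per second per receive antenna, or 61.44 million across the two Rx antennas; halving the bandwidth halves every figure.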
The ATLAS Receive Chain
The receive chain (bottom part of Figure 3) receives data from the front-end filters and generates transport blocks for the MAC. It contains the controller and three processing domains: the signal domain (the Receive Signal Processor, or RxSP), the matrix domain (the Receive Channel Processor, or RxChP), and the soft bit domain (the Receive Hybrid ARQ Processor, or RxHARQ, the Turbo Engine, and the Receive Control Processor, or RxCP). All these processors, including the Turbo Engine, were built with Tensilica's Xtensa customizable processor technology so they could be optimized for the exact task.
Tensilica's Baseband Engine, the ConnX BBE16, forms the base of the RxSP and RxChP. The ConnX BBE16 is a 128-bit, 3-way, 16-MAC DSP that can perform complex multiplies in a single cycle and a radix-4 FFT butterfly in a single cycle. For every symbol, the RxSP generates data resource blocks and channel estimates. The data resource blocks are immediately written into the input buffer of the next processor in the chain, the RxChP, which performs MIMO decoding to generate soft bit values that are passed to the RxHARQ processor.
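For readers who want to see what "a complex multiply" and "a radix-4 butterfly" actually compute, here is a plain-Python sketch. It is not the BBE16 instruction set; the function names are hypothetical. `cmul` spells out the four real multiplies and two adds that a single-cycle complex-multiply instruction performs, and `butterfly4` is a 4-point DFT whose twiddles are only 1, -j, -1 and +j, so it needs just sign changes and real/imaginary swaps. The recursive radix-4 decimation-in-time FFT shows how butterflies plus twiddle multiplies compose into a full transform.

```python
import cmath

def cmul(a, b):
    """Complex multiply as 4 real multiplies and 2 adds/subtracts."""
    return complex(a.real * b.real - a.imag * b.imag,
                   a.real * b.imag + a.imag * b.real)

def butterfly4(a0, a1, a2, a3):
    """4-point DFT. Multiplying by +/-j is only a real/imag swap with a sign
    change, so hardware needs no general multiplier inside the butterfly."""
    return [
        a0 + a1 + a2 + a3,
        a0 - 1j * a1 - a2 + 1j * a3,
        a0 - a1 + a2 - a3,
        a0 + 1j * a1 - a2 - 1j * a3,
    ]

def fft_radix4(x):
    """Recursive decimation-in-time radix-4 FFT; len(x) must be a power of 4."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n < 4 or n % 4:
        raise ValueError("length must be a power of 4")
    m = n // 4
    subs = [fft_radix4(x[r::4]) for r in range(4)]   # four interleaved sub-FFTs
    out = [0j] * n
    for k in range(m):
        # Twiddle each sub-FFT output (complex multiplies), then one butterfly.
        twiddled = [cmul(subs[r][k], cmath.exp(-2j * cmath.pi * r * k / n))
                    for r in range(4)]
        for q, v in enumerate(butterfly4(*twiddled)):
            out[k + q * m] = v
    return out
```

A 16-point transform takes two stages of four butterflies each; an LTE-sized transform simply adds stages, which is why a single-cycle butterfly matters so much for the RxSP.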
The RxHARQ processor takes in the soft bit values from the RxChP, assembles them into the appropriate redundancy versions (RVs), and performs HARQ recombining to generate code blocks, which are written into the input buffer of the Turbo Engine. The Turbo Engine decodes the code blocks, which are then written into the input buffer of the RxCP.
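HARQ recombining can be pictured as accumulating LLRs into a circular soft buffer. Each transmission, or redundancy version, starts at a different position in that buffer; where transmissions overlap their LLRs add up (soft combining), and where they do not they contribute new parity information (incremental redundancy). The sketch below is a simplified illustration under stated assumptions: the buffer size and start offsets are made up, the function name is hypothetical, and the actual LTE rate-matching start positions and null-bit handling defined in 3GPP TS 36.212 are omitted.

```python
def harq_combine(soft_buffer, llrs, start):
    """Accumulate one transmission's LLRs into the circular soft buffer.

    `start` stands in for the redundancy version's starting position
    (hypothetical here). Overlapping positions add up; others are new."""
    n = len(soft_buffer)
    for i, llr in enumerate(llrs):
        soft_buffer[(start + i) % n] += llr
    return soft_buffer

buf = [0.0] * 12                        # hypothetical 12-entry soft buffer
harq_combine(buf, [1.0] * 8, start=0)   # first transmission
harq_combine(buf, [2.0] * 8, start=6)   # retransmission from another offset, wrapping
print(buf)  # [3.0, 3.0, 1.0, 1.0, 1.0, 1.0, 3.0, 3.0, 2.0, 2.0, 2.0, 2.0]
```

Chase combining is the special case in which every retransmission repeats the same bits from the same start position, so every LLR simply grows in magnitude.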
The RxCP is the master controller, performing sleep control and power management. It decodes the channel headers to configure the receive and transmit chains for proper operation. It also provides a control and data interface to the MAC processor.
The ATLAS Transmit Chain
The transmit chain contains two domains and two processors, the Transmit Bit Processor (TxBP) and the Transmit Signal Processor (TxSP).
The TxBP performs CRC encoding, bit scrambling, turbo encoding, sub-block interleaving, rate matching and Physical Uplink Control Channel encoding. For this process, Tensilica developed a bit stream processor (the ConnX BSP3), a 32-bit DSP that contains special instructions to facilitate LTE-specific bit-level processing for CRC, Turbo encoding and interleaving.
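Two of the bit-level jobs listed above, CRC attachment and scrambling, are shown here as bit-serial Python. This is the kind of work that special bit-manipulation instructions speed up; the code is not the ConnX BSP3 implementation, and the function names are hypothetical. The CRC polynomial is CRC-24A from 3GPP TS 36.212, and the scrambling sequence is the length-31 Gold sequence generator from 3GPP TS 36.211. The `c_init` value in the usage lines is made up; in LTE it is derived from parameters such as the RNTI, slot number and cell ID.

```python
CRC24A_POLY = 0x864CFB  # D^24+D^23+D^18+D^17+D^14+D^11+D^10+D^7+D^6+D^5+D^4+D^3+D+1

def crc24a(bits):
    """Bit-serial CRC-24A over a list of 0/1 bits, zero initial state, MSB first."""
    reg = 0
    for b in bits:
        feedback = ((reg >> 23) & 1) ^ b
        reg = (reg << 1) & 0xFFFFFF
        if feedback:
            reg ^= CRC24A_POLY
    return [(reg >> (23 - i)) & 1 for i in range(24)]

def gold_sequence(c_init, length, nc=1600):
    """Length-31 Gold sequence generator used for LTE scrambling."""
    x1 = [1] + [0] * 30
    x2 = [(c_init >> i) & 1 for i in range(31)]
    for n in range(length + nc):
        x1.append((x1[n + 3] + x1[n]) % 2)
        x2.append((x2[n + 3] + x2[n + 2] + x2[n + 1] + x2[n]) % 2)
    return [(x1[n + nc] + x2[n + nc]) % 2 for n in range(length)]

def scramble(bits, c_init):
    """XOR the bits with the Gold sequence; applying it twice undoes it."""
    c = gold_sequence(c_init, len(bits))
    return [b ^ s for b, s in zip(bits, c)]

payload = [1, 0, 1, 1, 0, 0, 1, 0] * 4     # toy transport-block bits
block = payload + crc24a(payload)          # attach the 24 CRC bits
tx_bits = scramble(block, c_init=0x1234)   # c_init is a made-up value here
# Receiver: descramble, then re-running the CRC over block + CRC gives all zeros.
```

A processor without bit-level instructions spends several operations per bit on loops like these, which is the motivation for a dedicated bit stream processor.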
The TxSP takes the encoded bit stream and generates the corresponding SC-FDMA symbols, which are then passed to the front-end filters for upconversion and spectral-mask compliance. The TxSP is implemented using Tensilica's ConnX BBE16 DSP and performs Gray-coded modulation mapping, resource block (RB) mapping, layer mapping, DFT precoding, IFFT, and cyclic prefix insertion.
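The distinctive part of SC-FDMA, compared with the OFDM used on the downlink, is the DFT precoding step before the IFFT. A minimal sketch of one symbol's path through the TxSP steps above follows: DFT-spread the modulated symbols, map them onto a contiguous (localized) block of subcarriers, IFFT, and prepend the cyclic prefix. Sizes and function names are hypothetical, a naive DFT stands in for an FFT, and LTE details such as the 1/sqrt(M) DFT scaling and the uplink half-subcarrier shift are deliberately left out.

```python
import cmath

def dft(x, inverse=False):
    """Naive DFT (inverse includes the 1/N scaling); stands in for an FFT."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def scfdma_symbol(data_symbols, fft_size, first_subcarrier, cp_len):
    """One SC-FDMA symbol: DFT precoding, localized subcarrier mapping, IFFT, CP."""
    m = len(data_symbols)
    if first_subcarrier + m > fft_size:
        raise ValueError("allocation does not fit in the FFT")
    spread = dft(data_symbols)                             # DFT precoding
    grid = [0j] * fft_size
    grid[first_subcarrier:first_subcarrier + m] = spread   # localized mapping
    body = dft(grid, inverse=True)                         # back to time domain
    return (body[-cp_len:] if cp_len else []) + body       # cyclic prefix

qpsk = [1+1j, -1+1j, 1-1j, -1-1j]
samples = scfdma_symbol(qpsk, fft_size=16, first_subcarrier=2, cp_len=4)
```

Because the data are spread by a DFT before the IFFT, the transmitted waveform behaves more like a single-carrier signal with a lower peak-to-average power ratio, which eases the demands on the UE's power amplifier.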
The Multi-core Approach Works
A multi-core approach to LTE design works because each core is specifically optimized for the task. In this way, designers get maximum efficiency and the performance level required for LTE processing. The ATLAS architecture was specifically developed for modular design. It can easily be scaled to different levels or categories of performance by removing or adding processors of the various types. Because the processors can be further customized, designers can take Tensilica’s design ideas and develop them further to better match their performance, power and cost budgets, or unique algorithm implementations.
One of the big advantages of using smaller, optimized processors is that when their processing capability is not required, the core and its memories can be powered down. This type of partitioning and operation is similar to the “mature” designs of 3G and 3.5G basebands where power consumption was a major focus.
By using customized DSPs for each major step in the LTE system, designers get maximum efficiency. Because all of the processors are based on Tensilica's Xtensa customizable processor cores, they share a common software toolchain. The compilers, debuggers, instruction set simulator (ISS) and other tools all understand the customizations, so the software can extract maximum throughput from each core.
The serial connection of optimized DSPs is well suited to the dataflow style of processing used in wireless algorithms such as LTE. It allows a simpler software programming model and easier debugging, because each algorithm runs on just one core. The datapath connections are not a globally shared bus but dedicated point-to-point links that allow fast, deterministic loading of data into other DSPs' memories without the need for bus arbitration. As a result, performance does not degrade as more processors are added, as it does in typical bus-based systems.
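The point-to-point dataflow idea can be modeled in software to make the structure concrete. In the sketch below each stage is a thread standing in for a core; it reads only its own input buffer and writes only into the next stage's input buffer, so there is no shared bus to arbitrate. Bounded queues model fixed-size input buffers, so a full buffer stalls the producer. This is an analogy under stated assumptions: Python threads are not DSP cores, the function names and stage functions are hypothetical, and none of it reflects Tensilica's actual interface.

```python
import queue
import threading

_END = object()  # end-of-stream marker passed down the chain

def start_stage(work, inbox, outbox):
    """One stand-in 'core': reads only its own inbox, writes only to the next inbox."""
    def run():
        while (item := inbox.get()) is not _END:
            outbox.put(work(item))
        outbox.put(_END)
    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread

def run_chain(items, stages, depth=4):
    """Push items through the stages in order and collect the results."""
    # Bounded queues model fixed-size input buffers (a full buffer stalls the
    # producer). The final output queue is unbounded so the last stage never
    # blocks while this thread is still feeding the first stage.
    links = [queue.Queue(maxsize=depth) for _ in stages] + [queue.Queue()]
    threads = [start_stage(w, links[i], links[i + 1]) for i, w in enumerate(stages)]
    for item in items:
        links[0].put(item)
    links[0].put(_END)
    results = []
    while (out := links[-1].get()) is not _END:
        results.append(out)
    for thread in threads:
        thread.join()
    return results

# Two hypothetical stages standing in for, e.g., an RxSP feeding an RxChP.
print(run_chain(range(6), [lambda s: s * 2, lambda s: s + 1]))  # [1, 3, 5, 7, 9, 11]
```

Adding a stage adds one more private link rather than another contender for a shared bus, which is the property the point-to-point design relies on.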
The Software Stacks for LTE
The software for LTE is quite complex, and a number of specialized software suppliers really understand the requirements. Tensilica has been working with mimoOn, which is well respected for its software expertise, to develop an LTE PHY software stack optimized to get the maximum performance out of the specialized Tensilica DSPs in the ATLAS architecture.
The Complete LTE Layer 1 PHY
The ConnX ATLAS LTE reference architecture implements the complete LTE Layer 1 PHY, including the computationally demanding Turbo decoder, in a fully software-programmable, processor-based DSP reference architecture. It is intended as a starting point for design teams implementing LTE baseband systems. A design team will integrate the ATLAS components with the Layer 2 design elements and system interconnect elements of its choosing.
Because it is a modular design, the design team may decide to deploy all seven modules or they may decide to re-use pre-existing RTL blocks in lieu of one or more of the ATLAS components.
As designers master LTE and move on to LTE Advanced, these same modular architectural components in the ATLAS LTE reference architecture will be ready to help jump start the design efforts.
Abhijit Shah is Chief Architect, Baseband at Tensilica, Inc., where he is involved in defining new wireless initiatives, specifying specialized cores, kernels and library packages.