Cellular network operators require significant equipment cost reduction as they strive to increase the network capacity through the use of new air interfaces, new transmission frequencies, wider bandwidth, increasing antenna counts and a greater number of cell sites. Furthermore, these operators require increased equipment efficiency and greater network integration to reduce operating costs. To provide equipment that meets these disparate needs, manufacturers of wireless infrastructure equipment seek solutions that provide greater levels of integration with higher performance and increased flexibility, while delivering lower power and cost. In addition, the equipment providers must do this while shortening time to market.
The key to reducing overall equipment cost is integration, but reducing operating costs comes down to the advanced digital algorithms that improve power amplifier efficiency. One such algorithm in common use is digital pre-distortion (DPD). Improving equipment efficiency is a growing challenge as equipment configurations become ever more complex. Radio transmission bandwidths are approaching 100MHz with LTE-Advanced, and even exceed that as vendors combine multiple air interfaces in non-contiguous spectral configurations. Active antenna arrays (AAAs) and multiple-input, multiple-output (MIMO) enabled remote radio units (RRUs) put further pressure on the compute bandwidth these algorithms require. In this article we’ll investigate how the Zynq-7000 All Programmable SoC can be used to increase the performance of current and future DPD systems while offering equipment vendors full programmability, low cost and power, and the fastest time to market.
Implementing Cellular Radio on an All Programmable SoC
This SoC marries a high-performance programmable logic (PL) fabric, containing serial transceivers (SERDES) and DSP blocks, with a tightly integrated, hardened processing subsystem (PS). The PS contains a dual-core ARM Cortex-A9 MPCore with floating point units (FPUs) and NEON Media Accelerators, coupled with the rich set of peripherals, such as UARTs, SPI, I2C, Ethernet and memory controllers, necessary for complete radio operation and control. Unlike an external general-purpose or DSP processor, the interface between the PL and PS allows very high bandwidth due to a large number of connections that would be impractical in a discrete solution. With such an array of hardware and software, this device is capable of implementing all the required functions of an RRU in a single chip, as shown in Figure 1.
Figure 1. A typical radio architecture, where all digital functions can be combined into a single device
The abundant DSP resources in the PL are used to implement digital signal processing functions such as digital up-conversion (DUC), digital down-conversion (DDC), crest factor reduction (CFR) and DPD. In addition, the SERDES are capable of supporting the 9.8Gbps CPRI and 12.5Gbps JESD204B links needed to interface to the baseband and the data converters respectively. The PS supports both symmetric multi-processing (SMP) and asymmetric multi-processing (AMP). In this case it is assumed that AMP mode is used: one Cortex-A9 processor implements the board-level control functions, such as message termination, scheduling, calibration and alarms, running either bare metal or, more likely, an operating system such as Linux, while the other implements parts of the DPD algorithm, as not all parts of this algorithm warrant a full hardware solution.
DPD improves power amplifier efficiency by extending the amplifier’s linear range. Efficiency improves because the amplifier can be driven harder to increase the output power while its static power consumption remains relatively constant. To extend this linear range, DPD uses an analogue feedback path from the amplifier and a significant amount of signal processing to calculate coefficients that represent the inverse of the amplifier’s non-linearity. These coefficients are then used to pre-correct the transmitted signal driving the power amplifier, increasing the amplifier’s effective linear range.
The DPD algorithm can be broken down into multiple functions as shown in Figure 2.
Figure 2. Digital Pre-Distortion broken into functional partitions
DPD is a closed-loop system in which the amplifier’s output is captured to determine how the amplifier responded to the previously transmitted signal. The first task of DPD is to align the captured amplifier output with the previously transmitted signal, which takes place in the alignment block; memory buffers the data until this alignment completes. Once aligned, the data can be manipulated to create the coefficients that represent a close approximation to the inverse of the PA’s non-linearity. The autocorrelation matrix computation (AMC) and coefficient computation (CC) algorithms are used for this. Once the coefficients are known, the datapath pre-distorter uses them to pre-correct the signal being transmitted to the PA.
Accelerating DPD Coefficient Estimation
These functions can of course be implemented in many different ways. Some may be suited to software and others to hardware. Still other functions may suit either, in which case the required performance ultimately dictates the implementation. An SoC device gives designers free rein to balance carefully between hardware and software. For the DPD algorithm, the datapath pre-distorter, which consists of high-speed filtering, is typically implemented in the PL since it requires very high sample rates, while the Alignment and Estimation Engine that generates the DPD coefficients can run on a Cortex-A9 in the PS.
To determine what to implement in hardware versus software, the software must first be profiled to determine where it spends its time. Figure 3 illustrates the software profile of the DPD algorithm for the three functions identified in Figure 2. As the profiling demonstrates, the Xilinx DPD algorithm spends 97% of its time in the AMC processing, so this function is the one that makes the most sense to accelerate first.
Figure 3. Software profiling of identified software tasks as part of DPD processing
The Cortex-A9 has some additional features in its armoury that also help improve performance for this type of application. As part of the PS, each processor has an FPU and a NEON Media Accelerator. The NEON unit is a 128-bit single-instruction, multiple-data (SIMD) vector co-processor capable of, for example, two 32x32-bit multiplications simultaneously, which ideally suits the needs of the AMC function, as it is dominated by multiply-accumulate (MAC) operations. To make use of the NEON module, software intrinsics can be used, eliminating the need for low-level programming in assembly.