Hence, using the additional features available in the PS can significantly improve performance over soft processors such as MicroBlaze or external DSPs.
To improve DPD performance further, it may be desirable to move these functions into hardware in the PL. However, because the software is written in C/C++, manually re-implementing it in VHDL or Verilog to run in the Zynq PL can be time-consuming.
High-level synthesis (HLS) tools, i.e. C-to-RTL tools, now address this problem. They allow programmers experienced in C/C++ to target hardware in the form of FPGAs. The Vivado HLS tool lets designers and system architects map C/C++ code to programmable logic, enabling code re-use, portability and straightforward design space exploration, and so improving productivity.
Figure 4. Vivado High Level Synthesis (HLS) design flow
Figure 4 details the typical C/C++ design flow when targeting the HLS tool. The tool's output is RTL, which integrates easily with existing hardware designs such as the datapath pre-distorter, upstream processing and the interfaces to the data converters.
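As an illustration of the kind of C/C++ source this flow accepts, the following is a minimal sketch rather than the actual DPD code. It assumes a real-valued, fixed-size correlation-style accumulation as a stand-in for the AMC kernel; the function name `amc_accumulate`, the sizes `kTaps` and `kSamples`, and the choice of directive are all hypothetical. The `#pragma HLS` line is a Vivado HLS directive that an ordinary C++ compiler ignores, so the same source can be checked in software before synthesis.

```cpp
#include <cstddef>

// Hypothetical sizes, chosen for illustration only.
constexpr std::size_t kTaps = 4;      // number of delayed terms (assumed)
constexpr std::size_t kSamples = 64;  // samples per capture buffer (assumed)

// Stand-in for the AMC kernel: r[i][j] = sum over n of x[n - i] * x[n - j].
// Real-valued data keeps the sketch short; a real DPD design would
// typically operate on complex samples.
void amc_accumulate(const float x[kSamples], float r[kTaps][kTaps]) {
    for (std::size_t i = 0; i < kTaps; ++i) {
        for (std::size_t j = 0; j < kTaps; ++j) {
            float acc = 0.0f;
            for (std::size_t n = kTaps - 1; n < kSamples; ++n) {
// Requests one loop iteration per clock; the tool reports the interval
// it actually achieves.
#pragma HLS PIPELINE II=1
                acc += x[n - i] * x[n - j];
            }
            r[i][j] = acc;
        }
    }
}
```

Because the function is plain C++, it can be called from a software test bench with a known input buffer and the result compared against a reference before any RTL is generated.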
Using this tool, the algorithm can be quickly moved to hardware, where it must interface with the PS using AXI interfaces, as shown in Figure 5.
Figure 5. Integrating the programmable logic-based AMC hardware accelerator algorithm with the processing system
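The PS side of the arrangement in Figure 5 can be sketched as a small driver routine. This is an assumption-laden illustration, not the article's implementation: it supposes the accelerator exposes an AXI4-Lite control interface with the block-level control bits that Vivado HLS typically generates (start in bit 0, done in bit 1 of the first register), and that the first argument register sits at byte offset 0x10. The function name `run_accelerator` and the register indices are hypothetical and should be checked against the driver header generated for the actual core.

```cpp
#include <cassert>
#include <cstdint>

// Control-register bits as typically generated by Vivado HLS for an
// AXI4-Lite block-level interface (assumed here, verify for a real core).
constexpr std::uint32_t kApStart = 1u << 0;
constexpr std::uint32_t kApDone  = 1u << 1;

// Hypothetical register map, expressed as 32-bit word indices.
constexpr unsigned kCtrlWord = 0;  // byte offset 0x00: control register
constexpr unsigned kArgWord  = 4;  // byte offset 0x10: first argument (assumed)

// Writes the sample-buffer address, starts the accelerator, then polls for
// completion. Returns true if the done bit was seen within max_polls reads.
bool run_accelerator(volatile std::uint32_t* regs,
                     std::uint32_t buffer_addr,
                     unsigned max_polls) {
    assert(regs != nullptr);
    regs[kArgWord] = buffer_addr;   // tell the PL where the captured samples are
    regs[kCtrlWord] = kApStart;     // kick off one run of the kernel
    for (unsigned i = 0; i < max_polls; ++i) {
        if (regs[kCtrlWord] & kApDone) {
            return true;            // kernel finished, results can be read back
        }
    }
    return false;                   // timed out; caller decides how to recover
}
```

On real hardware `regs` would point at the accelerator's mapped base address; in a production design an interrupt would usually replace the polling loop.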
Running the AMC algorithm at high clock rates in the PL has a significant effect on performance, as shown in Figure 6: that function alone runs 70x faster than the same function implemented in software, while consuming less than 3% of the logic available in the SoC device.
Figure 6. Performance improvement for software only and software + hardware acceleration
Starting from the original reference C/C++ code, basic optimisations were implemented to run more effectively on the Cortex-A9 processor, resulting in an initial 2-3x improvement in performance for the software-only implementation versus the untouched code. The NEON media co-processor was then enabled, giving an additional performance benefit. The final result in Figure 6 is achieved with the AMC algorithm running in the programmable logic, as shown in Figure 5, where an overall improvement of 70x is achieved for the AMC function alone versus the initial software.
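Since the 70x figure applies to the AMC function alone, the effect on the complete coefficient update depends on how much of the original run time that function occupied. A minimal sketch of the arithmetic, using Amdahl's law, is given below; the function name `overall_speedup` is hypothetical, and any fraction passed to it is an assumption, as this section does not state one.

```cpp
#include <cassert>

// Overall speedup when a fraction `accelerated_fraction` of the original
// run time is made `kernel_speedup` times faster and the remainder is
// left unchanged (Amdahl's law).
double overall_speedup(double accelerated_fraction, double kernel_speedup) {
    assert(accelerated_fraction >= 0.0 && accelerated_fraction <= 1.0);
    assert(kernel_speedup > 0.0);
    return 1.0 / ((1.0 - accelerated_fraction) +
                  accelerated_fraction / kernel_speedup);
}
```

For example, if the accelerated function were assumed to account for 90% of the original run time, `overall_speedup(0.9, 70.0)` gives roughly 8.9x for the whole update, which is why further profiling of the remaining software matters once the dominant function has been moved to hardware.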
Ultimately, the required radio performance defines the DPD partitioning between hardware and software. One factor is the pursuit of greater levels of spectral correction to enable greater efficiency: achieving that correction requires more processing power, because the fidelity used to represent the amplifier non-linearity is increased. Other factors include wider transmission bandwidths and sharing the Estimation Engine between multiple antennas. Sharing allows area (and cost) savings, in that only one processor, plus optional hardware accelerators, is used to calculate the coefficients for many datapath pre-distorters.
In some situations, such as narrower transmission bandwidths or designs with only one or two antenna paths, the performance of software running on the Cortex-A9 with the NEON unit may be adequate, reducing area and cost for those radio configurations.
To improve performance beyond what is demonstrated in Figure 6, additional parallelism can be added to the implementation of the AMC function, resulting in faster update times at the expense of more logic. Further software profiling may also indicate other areas of the algorithm that would benefit from hardware acceleration. Whatever the requirement, the tools and silicon now exist to enable designers to explore performance, area and power trade-offs with minimal effort in pursuit of higher efficiency, without being constrained to specific discrete devices or programming styles.
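The trade of logic for update time can be sketched in HLS-style C++ as follows. This is an illustrative assumption, not the article's implementation: the function `mac_parallel`, the buffer length `kLen` and the lane count `kLanes` are hypothetical. The idea is that each lane becomes its own multiply-accumulate unit in the PL, so raising the lane count shortens the loop while consuming more logic.

```cpp
#include <cstddef>

constexpr std::size_t kLen = 64;   // hypothetical buffer length
constexpr std::size_t kLanes = 4;  // hypothetical number of parallel MAC units

static_assert(kLen % kLanes == 0, "kLen must be a multiple of kLanes");

// Multiply-accumulate split across kLanes independent partial sums.
float mac_parallel(const float a[kLen], const float b[kLen]) {
// Split the inputs so that kLanes elements can be read in the same cycle.
#pragma HLS ARRAY_PARTITION variable=a cyclic factor=4
#pragma HLS ARRAY_PARTITION variable=b cyclic factor=4
    float partial[kLanes] = {0.0f, 0.0f, 0.0f, 0.0f};
#pragma HLS ARRAY_PARTITION variable=partial complete
    for (std::size_t n = 0; n < kLen; n += kLanes) {
#pragma HLS PIPELINE
        for (std::size_t k = 0; k < kLanes; ++k) {
// Replicate the loop body so each lane maps to separate hardware.
#pragma HLS UNROLL
            partial[k] += a[n + k] * b[n + k];
        }
    }
    // Combine the per-lane results into the final sum.
    float sum = 0.0f;
    for (std::size_t k = 0; k < kLanes; ++k) {
        sum += partial[k];
    }
    return sum;
}
```

Changing `kLanes` (and the matching partition factor) and re-running synthesis is one way to explore the area versus update-time trade-off described above; the resource and latency figures come from the tool's reports, not from this sketch.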
Radio infrastructure demands low cost, low power and high reliability. Integration is the key to achieving these goals, but until now it was not possible without sacrificing flexibility or time to market. In addition, processing requirements continue to increase with broadband radios and the pursuit of higher efficiency. With its dual core processor subsystem, high performance and low power programmable logic, the Zynq-7000 All Programmable SoC is the solution to current and future radio requirements.
Whether the equipment is a remote radio or active antenna array, designers can create products with greater productivity while increasing flexibility and performance over existing solutions, be they ASSP or ASIC. No longer are the boundaries between software and hardware fixed, opening up infinite possibilities to designers seeking more advanced algorithms for product differentiation.
About the Author
David Hawke is the Product Marketing Manager for Radio in the Communications Business Unit at Xilinx. He has more than 18 years of experience in the semiconductor industry in roles ranging from design and application engineering to sales and marketing. Prior to transferring from sales to marketing in 2005, he was Senior Staff FAE covering the wireless market sector in the UK for 7 years. During his career, David has been published in a number of trade journals and has spoken at a number of conferences and industry events such as LTE World Summit, Wireless China, Next Generation Networks Conference and IWPC events. David began his career in FPGA design at Rutherford Appleton Laboratory, UK. David holds a B.Eng in Electronic Engineering from De Montfort University, Leicester, UK.
Xilinx CTO Office
Xilinx DPD Design Team