Creating the Xilinx Zynq-7000 Extensible Processing Platform
Larry Getman, Xilinx
10/17/2011 5:13 PM EDT
In March of 2011, Xilinx officially announced the first four devices of its new 28nm Zynq-7000 Extensible Processing Platform (EPP) family. Each of these devices merges an ARM dual-core Cortex-A9 MPCore processing system with a NEON media engine and a double-precision floating-point unit on the same IC, along with Level 1 and Level 2 caches, memory controllers, large programmable-logic blocks and a slew of commonly used peripherals. Creating a device that offers design teams the broadest range of programmable options has been a challenging process, but also a highly rewarding one.
Microprocessors aren’t new to the FPGA world. In fact, FPGA vendors have offered various types of processors for their FPGAs ever since FPGA transistor counts grew big enough to accommodate them. In the late 1990s, FPGA vendors started offering soft cores (8-bit, 16-bit and then 32-bit processor cores in Verilog or VHDL, or as prerouted netlists) that hardware designers could program into FPGAs with synthesis and place-and-route tools. Then, in the early 2000s, Moore’s Law made enough transistor real estate available to allow FPGA vendors to implement microprocessors in the silicon itself, next to programmable-logic blocks. Implementing cores in the silicon itself, rather than in the fabric, leaves more of the programmable logic free for user designs, speeds processing and overall performance of the chip, and lowers power. Xilinx used this method when it implemented PowerPC processors in derivatives of its Virtex-4 and Virtex-5 devices. Those FPGAs continue to be very successful in some markets, but the company felt the need to take the concept to the next level, with a fully encapsulated processing system that appeals to a wider audience than just hardware designers.
These PowerPC-based FPGAs formed the groundwork for the new technology that Xilinx believes will prove revolutionary in the electronics industry—a device that will make such an impact that it will create an entirely new class of semiconductor product. We call it the Extensible Processing Platform (EPP).
Which processor to use?
Over almost three decades, ARM has built a formidable and unmatched hardware and software development infrastructure and has a growing user base in an increasing number of markets, mirroring in many ways the growth of Xilinx. Traditionally, ARM customers have implemented ARM cores in ASICs and ASSPs, but as the manufacturing processes have become more expensive and complex, more and more companies are building their end products around FPGAs instead.
Many companies that develop ASICs have in fact used ARM processors in their ASIC designs and have created their own IP to work with ARM’s AMBA bus and the AMBA AXI point-to-point interface. Noting this fact, Xilinx worked with ARM to select a processing system that would suit the needs of the broadest number of FPGA customers. It also worked with ARM on a new revision of AXI, called AXI4, that adds a data-streaming capability into the interface. Many FPGA applications require streaming data, as FPGAs excel in parallel and serial data traffic management.
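The streaming capability mentioned above rests on a simple valid/ready handshake: a data beat moves only in a clock cycle where the source asserts TVALID and the sink asserts TREADY, and TLAST marks the final beat of a packet. The Python sketch below models that rule cycle by cycle. It is a behavioral illustration only, not Xilinx or ARM code; the function name `simulate_stream` and the ready pattern are hypothetical.

```python
from typing import List, Tuple


def simulate_stream(payload: List[int],
                    ready_pattern: List[bool]) -> Tuple[List[Tuple[int, bool]], int]:
    """Model a valid/ready streaming transfer, one loop iteration per clock cycle.

    payload       -- data words the source wants to send (one packet)
    ready_pattern -- TREADY value the sink drives in each cycle (pattern repeats)
    Returns (list of (tdata, tlast) beats accepted by the sink, cycles used).
    """
    if not any(ready_pattern):
        raise ValueError("sink never asserts TREADY; the transfer would stall forever")

    received = []
    index = 0   # next word the source will present
    cycle = 0
    while index < len(payload):
        tvalid = True  # the source has data pending until the packet is done
        tready = ready_pattern[cycle % len(ready_pattern)]
        if tvalid and tready:
            tlast = index == len(payload) - 1  # flag the final beat
            received.append((payload[index], tlast))
            index += 1
        # otherwise the source holds its data stable and retries next cycle
        cycle += 1
    return received, cycle


# A sink that is ready only every other cycle stretches 3 beats over 5 cycles.
beats, cycles = simulate_stream([10, 20, 30], [True, False])
print(beats, cycles)
```

Because neither side has to know the other's timing in advance, back-pressure from a slow consumer simply stretches the transfer instead of dropping data.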
What’s more, Xilinx ensured that all its latest FPGAs, not just the Zynq-7000 EPP, support the AXI4 interface. This allows Xilinx, its IP partners and its customers to develop IP that will work across all Xilinx devices. It is possible because the Xilinx 7 series devices – namely, the Artix-7, Kintex-7 and Virtex-7 FPGA families – and the Zynq-7000 all use the same programmable-logic architecture. AXI4 also enables thousands of multigigabit data transfers between the ARM dual-core Cortex-A9 MPCore processing subsystem and the programmable logic at very low power. That eliminates the common performance bottlenecks for control, data, I/O and memory that would plague a system pairing a standalone ARM-based ASIC with a standalone FPGA on a PCB.
Programming model: Processor-first boot-up
FPGAs have traditionally been used by folks who have hardware design backgrounds and are very familiar with hardware description languages such as Verilog and VHDL. Over the last decade, however, a number of algorithm developers and DSP programmers found they could implement much more complicated algorithms on a single FPGA and have the FPGA handle the tasks of many DSPs. In addition, a growing number of embedded-systems developers have taken it upon themselves to learn hardware design techniques so they too can leverage FPGAs in their systems.
Still, there are an enormous number of embedded-software designers who could greatly benefit from a device that combines programmable logic and an ARM processor on the same IC. However, previous-generation devices required users to program the FPGA logic before they could get the on-board PowerPC processor to work with the rest of the design.
Therefore, a critical decision in creating the Zynq-7000 architecture was to ensure that the processor runs the show. That is, when users power on the device, the processor boots first and waits for commands to program the programmable-logic portion. In fact, if users so desired, they could run the device as a standalone processor and never touch the programmable-logic portion of the chip. But the big value-add of the Zynq-7000 is that users can, at their discretion, offload processing functions to the FPGA fabric. Doing so helps them create systems that run optimally in terms of functionality, performance and power.
Figure 1. Unlike previous chips that combine MPUs in an FPGA fabric, Xilinx’s new Zynq-7000 Extensible Processing Platform device family lets the ARM processor, rather than the programmable logic, run the show.
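The processor-first ordering can be captured in a few lines of Python. This is a toy state model of the rules described in this section (the processor boots first, the logic is configured only afterward and only on request, and the device remains usable with the logic untouched). The class and method names are hypothetical and do not correspond to any Xilinx API.

```python
class ExtensiblePlatformModel:
    """Toy model of processor-first bring-up (all names are hypothetical)."""

    def __init__(self):
        self.processor_booted = False
        self.logic_configured = False
        self.log = []

    def power_on(self):
        # The processing system comes up first; the logic stays unconfigured.
        self.processor_booted = True
        self.log.append("processor booted")

    def configure_logic(self, bitstream_name):
        # Only software running on the booted processor may load the logic.
        if not self.processor_booted:
            raise RuntimeError("processor must boot before logic is configured")
        self.logic_configured = True
        self.log.append(f"logic configured with {bitstream_name}")

    def run(self, task, offload=False):
        if not self.processor_booted:
            raise RuntimeError("device is not powered on")
        if offload and not self.logic_configured:
            raise RuntimeError("cannot offload: logic not configured")
        where = "programmable logic" if offload else "processor"
        self.log.append(f"{task} ran on {where}")
        return where


dev = ExtensiblePlatformModel()
dev.power_on()
dev.run("control loop")              # works with the logic left untouched
dev.configure_logic("filter.bit")    # hypothetical bitstream name
dev.run("FIR filter", offload=True)  # offloaded at the user's discretion
print(dev.log)
```

Calling `configure_logic` before `power_on` raises an error in this model, which mirrors the point of the architecture: software is in charge of when, and whether, the fabric is used.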
Tool flows
While Xilinx believes the Zynq-7000 Extensible Processing Platform will be a blockbuster device that the traditional user base, hardware designers, will rapidly adopt, the company also expects the device will see increased use by embedded-software designers. For this reason, Xilinx put extra effort into fitting the device’s programming into flows familiar to both camps.
Figure 2. The Zynq-7000 Extensible Processing Platform relies on a tool flow familiar to system architects, software developers and hardware designers alike.
Software application engineers can use the same development tools they have employed for previous designs. Xilinx provides the Software Development Kit (SDK), an Eclipse-based tool suite, for embedded-software application projects. Engineers can also use other third-party development environments, such as the ARM Development Studio 5 (DS-5), ARM RealView Development Suite (RVDS) or any other ARM development tools.
Linux application developers can use both of the Cortex-A9 CPU cores in Zynq-7000 devices in a symmetric-multiprocessor mode to optimize the performance of their designs. Alternatively, they can also set up the CPU cores in an asymmetric-multiprocessor mode running Linux or other real-time operating systems (RTOSes).
To jump-start software development, Xilinx created several open-source Linux drivers as well as bare-metal drivers for all the processing peripherals, such as USB, Ethernet, SDIO, UART, CAN, SPI, I2C and GPIO. The company and its partners are also busy developing OS/RTOS board support packages with middleware and application software.
Xilinx is currently offering Zynq-7000 prototyping systems to select customers to help them with early system development. Lab silicon of the Zynq-7000 is due during the second half of 2011, with general engineering samples scheduled for the first half of 2012. Based on forward volume-production pricing, the Zynq-7000 family will have an entry point below $15 in high volumes. Interested customers should contact their local Xilinx representative. For more information, please visit www.xilinx.com/zynq.
About the author
Lawrence Getman is the VP of Processing Platforms at Xilinx. Prior to this role, he was in charge of Corporate Development at Xilinx.
Before joining Xilinx, Lawrence worked as the VP of Business Development at Triscend Corporation and also held a variety of marketing and sales roles. He holds a BSEE from Rochester Institute of Technology as well as an MBA from San Jose State.
Sidebar: The four devices in the Zynq-7000 stable
Each of the Zynq-7000 family’s four devices has the exact same ARM processing system, but the programmable-logic resources vary for scalability and to fit different applications.
Figure 3. The Zynq-7000 Extensible Processing Platform debuts with a family of four devices that sport the same ARM processing system but vary in programmable-logic resources. Gate counts range from 3.5 million equivalent ASIC gates (235K logic cells) to 430,000 ASIC gates (30K logic cells). All of the devices come with significant DSP resources and the two larger ones include multigigabit transceivers operating at up to 12.5 Gbps.
The Cortex-A9 Multi-Processor core (MPCore) consists of two CPUs – each a Cortex-A9 processor with a dedicated NEON coprocessor (a media- and signal-processing architecture that adds instructions targeted at audio, video, 3-D graphics, image and speech processing) and a double-precision floating-point unit. The Cortex-A9 processor is a high-performance, low-power ARM macrocell with a Level 1 cache subsystem that provides full virtual-memory capabilities. The processor implements the ARMv7 architecture and runs 32-bit ARM instructions, 16-bit and 32-bit Thumb instructions, and 8-bit Java byte codes in Jazelle state. In addition, the processing system includes a snoop control unit, a Level 2 cache controller, on-chip SRAM, timers and counters, DMA, system control registers, device configuration and an ARM CoreSight system. For debug, it contains an embedded trace buffer (ETB), instrumentation trace macrocell (ITM) and cross-trigger module (CTI) from ARM, along with AXI monitor (AXIM) and fabric trace (FTM) modules from Xilinx.
The two larger devices, the Zynq-7030 and Zynq-7040, include high-speed, low-power serial connectivity with built-in multigigabit transceivers operating at up to 12.5 Gbits/second. These devices offer approximately 1.9 million and 3.5 million equivalent ASIC gates (125K and 235K logic cells) respectively, along with DSP resources that deliver 480 GMACs and 912 GMACs respectively of peak performance. The two smaller devices, the Zynq-7010 and Zynq-7020, provide roughly 430,000 and 1.3 million ASIC-gate equivalents (30K and 85K logic cells) respectively, with 58 GMACs and 158 GMACs of peak DSP performance.
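For quick comparison, the figures quoted above can be collected into a small lookup table. The Python sketch below copies the numbers from this sidebar and adds a hypothetical helper, `smallest_device`, that picks the smallest family member meeting a set of requirements. The helper and its parameter names are illustrative assumptions, not part of any Xilinx tool.

```python
# Figures copied from the sidebar text; "asic_gates" are equivalent ASIC gates.
DEVICES = {
    "Zynq-7010": {"asic_gates": 430_000, "logic_cells": 30_000,
                  "gmacs": 58, "transceivers": False},
    "Zynq-7020": {"asic_gates": 1_300_000, "logic_cells": 85_000,
                  "gmacs": 158, "transceivers": False},
    "Zynq-7030": {"asic_gates": 1_900_000, "logic_cells": 125_000,
                  "gmacs": 480, "transceivers": True},
    "Zynq-7040": {"asic_gates": 3_500_000, "logic_cells": 235_000,
                  "gmacs": 912, "transceivers": True},
}


def smallest_device(min_logic_cells, min_gmacs=0, need_transceivers=False):
    """Return the smallest listed device that meets the needs, or None."""
    by_size = sorted(DEVICES.items(), key=lambda item: item[1]["logic_cells"])
    for name, d in by_size:
        if (d["logic_cells"] >= min_logic_cells
                and d["gmacs"] >= min_gmacs
                and (d["transceivers"] or not need_transceivers)):
            return name
    return None


for name, d in DEVICES.items():
    ratio = d["asic_gates"] / d["logic_cells"]
    print(f"{name}: {ratio:.1f} equivalent ASIC gates per logic cell")

print(smallest_device(min_logic_cells=50_000))
print(smallest_device(min_logic_cells=10_000, need_transceivers=True))
```

Across all four devices the quoted figures work out to roughly 14 to 15 equivalent ASIC gates per logic cell, which is a handy rule of thumb when converting between the two measures.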
Each device contains a general-purpose analog-to-digital converter (XADC) interface, which features two 12-bit, 1-Msample/s ADCs, on-chip sensors and external analog input channels. The XADC offers enhanced functionality over the system monitor found in previous generations of Virtex FPGAs. The two ADCs, which can sample up to 17 external-input analog channels, support a diverse range of applications that need to process analog signals with bandwidths of less than 500 kHz.
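The sub-500-kHz bandwidth figure follows from the Nyquist criterion: a converter sampling at a given rate can faithfully capture signal content only up to half that rate. The short Python sketch below works through the arithmetic. The optional `channels_sharing_adc` parameter is an illustrative assumption that channels sharing one ADC are sampled round-robin with equal shares; the article does not describe how the XADC actually sequences its inputs.

```python
def nyquist_bandwidth_hz(sample_rate_hz, channels_sharing_adc=1):
    """Upper bound on signal bandwidth for one sampled channel.

    Assumes (for illustration only) that channels sharing an ADC are
    sampled round-robin, so each gets an equal share of the sample rate.
    """
    if sample_rate_hz <= 0 or channels_sharing_adc < 1:
        raise ValueError("sample rate must be positive and channel count >= 1")
    per_channel_rate = sample_rate_hz / channels_sharing_adc
    return per_channel_rate / 2  # Nyquist limit: half the sampling rate


# One channel on a 1-Msample/s ADC: 500 kHz upper bound.
print(nyquist_bandwidth_hz(1_000_000))
# Four channels sharing that ADC (assumed round-robin): 125 kHz each.
print(nyquist_bandwidth_hz(1_000_000, 4))
```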
Dr DSP
10/18/2011 12:23 PM EDT
The processor centric nature of these devices is a nice improvement. It will be interesting to see what applications will benefit from the combined closely coupled FPGA and processor vs the traditional separate Processor and FPGA. Historically the added cost for a single device implementation has been the big barrier. $15 can buy a lot of processor and FPGA as separate devices...
timemerchant
10/23/2011 1:40 AM EDT
Xilinx WP369 was dated April 27, 2010 describing the ARM and Xilinx FPGA, but we are now in October, 2011. Already the second half of 2011. Similar to Altera with 18 to 24 months of pre-announcement. When announced at 28nm, it is impressive, but with Moore's law and the time periods, by the time it ships the next node is past 28nm. I'm sure it is difficult to get this correct, but rather bring out eval board closer to announcement dates like Freescale and TI. I would like to use the chip but will see what the tools cost. Happy with TI and Freescale's devices at higher frequencies that are shipping already.