Editor’s Note: This is the cover story from the latest issue of Xcell Journal. It is reproduced here with the kind permission of Xilinx.
Xilinx has just unveiled the first devices in a new family built around its Extensible Processing Platform (EPP), a revolutionary architecture that mates a dual ARM Cortex-A9 MPCore processor with low-power programmable logic and hardened peripheral IP, all on the same device (see Cover Story, spring 2010 issue of Xcell Journal). In March of this year, Xilinx officially announced the first four devices of what it has now dubbed the Zynq-7000 EPP family.
Implemented in 28-nanometer process technology, each Zynq-7000 device is built with an ARM dual-core Cortex-A9 MPCore processing system equipped with a NEON media engine and a double-precision floating-point unit, as well as Level 1 and Level 2 caches, a multi-memory controller and a slew of commonly used peripherals (Figure 1). While FPGA vendors have previously fielded devices with both hardwired and soft onboard processors, the Zynq-7000 EPP is unique in that the ARM processor system, rather than the programmable logic, runs the show. That is, Xilinx designed the processing system to boot at power-up (before the FPGA logic) and to run a variety of operating systems independent of the programmable logic fabric. Designers then program the processing system to configure the programmable logic on an as-needed basis.
Figure 1. Unlike previous chips that combine MPUs in an FPGA fabric, Xilinx’s new Zynq-7000 EPP family lets the ARM processor, rather than the programmable logic, run the show.
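The processor-first boot order can be sketched as a toy model. This is an illustration only, not Xilinx software: the class `ZynqModel`, its methods and the configuration name `video_filter.bit` are hypothetical names chosen for the example.

```python
class ZynqModel:
    """Toy model of the processor-first boot order (all names hypothetical)."""

    def __init__(self):
        self.ps_running = False   # processing system (the ARM cores)
        self.pl_config = None     # programmable logic; None means unconfigured
        self.log = []

    def power_up(self):
        # The processing system boots first; the programmable logic stays blank.
        self.ps_running = True
        self.log.append("PS booted")

    def load_bitstream(self, name):
        # Software on the processing system configures the logic when needed.
        if not self.ps_running:
            raise RuntimeError("processing system must boot before configuring logic")
        self.pl_config = name
        self.log.append(f"PL configured with {name}")


chip = ZynqModel()
chip.power_up()
# At this point software can already run, with no logic configured.
os_usable_without_pl = chip.ps_running and chip.pl_config is None
chip.load_bitstream("video_filter.bit")  # hypothetical configuration name
```

The model makes one point: software is usable before any logic is configured, and configuring the logic is simply something the software does later.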
With this approach, the software programming model is exactly the same as in standard, fully featured ARM processor-based systems-on-chip (SoCs). Previous implementations required designers to program the FPGA logic to get the onboard processor to work. That meant you had to be an FPGA designer to use the devices. This is not the case with the Zynq-7000 EPP.
The new product family eliminates the delay and risk of designing a chip from scratch, meaning system design teams can quickly create innovative SoCs leveraging advanced hardware and software programming versatility simply not achievable in any other semiconductor device. As such, the Zynq-7000 EPP stands poised to allow a broader number of innovators – whether they are professional hardware, software or systems designers, or simply “makers” – to explore the possibilities of combining processing plus programmable logic to create applications no one has yet imagined.
“At its most basic level, Zynq-7000 EPP is an entirely new class of semiconductor product,” said Larry Getman, vice president of processing platforms at Xilinx. “It is not just a processor and it is not just an FPGA. We are combining the best of both those worlds, and because of that we take away many of the limitations you have with existing solutions, especially two-chip solutions and ASICs.”
Getman notes that many electronic systems today pair an FPGA and either a standalone processor or an ASIC with an onboard processor on the same PCB. Xilinx’s new offering will allow companies using these types of two-chip solutions to build next-generation systems with just one Zynq-7000 chip, saving bill-of-material costs and PCB space, and reducing overall power budgets. And because the processor and programmable logic sit on the same device rather than on separate chips, the performance gain is substantial.
Zynq-7000 EPP will also speed up the natural market migration from ASICs to FPGAs, Getman said. Implementing ASICs in the latest process technologies is too expensive and too risky for a growing number of applications. As a result, more and more companies are embracing FPGAs. Many of those attempting to hold onto their old ASIC ways are implementing their designs in older process geometries in what analysts call “value-minded SoC ASICs.” Any ASIC still requires lengthy design cycles and is at risk of multiple respins—which can be expensive and can delay products from going to market in a timely manner. “With Zynq-7000 EPP on 28 nm, the programmable logic portion of the device has no size or performance penalty vs. older technologies, and you also get the added benefit of a hardened 28-nm SoC in the processing subsystem. With a starting price point below $15, we are really making it hard for companies to justify the cost and risk of designing any ASIC that is not extremely high volume,” said Getman. “You can get your software and hardware teams up and running from day one. That alone makes it hard for engineering teams to justify staying with ASICs.”
Getman notes that ever since Xilinx announced the architecture last year, interest in and requests for the Zynq-7000 EPP have been remarkable. “A select number of alpha customers are already prototyping systems that will use Zynq-7000 devices. The technology is very exciting.”
Smart architectural decisions
The Zynq-7000 EPP design team, under the direction of Vidya Rajagopalan, vice president of processing solutions at Xilinx, designed a carefully considered architecture for this new class of device. Beyond choosing the widely used ARM processor system, a key architectural decision was to make extensive use of the high-bandwidth AMBA Advanced Extensible Interface (AXI) interconnect between the processing system and the programmable logic. This enables multigigabit data transfers between the ARM dual-core Cortex-A9 MPCore processing subsystem and the programmable logic at very low power, thereby eliminating common performance bottlenecks for control, data, I/O and memory.
In fact, Xilinx teamed with ARM to make the ARM architecture an even better fit for FPGA applications. “AXI4 has a memory-mapped version and a streaming version,” said Rajagopalan. “Xilinx drove the streaming definition for ARM because a lot of IP that people build for applications such as high-bandwidth video is streaming IP. ARM didn’t have a product that had this streaming interface so they partnered with us to do it.”
Getman said another key aspect of the architecture is that Xilinx hardened a healthy mix of standard interface IP into Zynq-7000 EPP silicon. “We tried to choose peripherals that were more ubiquitous—things like USB, Ethernet, SDIO, UART, SPI, I2C and GPIO are all pretty standard,” said Getman. “The one exception is that we also added CAN to the device. CAN is one of the more specialized hardened cores, but it is heavily used in two of our key target markets: industrial and automotive. Having it hardened in the device is just one more comfort factor of the Zynq-7000 EPP.”
In terms of memory, Zynq-7000 devices offer up to 512 kbytes of L2 cache that is shared by both processors. “The Zynq-7000 EPP devices have 256 kbytes of scratchpad, which is a shared memory that the processor and FPGA can both access,” said Getman.
A unique multistandard DDR controller supports three types of double-data-rate memory. “Where most ASSPs target a particular segment of a market, we target LPDDR2, DDR2 and DDR3, so the user can make the trade-off of whether they want to go after power or performance,” said Rajagopalan. “It is a multistandard DDR controller and we are one of the first companies to offer a controller like this.”
In addition to being a new class of device, Zynq-7000 EPP is also the latest Xilinx Targeted Design Platform, offered with base development boards, software, IP and documentation to get customers up and running quickly. Further, the company will roll out over the coming years vertical-market- and application-specific Zynq-7000 EPP Targeted Design Platforms – boards or daughtercards, IP and documentation – all to help design teams get products to market faster (see Cover Story, Xcell Journal Issue 68).
Xilinx Alliance Program members and the ARM Connected Community will also offer customers a wealth of Zynq-7000 EPP resources, including popular operating systems, debuggers, IP, reference designs and other learning and development materials.
In addition to creating great silicon and the tools to go with it, Xilinx has meticulously put together user-friendly design and programming flows for the Zynq-7000 EPP.
Processor-centric development flow
The Zynq-7000 EPP relies on a familiar tool flow that allows embedded-software and hardware engineers to perform their respective development, debug and implementation tasks in much the same way as they do now—using familiar embedded-design methodologies already delivered through the Xilinx ISE Design Suite and third-party tools (Figure 2).
Figure 2 – The Zynq-7000 EPP relies on a familiar tool flow for system architects, software developers and hardware designers alike.
Getman notes that software application engineers can use the same development tools they have employed for previous designs. Xilinx provides the Software Development Kit (SDK), an Eclipse-based tool suite, for embedded-software application projects. Engineers can also use other third-party development environments, such as the ARM Development Studio 5 (DS-5), ARM RealView Development Suite (RVDS) or any other development tools from the ARM ecosystem.
Linux application developers can fully leverage both the Cortex-A9 CPU cores in Zynq-7000 devices in a symmetric-multiprocessor mode for the highest performance. Alternatively, they can set up the CPU cores in a uniprocessor or asymmetric-multiprocessor mode running Linux, a real-time operating system (RTOS) like VxWorks or both. To jump-start software development, Xilinx provides customers with open-source Linux drivers as well as bare-metal drivers for all the processing peripherals (USB, Ethernet, SDIO, UART, CAN, SPI, I2C and GPIO). Fully supported OS/RTOS board support packages with middleware and application software will also be available from the Xilinx and ARM partner ecosystem.
Meanwhile, the hardware design flow is similar to the embedded-processor design flow in the ISE Design Suite, with a few new steps for the Extensible Processing Platform. The processing subsystem is a complete dual-processor system with an extensive set of commonly used peripherals. Hardware designers can extend the processing power by attaching additional soft-IP peripherals in the programmable logic to the processing subsystem. The hardware development tool Xilinx Platform Studio automates many of the common hardware development steps and can also assist designers with optimized device pinouts. “We’ve also added to ISE some abilities to co-debug for hardware breakpoints and cross-triggering,” said Getman. “The most important thing for us was to give software developers and hardware designers their comfortable design environments.”
A mindful programming methodology
In the Xilinx scheme of things, users can configure the programmable logic and connect it to the ARM core through AXI “interconnect” blocks to extend the performance and capabilities of the processor system. The Xilinx and ARM partner ecosystem provides a large set of soft AMBA interface IP cores for implementation in the FPGA programmable logic. Designers can use them to build any custom functions their targeted application requires. Because the device uses the familiar programmable logic structures found in the 7 series FPGAs, designers can load a single static configuration, switch among multiple configurations, or even employ partial reconfiguration to change programmable logic functionality on the fly as needed.
The interconnect operation between the two regions of the device is largely transparent to the designers. Access between master and slave is routed through the AXI interconnect based on the address range assigned to each slave device. Multiple masters can access multiple slaves simultaneously, and each AXI interconnect uses a two-level arbitration scheme to resolve contention.
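Address-based routing and contention handling of this kind can be modeled in a few lines. The sketch below is a conceptual illustration, not the actual AXI interconnect logic. The address map, slave names and master names are hypothetical, and the article does not describe the two arbitration levels, so the sketch assumes fixed priority first, then round-robin among masters of equal priority.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slave:
    name: str
    base: int
    size: int

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


# Hypothetical address map; in a real design the designer assigns the ranges.
ADDRESS_MAP = [
    Slave("dma_engine", 0x4000_0000, 0x1000),
    Slave("video_scaler", 0x4000_1000, 0x1000),
]


def decode(addr: int) -> Slave:
    """Route an access to the slave whose address range contains addr."""
    for slave in ADDRESS_MAP:
        if slave.contains(addr):
            return slave
    raise ValueError(f"no slave mapped at {addr:#x}")


def arbitrate(requests, last_granted=None):
    """Grant one master per slave for a list of (master, priority, addr).

    Assumed scheme: level 1 picks the highest priority; level 2 breaks ties
    round-robin, favoring the first master name after last_granted.
    """
    by_slave = {}
    for master, priority, addr in requests:
        by_slave.setdefault(decode(addr).name, []).append((master, priority))

    grants = {}
    for slave_name, contenders in by_slave.items():
        top = max(priority for _, priority in contenders)
        finalists = sorted(m for m, p in contenders if p == top)
        later = [m for m in finalists
                 if last_granted is not None and m > last_granted]
        grants[slave_name] = later[0] if later else finalists[0]
    return grants


requests = [
    ("cpu0", 1, 0x4000_0010),   # contends for dma_engine
    ("cpu1", 1, 0x4000_0020),   # contends for dma_engine
    ("dma", 0, 0x4000_1004),    # alone on video_scaler
]
first = arbitrate(requests)
second = arbitrate(requests, last_granted="cpu0")
```

Masters that target different slaves are both granted in the same round, which mirrors the simultaneous access described above. Only the two masters addressing the same slave need arbitration.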
Get ready, get in early…
Customers can start evaluating the Zynq-7000 EPP family today by joining the Early Access program. First silicon devices are scheduled for the second half of 2011, with general engineering samples available in the first half of 2012. Designers can immediately use tools and development kits that support ARM to familiarize themselves with the Cortex-A9 MPCore architecture and begin porting code.
Pricing varies and depends on volume and choice of device. Based on forward volume production pricing, the Zynq-7000 EPP family will have an entry point below $15 in high volumes. Interested customers should contact their local Xilinx representative. For more information, please visit www.xilinx.com/zynq.
One Processing System, Four Devices
All four devices in the Zynq-7000 EPP family share exactly the same ARM processing system; the programmable logic resources vary from device to device so the family can scale to fit different applications.
The Cortex-A9 Multi-Processor core (MPCore) consists of two CPUs – each a Cortex-A9 processor with a dedicated NEON coprocessor (a media and signal-processing architecture that adds instructions targeted at audio, video, 3-D graphics, image and speech processing) and a double-precision floating-point unit. The Cortex-A9 processor is a high-performance, low-power ARM macrocell with a Level 1 cache subsystem that provides full virtual-memory capabilities. The processor implements the ARMv7 architecture and runs 32-bit ARM instructions, 16-bit and 32-bit Thumb instructions, and 8-bit Java bytecodes in Jazelle state. In addition, the processing system includes a snoop control unit, a Level 2 cache controller, on-chip SRAM, timers and counters, DMA, system control registers, device configuration and an ARM CoreSight system. For debug, it contains an embedded trace buffer, instrumentation trace macrocell and cross-trigger module from ARM, along with AXI monitor and fabric trace modules from Xilinx.
The two larger devices, the Zynq-7030 and Zynq-7040, include high-speed, low-power serial connectivity with built-in multigigabit transceivers operating at up to 10.3125 Gbits/second. These devices offer approximately 1.9 million and 3.5 million equivalent ASIC gates (125k and 235k logic cells) respectively, along with DSP resources that deliver 480 GMACs and 912 GMACs respectively of peak performance. The two smaller devices, the Zynq-7010 and Zynq-7020, provide roughly 430,000 and 1.3 million ASIC-gate equivalents (30k and 85k logic cells) respectively, with 58 GMACs and 158 GMACs of peak DSP performance.
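As a quick cross-check of the figures above, the implied ratio of equivalent ASIC gates to logic cells can be computed per device. The numbers are taken directly from the paragraph; the dictionary layout and variable names are just one convenient way to organize them.

```python
# Figures as quoted in the text (approximate, per the article).
devices = {
    "Zynq-7010": {"logic_cells": 30_000, "asic_gates": 430_000, "gmacs": 58},
    "Zynq-7020": {"logic_cells": 85_000, "asic_gates": 1_300_000, "gmacs": 158},
    "Zynq-7030": {"logic_cells": 125_000, "asic_gates": 1_900_000, "gmacs": 480},
    "Zynq-7040": {"logic_cells": 235_000, "asic_gates": 3_500_000, "gmacs": 912},
}

# Equivalent ASIC gates per logic cell implied by each pair of figures.
gates_per_cell = {
    name: d["asic_gates"] / d["logic_cells"] for name, d in devices.items()
}

for name, ratio in gates_per_cell.items():
    print(f"{name}: {ratio:.1f} equivalent gates per logic cell")
```

All four ratios land between 14 and 16, so the gate counts appear to scale roughly linearly with logic-cell count across the family.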
Each device contains a general-purpose analog-to-digital converter (XADC) interface, which features two 12-bit, 1-Msample/s ADCs, on-chip sensors and external analog input channels. The XADC offers enhanced functionality over the system monitor found in previous generations of Virtex® FPGAs. The two 12-bit ADCs, which can sample up to 17 external-input analog channels, support a diverse range of applications that need to process analog signals with bandwidths of less than 500 kHz.
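The 500-kHz figure follows from the Nyquist limit: a converter sampling at 1 Msample/s can represent signal content only below half that rate. The sketch below works through the arithmetic. It also shows what happens to the per-channel rate if one ADC is time-multiplexed evenly across several inputs, which is an assumption made for illustration and not a description of how the XADC actually sequences its channels.

```python
SAMPLE_RATE_HZ = 1_000_000   # 1 Msample/s per ADC, from the text
EXTERNAL_CHANNELS = 17       # external analog inputs, from the text


def nyquist_hz(sample_rate_hz):
    """Highest signal frequency representable at a given sample rate."""
    return sample_rate_hz / 2


def per_channel_rate_hz(sample_rate_hz, channels):
    """Sample rate each input sees under even round-robin multiplexing (assumed)."""
    if channels < 1:
        raise ValueError("need at least one channel")
    return sample_rate_hz / channels


single = nyquist_hz(SAMPLE_RATE_HZ)                              # one input per ADC
shared = per_channel_rate_hz(SAMPLE_RATE_HZ, EXTERNAL_CHANNELS)  # 17 inputs, one ADC
shared_nyquist = nyquist_hz(shared)

print(f"Dedicated ADC: bandwidth limit {single / 1e3:.0f} kHz")
print(f"17 inputs on one ADC: {shared / 1e3:.1f} kS/s each, "
      f"limit {shared_nyquist / 1e3:.1f} kHz")
```

Under that assumption, scanning many channels with one converter trades bandwidth per channel for channel count, so the sub-500-kHz figure is best read as the limit for a signal with a converter to itself.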