Editor's Note: This article was reproduced from Issue #82 of Xcell Journal with the kind permission of publisher Mike Santarini.
The Xilinx Zynq-7000 All Programmable SoC already has plenty of processing power onboard. But the presence of powerful twin Cortex-A9 processors and associated peripherals in Zynq's application processing unit (APU) should not keep you from adding one or more MicroBlaze processors in the same package, if your application would benefit from them.
Why might you want to add a MicroBlaze to a solution already endowed with serious processing clout? First there is the issue of reliability. Single-threading dramatically improves reliability. You can cleanly place one thread per Cortex-A9 (for computationally intensive tasks), and instantiate as many MicroBlaze processors as you need for other threads. Second, you can farm out any housekeeping chores that don't require the power of a Cortex-A9 to a MicroBlaze, thus saving critical performance cycles for the jobs that need them most.
Here's an example that covers both of the above situations. Consider a task that requires long stretches of intense computing while monitoring user input. Here the MicroBlaze could manage the user input (lower frequency, not computationally intensive) and write into the APU's memory space, so that when the APU "comes up for air" (that is, completes its processing task), it can see what information it needs to process next.
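The hand-off described above can be sketched as a small mailbox in shared memory. This is a minimal illustration rather than a reference design: the `mailbox_t` layout and the `mailbox_post` and `mailbox_fetch` names are hypothetical, and on real hardware the structure would have to sit at an address both processors can reach, with barriers and cache handling added as noted in the comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mailbox layout. On real hardware this struct would live at a
 * fixed address in memory visible to both processors. */
typedef struct {
    volatile uint32_t seq_written;  /* bumped by the MicroBlaze after each post */
    volatile uint32_t seq_read;     /* updated by the APU after each fetch      */
    volatile uint32_t command;      /* most recent user request                 */
} mailbox_t;

/* MicroBlaze side: record the newest user input (the latest value wins). */
void mailbox_post(mailbox_t *mb, uint32_t command)
{
    mb->command = command;
    /* Assumption: a real design needs a memory barrier here, plus cache
     * maintenance unless the region is uncached or kept coherent. */
    mb->seq_written = mb->seq_written + 1u;
}

/* APU side: called when the compute job finishes. Returns true and fills
 * *command if something was posted since the previous fetch. */
bool mailbox_fetch(mailbox_t *mb, uint32_t *command)
{
    uint32_t written = mb->seq_written;
    if (written == mb->seq_read) {
        return false;               /* nothing new to process */
    }
    *command = mb->command;
    mb->seq_read = written;
    return true;
}
```

With this arrangement the APU never blocks on user input. It simply calls `mailbox_fetch` between compute jobs, and if several inputs arrived in the meantime it sees only the most recent one.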
Once you've made the decision to include a MicroBlaze processor in your Zynq-based design, several issues become immediately apparent. First and foremost is the question of how the APU will communicate with the MicroBlaze, and what processing system (PS) resources are available for the MicroBlaze. Many boards, such as the ZC702 and ZedBoard, map many of the peripherals directly to the pins connected to the PS. These pins are not directly accessible to the MicroBlaze in the programmable logic (PL). The PS also contains a variety of timers and interrupt sources. Is there any way to access them from the domain of the MicroBlaze?
Interfaces between the PS and PL
Figure 1. Is the boundary between the PS and the MicroBlaze within the PL a minefield, or can the two share resources?
The processor system and the programmable logic are well coupled. This means that there are multiple tightly integrated connections between the Cortex-A9s, snoop control unit (SCU), PS peripherals, clock management and other functions, and the programmable logic. In fact, there are six different types of interconnects between the PS and the PL, and you can use them in conjunction with one another. Additionally, many of these paths are symmetric: the PS can initiate or "master" connections to the PL, and the PL can master connections to the PS.
Much of the information presently available from Xilinx, from app notes to user guides and white papers, illustrates how the Zynq-7000 APU, as the "center" of the design, can use the programmable logic to access memory, PL-based peripherals and hard silicon peripherals such as the PCIe block, Block RAMs, DSP48s and multigigabit transceivers. In examining how the MicroBlaze can be the captain of its domain, the logical place to begin is by looking at the six interface varieties, starting with three types of AXI interfaces: general purpose, high performance and the Accelerator Coherency Port.
The PS is equipped with two master AXI channels to the PL and two slave channels mastered by the PL (Figure 2). "Master" in this context means that the AXI channel is the initiator and can begin data exchanges, whereas a "slave" can only respond to arriving data. The master AXI channels are typically used to communicate with peripherals located in the PL. The slave AXI channels respond to requests made from the PL, which can include transactions made by MicroBlaze processors. These AXI channels tie into the central interconnect of the PS and can be routed to many resources.
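From software, a master-initiated exchange usually looks like an ordinary load or store to a memory-mapped register. The sketch below illustrates the idea under stated assumptions: the register offsets and the `reg_write`, `reg_read` and `reg_set_bits` helpers are hypothetical, and the base pointer is passed in so the same code can be exercised against a plain array on a host machine instead of a real PL peripheral.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte offsets for an imaginary PL peripheral. */
#define REG_CONTROL 0x00u
#define REG_STATUS  0x04u

/* Write one 32-bit register. The processor is the initiator (master) here;
 * the peripheral only responds. */
void reg_write(volatile uint32_t *base, size_t offset, uint32_t value)
{
    assert(offset % 4u == 0u);      /* registers are word aligned */
    base[offset / 4u] = value;
}

/* Read one 32-bit register. */
uint32_t reg_read(volatile uint32_t *base, size_t offset)
{
    assert(offset % 4u == 0u);
    return base[offset / 4u];
}

/* Set bits in a register with a read-modify-write sequence. */
void reg_set_bits(volatile uint32_t *base, size_t offset, uint32_t mask)
{
    reg_write(base, offset, reg_read(base, offset) | mask);
}
```

The same pattern applies in the other direction: a MicroBlaze acting as master through a PS slave port issues loads and stores, and the PS resource behind the port responds.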
Figure 2. Simplified connections to the processing system’s central interconnect.
In addition, there are four channels of high-performance (64-bit-wide) AXI attachment points. All four of these channels are slaves from the PS' perspective and are connected to the memory interface subsystem within the PS (Figure 3). The purpose of these four channels is to allow masters in the PL to initiate double-data-rate (DDR) memory transactions.
Figure 3. Simplified connections to the DDR memory controller and on-chip memory (OCM).
This memory interconnect and DDR memory controller are the gateways to the DDR memory from all sources. While the Cortex-A9 processors usually have priority over the slave AXI connections, each one of the four slave AXI connections has a "service me now" signal that gives priority to the requesting channel. When this signal is not asserted, the architecture uses a round-robin scheme to determine which requestor can gain access to the specific type of memory.
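The arbitration policy just described (an urgent request wins, otherwise requesters take turns) can be modeled in a few lines of C. This is a software illustration only, not the actual hardware algorithm: the `arbiter_t` type and `arbiter_grant` function are hypothetical, the model covers only the four PL-side ports, and it assumes that several simultaneous urgent requests are themselves served round-robin.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_PORTS 4

typedef struct {
    int last_granted;               /* port served most recently */
} arbiter_t;

/* Choose the next port to serve. request[i] is true when port i has a
 * pending transaction; urgent[i] models its "service me now" signal.
 * Returns the granted port index, or -1 if nothing is pending. */
int arbiter_grant(arbiter_t *arb,
                  const bool request[NUM_PORTS],
                  const bool urgent[NUM_PORTS])
{
    /* Pass 0 considers only urgent requesters; pass 1 considers any
     * requester. Each pass scans round-robin, starting just after the
     * port that was granted last. */
    for (int pass = 0; pass < 2; pass++) {
        for (int step = 1; step <= NUM_PORTS; step++) {
            int port = (arb->last_granted + step) % NUM_PORTS;
            if (request[port] && (pass == 1 || urgent[port])) {
                arb->last_granted = port;
                return port;
            }
        }
    }
    return -1;
}
```

Initializing `last_granted` to `NUM_PORTS - 1` makes the first scan start at port 0. With all four ports requesting and none urgent, successive calls grant ports 0, 1, 2, 3 and then wrap around; asserting one port's urgent flag lets it jump the queue on the next call.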
The Accelerator Coherency Port (ACP) is another 64-bit AXI PS slave connection from the PL. What makes the ACP unique is that it is tied directly into the snoop control unit (SCU). The job of the SCU is to ensure coherency among the L1, L2 and DDR memories. Using the ACP, you can access the fast cache memory for each of the Cortex-A9 processors in the PS and not be concerned with synchronizing data with the main memory (as the hardware will automatically take care of this). This capability greatly reduces the burden of design and provides a significantly faster way of moving data between the processors and the PL.
Beyond AXI links, the Extended Multiplexed Input and Output (EMIO) signals are available for routing many of the PS' hard peripherals through the PL to access the package pins. There are only 54 package pins tied directly to the PS; however, the PS' hard peripherals can use considerably more than these 54 pins. The EMIO is the conduit between the PS' hard peripherals and the PL. These I/O signals can be routed directly to the package pins available to the PL. Alternatively, you may use them to communicate with a compatible peripheral located in the PL.
A further set of miscellaneous signals between the PS and PL falls into five basic categories: clocks and resets; interrupt signals; event signals; idle AXI and DDR memory signals; and DMA signals.
- Clocks and resets: There are four independent programmable frequencies that the PS makes available to the PL. Typically one of these clocks is used for the AXI connections. Each of these clock domains has its own domain reset signals for resetting any device associated with that domain.
- Interrupt signals: The generic interrupt controller (GIC) in the PS collects interrupts from all available sources, including all of the interrupt sources from the PS' peripherals and 16 "peripheral" type interrupts from the programmable logic. Additionally, there are four direct interrupts that tie to the CPUs (IRQ0, IRQ1, FIQ0 and FIQ1). A total of 28 interrupts (from the PS' peripherals) are available to the PL.
- Event signals: These "out-of-band" asynchronous signals indicate a special condition of the PS. The PS provides a number of signals that indicate which CPU has entered a standby mode and which CPU has executed a SEV ("send event") instruction. The PS can leverage an event signal to wake from a WFE ("wait for event") state.
- Idle AXI and DDR memory signals: The idle AXI signal to the PS is used to indicate that there are no outstanding AXI transactions in the PL. Driven by the PL, this signal is one of the conditions used to initiate a PS bus clock shutdown by ensuring that all PL bus devices are idle. The DDR urgent/arb signal is used to indicate a critical memory-starvation situation to the DDR arbitration for the four AXI ports of the PS DDR memory controller.
- DMA signals: The direct-memory-access module within the PS communicates with the PL slaves via a series of request-and-acknowledge signals.