For the reasons we discussed earlier, today's state-of-the-art SoCs and FPGAs contain an ever-growing number of IP cores, each immersed in its own independent clock domain. As the IP core population grows and minimum transistor sizes shrink, it becomes very difficult to keep all of these clock domains synchronized across the die while sustaining the considerable inter-core data bandwidth required to keep pace with Moore's law.
One of the most promising solutions to this interconnect issue is the Globally Asynchronous, Locally Synchronous approach, a.k.a. GALS. The idea is to treat each IP core in a design as an independent "synchronous island." In this way, each core can be designed with standard synchronous tools and implemented within a constrained region of the die. Once every synchronous island has been designed and placed, an asynchronous Network-on-Chip (NoC) is deployed to efficiently convey data packets between the different cores, as illustrated below:
In future columns, we will discuss how this asynchronous interconnect infrastructure can be efficiently implemented on an SoC, focusing on Xilinx FPGA devices as a practical example. For this purpose, we will only need one special kind of building block: the micropipeline, an event-driven, elastic data pipeline. Don't worry. I will demonstrate just how easy this can be in a COTS FPGA when using the appropriate design techniques and tools.
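To get a feel for the "elastic" behavior before we dive into the hardware, here is a minimal software-level sketch of a micropipeline as a chain of handshaking stages. It models only the token flow, not the circuit: a stage holds at most one data token, and a token advances only when the downstream stage is empty (the software analog of a request being acknowledged). All the names (`Micropipeline`, `push`, `step`) are illustrative, not from any real library.

```python
class Micropipeline:
    """Software model of an elastic, event-driven pipeline.

    Each stage holds at most one token (None means empty). A token
    ripples forward only when the next stage is empty -- the same
    local handshake that lets a hardware micropipeline absorb
    bursts and apply backpressure without any global clock.
    """

    def __init__(self, depth):
        self.stages = [None] * depth  # one latch per stage, all empty
        self.output = []              # tokens drained from the last stage

    def push(self, value):
        """Producer side: the request succeeds only if stage 0 is empty."""
        if self.stages[0] is None:
            self.stages[0] = value
            return True
        return False  # backpressure: producer must retry later

    def step(self):
        """One update round: drain the last stage, then let tokens
        ripple forward wherever the downstream stage is empty."""
        if self.stages[-1] is not None:
            self.output.append(self.stages[-1])
            self.stages[-1] = None
        # Walk backwards so a token moves at most one stage per round.
        for i in range(len(self.stages) - 2, -1, -1):
            if self.stages[i] is not None and self.stages[i + 1] is None:
                self.stages[i + 1] = self.stages[i]
                self.stages[i] = None
```

Note how the pipeline is self-timed at this level of abstraction: no stage needs to know the overall depth or rate, it only negotiates with its immediate neighbors. That locality is precisely what makes micropipelines attractive for crossing between synchronous islands.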
By using the GALS approach, we will be able to achieve the low power consumption, rugged behavior, low EMI, and speed advantages associated with asynchronous design techniques in an FPGA while still harnessing all the productivity of its well-known synchronous design toolchain.
But before we go there, let's pause for a moment to once again consider the physical issues pushing us toward this kind of implementation. Now that data communication has become the most critical aspect of a design to optimize, a great deal of attention must be devoted to the place-and-route portion of the workflow. Xilinx provides a lot of wonderful tools for controlling the physical implementation of an HDL description, including the PlanAhead floorplanning engine along with very accurate post-place-and-route simulation models. Using this design infrastructure is not only a must when implementing asynchronous designs on an FPGA; it also pays off when optimizing for power, area, and speed in any complex synchronous design.
Now, are you using floorplanning tools and post-place-and-route simulation models, or are you still relying on a "single-click" automated flow? Have you ever been obliged to roll up your sleeves and control your design by hand as a hardware artisan? Or are you comfortable with the "HDL -- it's just a kind of software" point of view?