SOFTWARE POWER OPTIMIZATION
You can optimize designs that use block RAMs for power by minimizing the number of simultaneously active block RAM ports. This optimization, enabled with the -power yes option in XST, modifies the decomposition of RAM or ROM descriptions that span multiple block RAMs. It adjusts the address lines as well as the port-enable and write-enable control signals to minimize the number of active block RAM ports in each clock cycle, while ensuring that your design still meets its timing constraints.
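To see why the decomposition matters, consider a logical memory spread across four block RAMs. The following is a minimal Python sketch, not a description of XST internals: the memory size, the block count and all function names are assumptions chosen for illustration. It compares slicing the memory by width, where every block holds a few bits of each word, with slicing it by depth, where each block holds a contiguous range of addresses.

```python
# Illustrative model only: counts how many block RAMs must be enabled per
# access for two ways of splitting one logical memory across four blocks.
# Sizes and function names are hypothetical, not XST internals.

DEPTH = 8192      # words in the logical memory
NUM_BLOCKS = 4    # physical block RAMs used


def active_blocks_width_sliced(address):
    """Each block holds a slice of every word, so every access enables
    all blocks."""
    if not 0 <= address < DEPTH:
        raise ValueError("address out of range")
    return list(range(NUM_BLOCKS))


def active_blocks_depth_sliced(address):
    """Each block holds a contiguous quarter of the address space, so the
    upper address bits select the single block to enable."""
    if not 0 <= address < DEPTH:
        raise ValueError("address out of range")
    words_per_block = DEPTH // NUM_BLOCKS
    return [address // words_per_block]


def average_active(decomposition, addresses):
    """Mean number of enabled blocks per access."""
    return sum(len(decomposition(a)) for a in addresses) / len(addresses)


if __name__ == "__main__":
    trace = list(range(0, DEPTH, 37))  # arbitrary access pattern
    print("width-sliced:", average_active(active_blocks_width_sliced, trace))
    print("depth-sliced:", average_active(active_blocks_depth_sliced, trace))
```

In this model the depth-sliced arrangement enables one block per access instead of four. The trade-off is that selecting among the blocks' outputs typically needs extra logic on the read path, which is one reason timing has to be checked.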
You can also force the most power-efficient mapping of block RAMs regardless of the impact on performance. Set the ram_style constraint to block_power2 when you know that the timing paths related to this memory are not critical. Savings range from 15 percent to 75 percent.
Also, use the Area Optimization mode in XST. This option minimizes the number of resources your design will use. Note that when optimizing for area, performance may suffer.
An additional tactic is to enable activity-aware optimizations, another way of saying intelligent gating. These algorithms analyze the logic equations to detect, for each clock cycle, the sourcing registers that do not contribute to the result. The software then uses the abundant clock-enable (CE) resources available in the FPGA logic to create fine-grained gating signals that neutralize useless switching activity. You control this intelligent clock and data gating with the map -power high option. A total core dynamic power reduction in excess of 15 percent is possible and, in most cases, the additional gating logic does not affect performance.
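The idea behind this gating can be sketched with a toy model. The Python below is an illustration under stated assumptions, not the algorithm the tools run: it assumes a single register feeding a multiplexer, a known per-cycle flag saying whether the register's value will be used, and a hypothetical function name count_toggles. It counts output bit toggles with and without a clock enable that holds the register on cycles where its value is ignored.

```python
# Illustrative model only: a register feeds a 2:1 multiplexer. When the mux
# selects the other input, the register's value does not contribute to the
# result, so a clock enable can hold it and avoid useless toggles.

def count_toggles(data, select, gated):
    """Return the number of bit toggles on the register output.

    data   -- value presented to the register on each cycle
    select -- True on cycles where the loaded value will actually be used
    gated  -- if True, the register only loads when select is True
    """
    reg = 0
    toggles = 0
    for value, used in zip(data, select):
        if used or not gated:
            # Count the bits that change when the register loads.
            toggles += bin(reg ^ value).count("1")
            reg = value
    return toggles


data = [0b1010, 0b0101, 0b1111, 0b0000]
select = [True, False, False, True]

print("ungated toggles:", count_toggles(data, select, gated=False))
print("gated toggles:  ", count_toggles(data, select, gated=True))
```

With this trace the ungated register toggles 12 bits and the gated one toggles 4, while the value seen on the cycles that use the register is the same in both cases.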
Another way to design for power is to use capacitance-aware optimizations. There are two main techniques:
• Group clock loads: This process reorganizes the placement of synchronous elements (such as flip-flops or DSP blocks) to minimize the reach of each clock net. When you place clock loads along a minimum number of horizontal or vertical clock spines, the software can disable unused branches in the clock region. This reduces both the clock resources and buffering requirements, which saves core dynamic power. This process is controlled by the map -power on option.
• Group data loads: This algorithm minimizes the total wire length in your design while ensuring that you meet performance requirements. Grouping data loads saves power because dynamic power increases with the fanout and the type and length of routing structures you use. The grouping algorithm, likewise enabled with the map -power on option, achieves power reduction by placing related logic closer together.
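The effect of grouping loads can be illustrated with a common placement estimate, half-perimeter wire length (HPWL). The sketch below is an assumption-laden stand-in, not the metric the map tool necessarily uses: the grid coordinates are invented, and wire length is treated only as a rough proxy for routing capacitance.

```python
# Illustrative model only: half-perimeter wire length (HPWL) is a common
# placement estimate of net length; it is used here as a stand-in for
# routing capacitance. Coordinates are hypothetical grid positions.

def hpwl(pins):
    """Half-perimeter of the bounding box around a net's pins."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_hpwl(nets):
    """Sum of the HPWL estimates over all nets."""
    return sum(hpwl(pins) for pins in nets)


spread = [
    [(0, 0), (10, 2), (3, 9)],   # driver and two loads far apart
    [(1, 8), (9, 1)],
]
grouped = [
    [(0, 0), (1, 0), (0, 1)],    # same nets with loads placed nearby
    [(1, 1), (2, 1)],
]

print("spread placement: ", total_hpwl(spread))
print("grouped placement:", total_hpwl(grouped))
```

For these invented placements the estimate falls from 34 grid units to 3, which is the kind of reduction in routed length that the grouping algorithms aim for within the design's timing limits.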
The ISE® Design Suite features predefined goals and strategies that are already tuned to enable power optimization at the synthesis, map and place-and-route levels. This approach may be a good alternative to applying nondefault settings to each synthesis constraint yourself. However, running with these strategies can add some delay on various paths.
Finally, Xilinx implementation tools automatically shut off unused transceivers, phase-locked loops, digital clock managers and I/Os. In 7 series devices, Xilinx has also added power gating of unused block RAM. Leakage in block RAM occurs only in blocks that you are using for a particular design, and not for all block RAMs on the device. Power is routed in the device to the instantiated block RAM only, and disabled for the unused block RAMs.
Figure 4 – Xilinx has built design goals and strategies for minimizing power into the ISE Design Suite.
LOW-POWER DESIGN TECHNIQUES
There are many tips and techniques that designers can explore to lower the power of an FPGA design. One of the first options is to use dedicated hardware blocks rather than implementing the same logic in CLBs. To reduce power, you must look for opportunities to reduce the logic in the design. This will allow you to use as small a device as possible and reduce static power consumption.
Using dedicated hard-IP blocks is one of the most important ways to lower both static and dynamic power, as well as to meet timing more easily. Hard IP lowers static power because its total transistor count is lower than that of an equivalent component built from CLB logic.
As a general rule, you should attempt to infer resources as much as possible. You can steer the inferred resources individually, or as a group, toward the FPGA fabric or silicon resource via attributes in the code or within a constraint file. You can also leverage the Xilinx CORE Generator™ tool to customize the dedicated hardware for instantiating a specific resource.
Moreover, you can employ unused hard IP cleverly for other tasks that may not be obvious. DSP48 slices serve many logic functions such as multipliers, adders/accumulators, wide logic comparators, shifters, pattern matchers and counters. You can use block RAMs as state machines, math functions, ROMs and wide logic lookup tables (LUTs).
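As one illustration of using a memory as a math function, the Python sketch below precomputes a lookup table so that a read replaces the arithmetic. The function (squaring an 8-bit input), the 16-bit data width and the helper names are all assumptions made for the example, and the hex output is a plain text form that would still need converting to whatever initialization format your flow expects.

```python
# Illustrative only: precompute y = x * x for every 8-bit input so that a
# memory read replaces the arithmetic. Widths and names are hypothetical.

ADDR_BITS = 8
DATA_BITS = 16


def build_square_rom():
    """Return the ROM contents as a list indexed by the input value."""
    rom = [x * x for x in range(2 ** ADDR_BITS)]
    # Every entry must fit in the data width chosen for the memory.
    assert all(value < 2 ** DATA_BITS for value in rom)
    return rom


def to_hex_lines(rom):
    """Format the table as one zero-padded hex word per line."""
    digits = DATA_BITS // 4
    return [format(value, "0{}x".format(digits)) for value in rom]


rom = build_square_rom()
print(len(rom), "entries; entry 12 =", rom[12])
print("\n".join(to_hex_lines(rom)[:4]))
```

The same pattern applies to any function of a small number of input bits: the table depth grows as two to the power of the input width, so it suits narrow inputs.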