Design Article
Reducing Power Consumption in a Fiber Channel Switch
Mitch Dale, Calypto Design Systems
9/9/2008 10:16 AM EDT
This article describes the efforts of a leading supplier of networking and network storage equipment to reduce power consumption. The company offers a full range of switches and routers to serve the networking needs of businesses, from small companies to global service providers. Previously, new product development focused on higher bandwidth and expanded capabilities, such as security and quality of service. With an increasing emphasis on controlling energy costs, more and more of its customers are demanding improved power efficiency, making low power both an important design consideration and a competitive advantage.
At the core of the company's products are large system-on-chip (SoC) devices containing dozens of I/O ports that support multiple protocols connected through a high-speed switching matrix. All told, these devices incorporate millions of gates.
The engineering challenge is to develop these high-performance devices to consume the lowest power possible under worst-case, peak-loading conditions. This mandates a low-power design flow that scales to million-gate designs and reduces power when designs are fully active. The design flow must work with existing implementation tools and save power without negatively affecting performance or chip area.
Facing the Power Challenge
It is clear to designers that power dissipation has become a primary consideration, and that power optimization must take into account performance and area trade-offs. In the past, designers could count on register transfer level (RTL) synthesis to optimize for power. Power optimization was, therefore, restricted to combinational clock gating inserted by RTL synthesis tools. Today, however, meaningful reduction in power dissipation can only be attained at the architectural or register transfer level of design.
Reducing power early in the RTL design process has a positive ripple effect throughout the entire system. It makes timing closure easier to achieve. It simplifies packaging, lowers system costs and influences the product form factor. RTL is the best point in the design flow to optimize power because changes made at this level have a significant impact on power. Additionally, RTL descriptions carry sufficient implementation information to make meaningful power, performance and area tradeoffs possible.
RTL Power Optimization for SoCs
Because of the complexity of the designs, the design team quickly determined that power optimization solutions requiring manual implementation were not feasible. The team needed a solution that could identify power savings opportunities, evaluate the cost tradeoffs, and implement power optimizations directly in the RTL.
After an extensive evaluation, the team chose Calypto's PowerPro CG, an automated RTL power optimization product that has been on the market for a year. This tool fits into existing RTL synthesis design flows such that its power savings are complementary and cumulative to those of downstream tools.
The design team decided to first try the RTL power optimization software on the RTL design blocks that make up a large, 20-million-gate Fiber Channel Switching SoC design. At the start of the project, low-power RTL synthesis had been run on the RTL design blocks, resulting in combinational clock gating on more than 740,000 of the 1.4 million flops in the Fiber Channel switch. In all, the tool was run on the six major sub-blocks (highlighted in green in the diagram). For each block, it was able to add sequential clock-gating enable conditions. In total, the tool gated an additional 180,851 flops. (See figure 1 and figure 1a.)

1. Architectural view of the switch.

1a. Comparison showing improvements obtained.
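The flop counts quoted above can be turned into gating coverage with a few lines of arithmetic. The sketch below is illustrative only: it assumes the 180,851 additionally gated flops do not overlap the roughly 740,000 flops already gated by synthesis, and because the article says "more than 740,000", the percentages are approximate lower bounds.

```python
# Figures quoted in the article. The 740,000 count is stated as "more than",
# so the computed coverage values are approximate lower bounds.
TOTAL_FLOPS = 1_400_000
COMBINATIONAL_GATED = 740_000   # gated by low-power RTL synthesis
SEQUENTIAL_ADDED = 180_851      # additionally gated by the RTL power optimization tool

def coverage(gated: int, total: int) -> float:
    """Return the percentage of flops that are clock gated."""
    return 100.0 * gated / total

# Assumption: the additional flops are disjoint from those already gated.
before = coverage(COMBINATIONAL_GATED, TOTAL_FLOPS)
after = coverage(COMBINATIONAL_GATED + SEQUENTIAL_ADDED, TOTAL_FLOPS)
relative = 100.0 * SEQUENTIAL_ADDED / COMBINATIONAL_GATED

print(f"gated after synthesis only:      {before:.1f}%")
print(f"gated after sequential analysis: {after:.1f}%")
print(f"relative increase in gated flops: {relative:.1f}%")
```

Under these assumptions, coverage rises from about 53% to about 66% of all flops.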
Sequential clock gating differs from combinational clock gating in that it requires sequential analysis of circuit behavior across multiple clock cycles to identify clock-gating enable conditions. Sequential analysis relies on circuit functionality, not simulation vectors, to propagate enable conditions across sequential elements. In doing so, sequential clock gating finds enable conditions on registers not previously clock gated and strengthens enable conditions on registers previously gated, lengthening the time their clocks are disabled. Because sequential clock gating stops the propagation of unused data between registers, combinational logic power as well as clock and register power are reduced.
The power-optimized RTL was verified using a formal, sequential equivalence checker. By comprehensively verifying that the power-optimized RTL code is functionally equivalent to the original RTL code, this step ensures that no functional changes were introduced by the additional enable logic. (See figure 2.) As part of the RTL power optimization run, sequential equivalence checking setup scripts were generated to automate setup, turning the verification of RTL power optimization results into a single command.
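A small cycle-based model can make the idea concrete. The Python sketch below is a hypothetical two-stage pipeline, not taken from the design in this article: register `a` already has an enable `en`, while the downstream register `b` is clocked every cycle. Propagating `en` forward by one cycle yields an enable for `b`, since `b` only needs to reload in the cycle after `a` may have changed. The function `f`, the register names and the stimulus are all assumptions made for illustration.

```python
import random

def f(a: int) -> int:
    """Stand-in for the combinational logic between the two registers."""
    return (a * 3 + 1) & 0xFF

def simulate(stimulus, sequential_gating: bool):
    """Cycle-based model; returns the values of b and how often b was clocked."""
    a = b = 0
    en_prev = True   # delayed copy of en; True out of reset so b loads once
    b_clocks = 0
    outputs = []
    for d, en in stimulus:
        # b needs a clock only if a could have changed on the previous edge.
        load_b = en_prev if sequential_gating else True
        next_b = f(a) if load_b else b
        next_a = d if en else a          # a is gated by its existing enable
        b_clocks += int(load_b)
        a, b, en_prev = next_a, next_b, en   # all registers update together
        outputs.append(b)
    return outputs, b_clocks

rng = random.Random(0)
stimulus = [(rng.randrange(256), rng.random() < 0.25) for _ in range(1000)]

ref_out, ref_clocks = simulate(stimulus, sequential_gating=False)
opt_out, opt_clocks = simulate(stimulus, sequential_gating=True)

print("outputs identical:", ref_out == opt_out)
print("clock edges seen by b:", ref_clocks, "->", opt_clocks)
```

The two runs produce the same output sequence, but register `b` is clocked only in cycles that follow an active `en`. In real hardware the saving would also extend to the logic computing `f`, because its inputs stop toggling when `a` is held.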

2. Design flow using sequential equivalence checker.
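To illustrate what sequential equivalence means, the sketch below exhaustively explores the joint state space of a tiny original model and its clock-gated counterpart, checking that their outputs agree in every reachable state. This is a toy explicit-state check written for this article's discussion; it is not a description of how any commercial equivalence checker works, and the 2-bit pipeline, the function `f` and all names are hypothetical.

```python
from collections import deque
from itertools import product

MASK = 0x3  # 2-bit data keeps the state space small enough to enumerate

def f(a: int) -> int:
    """Stand-in combinational logic between registers a and b."""
    return (a * 3 + 1) & MASK

def step_original(state, d, en):
    a, b = state
    return (d if en else a, f(a))            # b is clocked every cycle

def step_gated(state, d, en):
    a, b, en_prev = state
    return (d if en else a, f(a) if en_prev else b, en)  # b gated by delayed en

def equivalent(reset_en_prev: bool = True):
    """Breadth-first search over all reachable (original, gated) state pairs."""
    start = ((0, 0), (0, 0, reset_en_prev))
    seen = {start}
    queue = deque([start])
    inputs = list(product(range(MASK + 1), (False, True)))
    while queue:
        orig, gated = queue.popleft()
        if orig[1] != gated[1]:              # observable output is register b
            return False, len(seen)
        for d, en in inputs:
            nxt = (step_original(orig, d, en), step_gated(gated, d, en))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True, len(seen)

ok, explored = equivalent(reset_en_prev=True)
print("equivalent:", ok, "after exploring", explored, "state pairs")

bad, _ = equivalent(reset_en_prev=False)
print("equivalent with wrong reset value:", bad)
```

The second call shows why such a check is useful: if the delayed enable resets to the wrong value, register `b` misses its first load and the search reports a mismatch. Unlike simulation, the search covers every input sequence rather than a chosen set of vectors.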
Next, the team took the output from the power optimization tool through low-power RTL synthesis. Since the tool maintains the original RTL coding style, synthesis pragmas and comments, there were no changes required to the existing synthesis environment. As a result, designers could easily identify the changes the power optimization software had made to their RTL code. All the compiled design blocks were stitched together to run gate-level simulations and generate power estimates. By comparing the power reports from the original RTL code and the power-optimized RTL code, the design team found an 11% power savings under peak traffic conditions and an 18% savings in idle mode. Area and timing for the design remained consistent with the original RTL code.
Enabling Higher Performance
Based on the success of the Networking Storage Group, a second design team decided to apply the automated RTL power optimization tool to a packet processing SoC targeted for mid-range routers. This design contained approximately 35,500 flops and 205,600 instances. Since this was a high-performance design, the requirement was to reduce power without compromising performance.
As with the Fiber Channel project, designers were easily able to use the information in their existing synthesis scripts to create the power optimization setup. Within a day, the second team was generating power-optimized RTL code.
The optimized RTL code was then taken through RTL synthesis, clock-tree synthesis and place-and-route in order to analyze power and performance. The result was a 12% overall savings in power, while meeting demanding performance requirements. The table below summarizes the results.
| Metric | Original RTL | Power-optimized RTL | Change |
|---|---|---|---|
| Instances | 205,597 | 207,133 | 0.7% |
| | 3,761,776 | 3,764,978 | 0.08% |
| | 26,717,556 | 27,215,606 | 1.8% |
| | 104.5 | 90.3 | -13.6% |
| | 60.5 | 52.3 | -13.55% |
| | 20.4 | 20.6 | 1.0% |
| Total power | 185.4 | 163.2 | -12% |
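The percentage changes in the table can be recomputed directly from the before and after values. The sketch below does so and also confirms that the final row equals the sum of the three rows preceding it. The row labels and units are not available in this copy of the table, so the script deliberately works with bare numbers.

```python
# (before, after) pairs in table order; labels and units are not recoverable here.
rows = [
    (205_597, 207_133),
    (3_761_776, 3_764_978),
    (26_717_556, 27_215_606),
    (104.5, 90.3),
    (60.5, 52.3),
    (20.4, 20.6),
    (185.4, 163.2),
]

def pct_change(before: float, after: float) -> float:
    """Signed percentage change from before to after."""
    return 100.0 * (after - before) / before

for before, after in rows:
    print(f"{before:>12,} -> {after:>12,}  {pct_change(before, after):+.2f}%")

# The last row is the sum of the three rows above it, in both columns.
components_before = sum(before for before, _ in rows[3:6])
components_after = sum(after for _, after in rows[3:6])
print("sum before:", round(components_before, 1))
print("sum after: ", round(components_after, 1))
```

The recomputed total change is close to -12%, matching the overall savings quoted in the text, and the small increase from 20.4 to 20.6 works out to roughly 1%.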
Conclusion
Power optimization for SoCs has been primarily accomplished using the built-in capability of today's RTL synthesis tools. RTL sequential clock gating offers an opportunity to further reduce power by finding more clock-gating enable conditions. PowerPro CG from Calypto is an automated RTL power optimization solution that fits into existing RTL design flows and whose results are cumulative and complementary to low-power RTL synthesis. The Network Storage design teams validated results from this tool on two projects and, based on these successes, have incorporated the tool into their low-power SoC design flow.
About the Author:
Mitch Dale is Director of Product Marketing at Calypto Design Systems. He can be reached at: mdale@calypto.com.



