Guest editorial: Low power is everywhere
Mary Ann White
4/18/2012 11:22 AM EDT
Top Challenges
As part of its global survey, Synopsys asked designers to name the top challenges in their design flow; the results are shown in the graph below. Timing closure is always the top design challenge, but power management has quickly risen to #2, even though it might not have been listed as a primary challenge as recently as five years ago.

Synopsys also asked designers which power management task presented the greatest challenge in their design flow. Because multi-voltage design remains relatively new as a power-saving technique, deploying it in a typical design methodology poses its own set of challenges.

Power intent-based flows help automate the implementation of power management techniques. Power intent includes the specification of multiple-voltage power domains, power shutdown modes, isolation, voltage level shifting, and retention behavior. It is captured in a companion file to the RTL or gate-level design using the standardized IEEE 1801 Unified Power Format (UPF). Defining power intent early in the design flow allows downstream tasks to be automated and driven by a consistent power specification. Together with the RTL or gate netlist, the UPF file is then used systematically throughout the design process to describe the design’s power intent.
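
To make the ingredients of power intent more concrete, here is a minimal sketch in C. It is not UPF syntax and not how any Synopsys tool represents power intent; the `power_domain` struct, its fields, and the three helper functions are hypothetical names chosen for illustration. It only models the kind of questions a power intent specification lets a tool answer about a signal that crosses from one domain to another.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a power domain. Field names are illustrative
   and are not UPF keywords. */
typedef struct {
    const char *name;
    unsigned    supply_mv;      /* nominal supply voltage in millivolts */
    bool        can_shut_down;  /* domain has a power shutdown mode */
    bool        has_retention;  /* state is retained across shutdown */
} power_domain;

/* A signal crossing between domains at different voltages
   needs a level shifter. */
bool needs_level_shifter(const power_domain *src, const power_domain *dst)
{
    assert(src != NULL && dst != NULL);
    return src->supply_mv != dst->supply_mv;
}

/* Conservative assumption: if the driving domain can be switched off,
   any other domain it drives might still be on, so the crossing is
   isolated. */
bool needs_isolation(const power_domain *src, const power_domain *dst)
{
    assert(src != NULL && dst != NULL);
    return src != dst && src->can_shut_down;
}

/* A domain that shuts down without retention loses its register state. */
bool loses_state_on_shutdown(const power_domain *d)
{
    assert(d != NULL);
    return d->can_shut_down && !d->has_retention;
}
```

With a switchable 900 mV domain driving an always-on 1100 mV domain, both `needs_level_shifter` and `needs_isolation` return true for that crossing, while the reverse direction needs only the level shifter.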

Synopsys’ global user survey data shows that most digital designers now use a power intent-based methodology, and even more plan to deploy one over the next two years.
Optimized engines deliver rapid time-to-market and quality of results
Power is now a primary requirement for almost all designs and is no longer limited to mobile applications. Power techniques will continue to evolve as process technologies shrink and new design challenges surface.
Consumer demand for new gadgets requires rapid time to market, and product lifespans are limited. EDA tools for developing low power SoCs are selected on their ability to deliver the best quality of results (QoR) predictably, which in turn increases design productivity.
Synopsys’ advanced low power solution provides a comprehensive, silicon-proven approach to low power design, with power awareness built in at every stage of the design cycle. The solution, featuring the Galaxy™ Implementation Platform, offers all of the low power design techniques to deliver maximum power savings. Because power awareness is integrated throughout, Synopsys can increase productivity by providing predictable results early in the design cycle.
With over 15 years of proven low power innovations, Synopsys will continue to invest in providing advanced solutions for the latest in low power design techniques.

About the Author
Mary Ann White is the product marketing director for Galaxy™ Implementation Platform products at Synopsys. She has more than 25 years of experience working in the EDA and semiconductor industries. White has a BS degree in EECS from UC Berkeley.
Sources:
Synopsys, Inc. Global User Survey, 2011
http://newsroom.intel.com/docs/DOC-2032
http://www.soiconsortium.org/index.php
http://www.tsmc.com/english/dedicatedFoundry/technology/28nm.htm
Other pages you may be interested in: Power 101 series
This posting is part of the EDA Designline power series and is archived and updated. The root is accessible here. Please send me any updates, additions, references, white papers or other materials that should be associated with this posting. Thank you for making this a success - Brian Bailey.


Paul A. Clayton
4/19/2012 5:41 PM EDT
A few other reasons that power use can be important:
*form factor (lowering cooling requirements can facilitate a lighter, smaller, and/or less exposed system; this is a factor with servers where data center space costs money as well as consumer electronics and deeply embedded systems)
*product cost (larger heat sinks add cost, fans add cost--material (including inventory) and assembly--, heat management adds design cost)
*reliability (keeping temperatures down reduces soft errors and hard failures, in addition power saving techniques like DVFS can also increase MTTF by reducing electromigration et al.; removal of active cooling can remove a point of failure--particularly one with a moving part--, even reducing active cooling requirements can improve resilience; tighter integration facilitated by lower power can also reduce vulnerability to mechanical stresses and external electromagnetic interference)
*performance (when performance is limited by TDP, energy-efficiency can increase performance; in addition, if the number of external connections (pins) for power and ground can be reduced, more pins can be available for signals increasing available signal bandwidth; lower power can also facilitate tighter integration which can improve latency and bandwidth)
There is also a distinction between chemical batteries and other power sources. Energy harvesting techniques and radioisotope power cells have different constraints than chemical batteries.
I realize that including all of the above in the introduction would have added too much length, but it is easy to forget how multifaceted power concerns are.
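
The performance point can be put into rough numbers. The sketch below is an idealization, assuming the entire power budget is spent on useful operations and ignoring leakage; the function name and parameters are made up for illustration. Under a fixed power cap, sustainable throughput is simply the budget divided by the energy each operation costs.

```c
#include <assert.h>

/* Idealized throughput under a power cap: operations per second equals
   watts divided by joules per operation. Static power is ignored. */
double max_ops_per_second(double power_budget_w, double energy_per_op_nj)
{
    assert(power_budget_w >= 0.0 && energy_per_op_nj > 0.0);
    return power_budget_w / (energy_per_op_nj * 1e-9);
}
```

Halving the energy per operation doubles the throughput that fits in the same budget, which is the sense in which energy efficiency becomes performance once TDP is the limit.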
BrianBailey
4/19/2012 5:58 PM EDT
I think what you are pointing out is that so many issues associated with complete product design are interrelated and that the consumption of power and the removal of the heat it generates impacts every facet of system design. Thanks for adding some of those dependencies.
Paul A. Clayton
4/21/2012 6:33 PM EDT
Yes, it must be difficult for professionals to handle so much complexity (made worse by communication barriers even within organizations)--and with severe time limits and pressure to predict the result more than a year in advance. I am just a thinker (not even an academic), and even the limited complexity of which I am aware makes my head hurt (almost literally).
Paul A. Clayton
4/19/2012 7:16 PM EDT
While this article focuses on low-level techniques--as is reasonable coming from someone at Synopsys--, there might be interest in overviews of higher level (architectural, microarchitecture, and software) techniques.
Techniques like approximate computation (mainly for audio/visual but also sometimes applicable to sensor data analysis) and analog computation (as in Lyric Semiconductor's error correction technology) seem to show some promise. (These can also apply to predictive structures like branch predictors.)
Asynchronous design, "Power Balanced Pipelines" (Sartori et al.), and other general microarchitectural techniques look interesting (at least to someone with an academic interest in computer architecture).
Techniques to improve performance can also improve power efficiency.
Software techniques can include optimizations to improve cache utilization (code density and code and data layout can help) and the scheduling of work to reduce the number of power transitions.
Software optimizations which improve performance can also improve power efficiency by avoiding unnecessary work and improving hurry-up-and-go-to-sleep effectiveness.
Even the little I have read in this area indicates that there are a lot of interesting techniques for managing power use.
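
The hurry-up-and-go-to-sleep idea can be illustrated with simple arithmetic. This is a sketch under assumed numbers, with a hypothetical function name: over a fixed period the processor draws `active_w` while busy and `idle_w` while asleep, and transition costs between the two states are ignored.

```c
#include <assert.h>

/* Energy in joules over one period: active power while busy,
   idle power for the remainder. Wake/sleep transition energy is
   ignored in this sketch. */
double period_energy_j(double period_s, double busy_s,
                       double active_w, double idle_w)
{
    assert(busy_s >= 0.0 && busy_s <= period_s);
    return active_w * busy_s + idle_w * (period_s - busy_s);
}
```

For a 1 s period at 1 W active and 0.05 W idle, code that is busy for 0.8 s uses about 0.81 J, while an optimized version that finishes in 0.4 s uses about 0.43 J. The saving comes entirely from doing less work and sleeping longer.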
BrianBailey
4/20/2012 9:55 AM EDT
I did run two articles on software and power a couple of weeks ago:
Efficient C code for ARM devices http://eetimes.com/design/eda-design/4370230/EDADL-Efficient-C-code-for-ARM-devices?
and
Optimizing performance, power, and area in SoC designs using MIPS® multi-threaded processors http://eetimes.com/design/eda-design/4370392/Optimizing-performance--power--and-area-in-SoC-designs-using-MIPS--multi-threaded-processors?
Paul A. Clayton
4/21/2012 7:09 PM EDT
The former paper was somewhat interesting (I was surprised that 16-bit local variables would be expanded to 32-bit even in the cache) and points to some unfortunate limits of C and its compilers.
The latter article was more focused on the specific topic of exploiting the benefits of MIPS MT. I had already understood the principles, but the examples were interesting.
One problem seems to be that this information is scattered. Because the information content is vast and has complex interconnection, it seems that something like a wiki could be useful. Such a project would be outside the scope of EE Times (alone).
I do not know that such would be useful to anyone. Since I am just an information junkie, my feelings should have little weight.
cshore
4/23/2012 12:00 PM EDT
I am the author of the first of those papers which Brian cited. Glad you found it interesting.
I'm interested in your comment about local variables being expanded to 32-bit in the cache. Can you expand on that a bit more because I don't believe it has to be that way.
Paul A. Clayton
4/23/2012 12:46 PM EDT
I based my comment on the statement "Remember, too, that local variables, regardless of size, always take up an entire 32-bit register when held in the register bank and an entire 32-bit word in memory when spilled on to the stack." (page 4 of "Efficient C Code for ARM Devices")
If it meant callee spilling, I could understand the constraint. (This limitation could motivate a compiler optimization that would preferentially allocate 32-bit values into callee save registers.) I could also understand how such could make debugging easier. (Also on ARM, code density--or even performance as such has sometimes been implemented using paired word operations--goals might promote use of store/load multiple word.)
(The ABI forcing such expansion for function parameters may be a concession to simplify debuggers or perhaps compilers. In theory, one does not need to use the ABI, at least for internal functions.)
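
For readers who have not seen the paper, the kind of code at issue looks roughly like the sketch below. The function names are made up for illustration and this is not taken from the paper. Both functions compute the same sum; the only difference is the width of the loop counter. On a 32-bit ARM core the narrow counter still occupies a whole register, and the compiler may have to add instructions to keep it behaving like a 16-bit value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit counter: still held in a full 32-bit register, and the
   compiler may need to re-narrow it after each increment to preserve
   16-bit wraparound semantics. */
uint32_t sum_narrow(const uint8_t *data, uint16_t n)
{
    assert(data != NULL);
    uint32_t total = 0;
    for (uint16_t i = 0; i < n; i++)
        total += data[i];
    return total;
}

/* Word-sized counter: matches the register width, so no narrowing
   is needed. */
uint32_t sum_word(const uint8_t *data, uint32_t n)
{
    assert(data != NULL);
    uint32_t total = 0;
    for (uint32_t i = 0; i < n; i++)
        total += data[i];
    return total;
}
```

Whether the narrow version actually costs extra instructions depends on the compiler and optimization level, so the generated assembly is the thing to check.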
cshore
4/26/2012 8:59 AM EDT
Thanks for the response.
I was referring to any spilling of variables onto the stack. All ARM stack accesses are 32-bit so any spilled variable (or parameter, or variable allocated to the stack) takes up a full word.
To my knowledge, the register allocator does not take this into account when allocating registers to variables within procedures. If it is possible to save/spill a pair of variables using LDRD/STRD, that is sometimes down to serendipity as I understand it (some forms of these instructions require that the registers be a consecutive odd/even pair).
You are right that you don't need to stick to the ABI for internal functions. Not doing so is obviously potentially dangerous, as I'm sure you are aware!
Leaving the stack aligned to anything less than a word boundary when interrupts are enabled can be especially perilous.
Chris
Paul A. Clayton
4/26/2012 1:22 PM EDT
I think we may have different definitions of spilling. I think of spilling as any moving of a register value into memory (e.g., due to register pressure). I am guessing that you may mean something else, perhaps saving callee save registers (where the callee cannot conveniently know the size of register contents nor if the value already has a slot allocated in a previous stack frame--interprocedural optimization might be able to discover such).
I also do not understand your statement "All ARM stack accesses are 32-bit" since ARM provides LDRH/STRH using the stack pointer, which is just a GPR after all (I doubt even AArch64--which makes SP a non-GPR--prohibits sub-word accesses using SP). (Pushing and popping smaller values would be problematic in making SP unaligned.)
By the way, my gmail.com address is 'paaronclayton'.
cshore
4/27/2012 10:18 AM EDT
I think we're close on our definitions. I am thinking of any accesses to the stack carried out by code running on an ARM system which complies with the ABI. That covers parameters, spills, automatic variables, caller/callee-saved registers etc.
The ABI says that the stack pointer must be word aligned at all times (and doubleword-aligned at external boundaries). It doesn't actually say that you can't push/pop two halfwords at once in a pair of atomic operations but doing so would be impractically difficult while sticking to the ABI.
Yes, you can use halfword memory accesses indexed via SP, in the sense that the instruction set permits it. But it isn't possible (or at least practical) to do so in a way which doesn't violate the ABI.
The ABI for AArch64 specifies quadword alignment for SP at all times (whether externally visible or not) so, although instructions may exist for sub qword stack accesses, they aren't practically usable in this context.
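
The practical effect of those alignment rules on stack usage can be sketched with a small rounding helper. The helper and the enum names below are illustrative only; the alignment values (4, 8, and 16 bytes) are the word, doubleword, and quadword requirements described in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Alignment requirements in bytes, as discussed above. */
enum { WORD_ALIGN = 4, DOUBLEWORD_ALIGN = 8, QUADWORD_ALIGN = 16 };

/* Round a stack allocation up to the next multiple of align,
   which must be a power of two. */
size_t align_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}
```

A 2-byte halfword pushed on its own therefore costs `align_up(2, WORD_ALIGN)`, that is 4 bytes of stack, and a 20-byte frame under quadword alignment grows to 32 bytes.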