I think we're close on our definitions. I am thinking of any accesses to the stack carried out by code running on an ARM system which complies with the ABI. That covers parameters, spills, automatic variables, caller/callee-saved registers etc.
The ABI says that the stack pointer must be word aligned at all times (and doubleword-aligned at external boundaries). It doesn't actually say that you can't push/pop two halfwords at once in a pair of atomic operations but doing so would be impractically difficult while sticking to the ABI.
Yes, you can use halfword memory accesses indexed via SP, in the sense that the instruction set permits it. But it isn't possible (or at least practical) to do so in a way which doesn't violate the ABI.
The ABI for AArch64 specifies quadword alignment for SP at all times (whether externally visible or not) so, although instructions may exist for sub-quadword stack accesses, they aren't practically usable in this context.
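To make the alignment figures concrete, here is a minimal C sketch of how a frame size gets rounded up under each rule mentioned above. The enum names and the helper round_up_frame are hypothetical and used only for illustration; they are not taken from the ABI documents or from any compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Stack pointer alignments as described in this thread, in bytes.
   The names are hypothetical, chosen for this example only. */
enum {
    ARM32_SP_ALIGN          = 4,  /* word: at all times */
    ARM32_SP_ALIGN_EXTERNAL = 8,  /* doubleword: at external boundaries */
    AARCH64_SP_ALIGN        = 16  /* quadword */
};

/* Round a frame size in bytes up to the next multiple of `align`.
   `align` must be a non-zero power of two. */
static size_t round_up_frame(size_t bytes, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (bytes + align - 1) & ~(align - 1);
}
```

Under these assumptions a single 2-byte local still costs 4 bytes of stack with word alignment and 16 bytes with quadword alignment, which is why sub-word pushes and pops buy nothing in practice.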
I think we may have different definitions of spilling. I think of spilling as any moving of a register value into memory (e.g., due to register pressure). I am guessing that you may mean something else, perhaps saving callee save registers (where the callee cannot conveniently know the size of register contents nor if the value already has a slot allocated in a previous stack frame--interprocedural optimization might be able to discover such).
I also do not understand your statement "All ARM stack accesses are 32-bit" since ARM provides LDRH/STRH using the stack pointer, which is just a GPR after all (I doubt even AArch64--which makes SP a non-GPR--prohibits sub-word accesses using SP). (Pushing and popping smaller values would be problematic in making SP unaligned.)
By the way, my gmail.com address is 'paaronclayton'.
Thanks for the response.
I was referring to any spilling of variables onto the stack. All ARM stack accesses are 32-bit so any spilled variable (or parameter, or variable allocated to the stack) takes up a full word.
To my knowledge, the register allocator does not take this into account when allocating registers to variables within procedures. If it is possible to save/spill a pair of variables using LDRD/STRD, that is sometimes down to serendipity as I understand it (some forms of these instructions require that the registers be a consecutive odd/even pair).
You are right that you don't need to stick to the ABI for internal functions. Not doing so is obviously potentially dangerous, as I'm sure you are aware!
Leaving the stack aligned to anything less than a word boundary when interrupts are enabled can be especially perilous.
I based my comment on the statement "Remember, too, that local variables, regardless of size, always take up an entire 32-bit register when held in the register bank and an entire 32-bit word in memory when spilled on to the stack." (page 4 of "Efficient C Code for ARM Devices").
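A minimal C sketch of what that statement implies for a loop counter. The function names are hypothetical. The only claim taken from the paper is that a 16-bit local occupies a full 32-bit register or stack word; whether the narrow version also costs extra instructions depends on the compiler and optimization level, so treat that part as an assumption.

```c
#include <stdint.h>

/* 16-bit counter: per the quoted statement it still occupies a full
   32-bit register (or a full stack word if spilled), and the compiler
   may have to re-narrow it to 16 bits after each increment. */
static int32_t sum_narrow(const int32_t *data, int16_t n)
{
    int32_t total = 0;
    for (int16_t i = 0; i < n; i++) {
        total += data[i];
    }
    return total;
}

/* 32-bit counter: matches the register width, so no narrowing is needed. */
static int32_t sum_word(const int32_t *data, int32_t n)
{
    int32_t total = 0;
    for (int32_t i = 0; i < n; i++) {
        total += data[i];
    }
    return total;
}
```

Both functions return the same result for any count that fits in 16 bits; the narrow type saves no register or stack space.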
If it meant callee spilling, I could understand the constraint. (This limitation could motivate a compiler optimization that would preferentially allocate 32-bit values into callee-saved registers.) I could also understand how such a constraint could make debugging easier. (Also on ARM, goals such as code density, or even performance, which has sometimes been achieved using paired-word operations, might promote the use of load/store multiple.)
(The ABI forcing such expansion for function parameters may be a concession to simplify debuggers or perhaps compilers. In theory, one does not need to use the ABI, at least for internal functions.)
I am the author of the first of those papers which Brian cited. Glad you found it interesting.
I'm interested in your comment about local variables being expanded to 32-bit in the cache. Can you expand on that a bit more because I don't believe it has to be that way.
The former paper was somewhat interesting (I was surprised that 16-bit local variables would be expanded to 32-bit even in the cache) and points to some unfortunate limits of C and its compilers.
The latter article was more focused on the specific topic of exploiting the benefits of MIPS MT. I had already understood the principles, but the examples were interesting.
One problem seems to be that this information is scattered. Because the information content is vast and has complex interconnection, it seems that something like a wiki could be useful. Such a project would be outside the scope of EE Times (alone).
I do not know that such would be useful to anyone. Since I am just an information junkie, my feelings should have little weight.
Yes, it must be difficult for professionals to handle so much complexity (made worse by communication barriers even within organizations)--and with severe time limits and pressure to predict the result more than a year in advance. I am just a thinker (not even an academic), and even the limited complexity of which I am aware makes my head hurt (almost literally).
I did run two articles on software and power a couple of weeks ago:
Efficient C code for ARM devices http://eetimes.com/design/eda-design/4370230/EDADL-Efficient-C-code-for-ARM-devices?
Optimizing performance, power, and area in SoC designs using MIPS® multi-threaded processors http://eetimes.com/design/eda-design/4370392/Optimizing-performance--power--and-area-in-SoC-designs-using-MIPS--multi-threaded-processors?
While this article focuses on low-level techniques (reasonable, coming from someone at Synopsys), there might be interest in overviews of higher-level (architectural, microarchitectural, and software) techniques.
Techniques like approximate computation (mainly for audio/visual but also sometimes applicable to sensor data analysis) and analog computation (as in Lyric Semiconductor's error correction technology) seem to show some promise. (These can also apply to predictive structures like branch predictors.)
Asynchronous design, "Power Balanced Pipelines" (Sartori et al.), and other general microarchitectural techniques look interesting (at least to someone with an academic interest in computer architecture).
Techniques to improve performance can also improve power efficiency.
Software techniques can include optimizations to improve cache utilization (code density and code and data layout can help) and the scheduling of work to reduce the number of power transitions.
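As one small illustration of the data layout point, here is a C sketch in which reordering the fields of a record reduces padding so that more records fit in each cache line. The struct names, the fields, and the helper records_per_line are all hypothetical, and the exact sizes depend on the target ABI.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields in arbitrary order: padding is typically inserted after `flag`
   and after `channel` to keep `value` and `gain` naturally aligned. */
struct sample_loose {
    uint8_t  flag;
    uint32_t value;
    uint8_t  channel;
    uint16_t gain;
};

/* Same fields, largest first: typically no padding is needed. */
struct sample_tight {
    uint32_t value;
    uint16_t gain;
    uint8_t  flag;
    uint8_t  channel;
};

/* Whole records that fit in one cache line of `line_bytes` bytes. */
static size_t records_per_line(size_t line_bytes, size_t record_bytes)
{
    return line_bytes / record_bytes;
}
```

On a typical 32-bit ARM ABI I would expect sizeof to give 12 bytes for the first layout and 8 for the second, though that is an assumption worth checking on the actual toolchain. With a hypothetical 32-byte line that is two records per line versus four.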
Software optimizations which improve performance can also improve power efficiency by avoiding unnecessary work and improving hurry-up-and-go-to-sleep effectiveness.
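The hurry-up-and-go-to-sleep effect can be put into a back-of-the-envelope model: energy over a fixed period is active power times active time plus sleep power times the remaining time. The C sketch below assumes that simple two-state model, ignores the cost of the transitions themselves, and uses a hypothetical function name; the figures in the note after it are invented purely for illustration.

```c
/* Energy in millijoules over one period, assuming the processor is
   active for t_active_ms and asleep for the rest of period_ms.
   Power is in milliwatts; mW * ms gives microjoules, hence / 1000.
   Transition costs are ignored in this simplified model. */
static double period_energy_mj(double p_active_mw, double p_sleep_mw,
                               double t_active_ms, double period_ms)
{
    double t_sleep_ms = period_ms - t_active_ms;
    return (p_active_mw * t_active_ms + p_sleep_mw * t_sleep_ms) / 1000.0;
}
```

With made-up figures of 100 mW active, 1 mW asleep, and a 100 ms period, halving the active time from 10 ms to 5 ms takes the energy from about 1.09 mJ to about 0.6 mJ in this model.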
Even the little I have read in this area indicates that there are a lot of interesting techniques for managing power use.
I think what you are pointing out is that so many issues associated with complete product design are interrelated and that the consumption of power and the removal of the heat it generates impacts every facet of system design. Thanks for adding some of those dependencies.