Design Article

Reliable programming in ARM assembly language

Greg Davis, Green Hills Software

11/4/2010 11:31 AM EDT

Sometimes it's necessary to use both assembly and high-level programming languages when working in the ARM architecture. This paper from ARM TechCon explains why and how.
This article is from class ATC-150 at the ARM Technology Conference.
The ARM architecture, like most 32-bit architectures, is well suited to C and C++ compilers. The majority of control code is written in high-level languages like C and C++ rather than in assembly language, and there are good reasons for this. Code written in a high-level language is inherently safer and less error-prone than assembly, and it can also be written to be portable across different architectures.

Some people use assembly language for writing device drivers, but this is usually unnecessary. Most device driver code can be written by mapping a C structure or a C++ class onto the hardware device. However, it is sometimes necessary to use a little bit of assembly code. This paper will describe how to best do this.

What is assembly
The term assembly, or assembly code, loosely refers to the instruction set that runs on the target processor. In reality, processors read a sequence of binary words that encode the instructions. Assembly is a step up from binary in that the instructions are expressed in a human-readable form.

For example, consider a simple C function:
int add2(int a, int b)
{
    return a + b;
}
When compiled, this code will likely turn into something like the following assembly code:
    .text
    .global add2
add2:
    add r0, r0, r1
    bx lr
    .type add2, function
    .size add2, 8
When assembled, this assembly turns into the following two binary words:
    0xe0800001
    0xe12fff1e
An assembler is the tool that converts assembly code into binary. However, the assembler doesn't output binary directly. Instead, it encapsulates the binary into an ELF object file that is usable by a linker.
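To see where such binary words come from, here is a minimal C sketch that builds the two instructions from their bit fields by hand. It assumes the classic 32-bit ARM (A32) encodings for a register-form data-processing ADD and for BX, with the condition field set to 0xE ("always"); the helper names encode_add_reg and encode_bx are hypothetical, invented for this illustration. The encoding places the first source register (Rn) in bits 19-16 and the second (Rm) in bits 3-0, so add r0, r0, r1 encodes as 0xe0800001, while the operand-swapped add r0, r1, r0 encodes as 0xe0810000.

```c
#include <stdint.h>

/* Hypothetical helpers, assuming the 32-bit ARM (A32) instruction encodings. */

/* ADD (register, no shift):
   cond[31:28] | 00 | I=0 | opcode=0100 | S=0 | Rn[19:16] | Rd[15:12] | Rm[3:0] */
static uint32_t encode_add_reg(uint32_t cond, uint32_t rd, uint32_t rn, uint32_t rm)
{
    return (cond << 28)    /* condition field, 0xE = always */
         | (0x4u << 21)    /* data-processing opcode for ADD */
         | (rn << 16)      /* first source register */
         | (rd << 12)      /* destination register */
         | rm;             /* second source register, shift amount 0 */
}

/* BX Rm: cond[31:28] | 0001 0010 1111 1111 1111 0001 | Rm[3:0] */
static uint32_t encode_bx(uint32_t cond, uint32_t rm)
{
    return (cond << 28) | 0x012FFF10u | rm;
}
```

For example, encode_add_reg(0xE, 0, 0, 1) yields 0xe0800001 and encode_bx(0xE, 14) (lr is r14) yields 0xe12fff1e. Because addition is commutative, either operand order computes a + b; only the encoding differs.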

Unfortunately, on the ARM architecture there is no standard format for assembly language. The ARM tools use a unique syntax that, although expressive, does not resemble the format used by most assemblers. Most assemblers use a format similar to the UNIX and GNU assemblers. The remainder of this paper will use examples in the UNIX style of assembly for ARM.

Types of assembly
There are three primary ways to write assembly: intrinsic functions, inline assembly, and assembly files.

Intrinsic functions are functions that have a special meaning to the implementation. Many intrinsic functions are used to provide assembly functionality to the user.

Consider an operation that is not efficiently expressible in C. We will use the ARM CLZ instruction as such an example. This instruction counts the consecutive zero bits in the source operand, starting from the most significant bit; for a source of zero it returns 32.

Coded in C, this operation might look something like this:
#include <stdint.h>

static int count_leading_zeros(uint32_t src)
{
    /* Scan from bit 31 down; the first set bit determines the count. */
    for (int i = 31; i >= 0; i--) {
        if ((src & ((uint32_t)1 << i)) != 0) {
            return 31 - i;
        }
    }
    return 32;    /* no bits set */
}
Or it could be coded like this:
static int count_leading_zeros(uint32_t src)
{
    uint32_t bit = 0x80000000;
    int i = 0;
    while (bit != 0) {
        if ((src & bit) != 0) {
            break;
        }
        i++;
        bit >>= 1;
    }
    return i;
}
This operation could be coded in any number of ways, and no one of them is clearly the most natural; the choice is largely a matter of personal preference. Recognizing even one of these forms is cumbersome for a compiler, and recognizing every way a human might express the operation is practically impossible. As a result, unless a standard way of writing the loop were established, the compiler could not reliably identify loops that are functionally equivalent to CLZ and substitute the instruction for them.
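As a further illustration of how many shapes the same operation can take, here is a third formulation that narrows down the highest set bit by halving the search window at each step. This is a common bit-twiddling technique rather than something from the original article, and the function name count_leading_zeros_bsearch is hypothetical; a compiler would have to recognize yet another pattern to map it onto CLZ.

```c
#include <stdint.h>

/* Binary-search count of leading zeros: test the top 16 bits, then the
   top 8, 4, 2 and 1, shifting the remaining bits up whenever the tested
   field is all zeros. */
static int count_leading_zeros_bsearch(uint32_t src)
{
    int n = 0;

    if (src == 0) {
        return 32;                                   /* no bits set */
    }
    if ((src & 0xFFFF0000u) == 0) { n += 16; src <<= 16; }
    if ((src & 0xFF000000u) == 0) { n += 8;  src <<= 8;  }
    if ((src & 0xF0000000u) == 0) { n += 4;  src <<= 4;  }
    if ((src & 0xC0000000u) == 0) { n += 2;  src <<= 2;  }
    if ((src & 0x80000000u) == 0) { n += 1; }
    return n;
}
```

It returns the same results as the loop versions (for example 31 for 1 and 15 for 0x00010000), yet shares almost no surface structure with them.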

Rather than documenting a standard way to write this loop, an elegant solution is to document an intrinsic function that the compiler will recognize and convert to the CLZ instruction. For example, a compiler might recognize:
int __CLZ32(uint32_t src);
as an intrinsic function corresponding to the ARM CLZ instruction. So whenever the user calls __CLZ32(), rather than making a function call to some routine, the compiler inlines a CLZ instruction in place of the call.
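__CLZ32 is the article's example name, and what a given toolchain actually provides varies. As one concrete point of comparison, GCC and Clang offer the builtin __builtin_clz, which on ARM cores that have CLZ is typically compiled to that instruction. Its result is undefined for a zero argument, so a wrapper that matches the C loops above (returning 32 for zero) needs a guard. The sketch below assumes a GCC-compatible compiler and a 32-bit unsigned int; the name clz32 is hypothetical.

```c
#include <limits.h>
#include <stdint.h>

/* Assumes GCC or Clang, where __builtin_clz takes an unsigned int. */
#if UINT_MAX != 0xFFFFFFFFu
#error "clz32 assumes a 32-bit unsigned int"
#endif

static inline int clz32(uint32_t src)
{
    /* __builtin_clz(0) is undefined, so handle zero explicitly to match
       the loop versions, which return 32. */
    return src == 0 ? 32 : __builtin_clz(src);
}
```

Because the hardware CLZ instruction itself returns 32 for a zero input, a compiler targeting ARM may be able to fold the guard away, but the guard keeps the wrapper well defined on any target.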

An intrinsic function is known to the compiler, and the compiler can understand the form of the resulting assembly code. Semantically, this behaves as one of the above functions would, but the implementation is streamlined. An intrinsic function provides a natural and efficient interface to assembly instructions.

The second approach is inline assembly. One might write something like:
static int count_leading_zeros(uint32_t src)
{
    int ret;
    asm("CLZ ret, src");
    return ret;
}
To handle this properly, the compiler must understand several things. It must know that this is an instruction with two operands. The first operand is a destination that the instruction writes; it must be allocated to a register, and it corresponds to the "ret" variable. The second operand is a source that the instruction reads; it must also be allocated to a register, and it corresponds to the "src" parameter. Finally, the instruction reads and writes no objects other than these operands. Such built-in knowledge cannot be complete. There must be limits to the kinds of assembly that the compiler can safely recognize, but those limits are hard to describe. May the user define a label in one instance of inline assembly and branch to it from another? Which temporary registers may the user modify? Must all input operands be in distinct registers even if their values are the same?
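GCC-compatible compilers sidestep part of this problem by making the programmer state explicitly what the compiler cannot infer. In GCC's extended asm syntax, each operand carries a constraint: "=r" declares a write-only output in a general-purpose register, "r" an input in a register, and an optional clobber list (for example "cc" or "memory") names anything else the instructions modify. The sketch below assumes GCC or Clang targeting 32-bit ARM with CLZ available (detected here through the ACLE macro __ARM_FEATURE_CLZ); on any other target it falls back to a plain C loop so that it still compiles. The name count_leading_zeros_asm is hypothetical.

```c
#include <stdint.h>

static inline int count_leading_zeros_asm(uint32_t src)
{
#if defined(__GNUC__) && defined(__arm__) && defined(__ARM_FEATURE_CLZ)
    uint32_t ret;
    /* %0 is the output (ret) and %1 the input (src). "=r" lets the compiler
       choose any register for the result; "r" loads src into any register.
       No clobbers are listed because CLZ touches nothing but its operands. */
    __asm__("clz %0, %1" : "=r"(ret) : "r"(src));
    return (int)ret;
#else
    /* Portable fallback for targets without the ARM CLZ instruction. */
    int n = 0;
    for (uint32_t bit = 0x80000000u; bit != 0 && (src & bit) == 0; bit >>= 1) {
        n++;
    }
    return n;
#endif
}
```

The constraints answer only some of the questions above: they tell the compiler which registers hold which values, but the compiler still treats the instruction text itself as opaque.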

andyneil

11/9/2010 4:33 AM EST

Excellent article!

It seems to me that the vast majority of questions about *inline* assembler (ie, within HLL source text) in online forums are ill-informed (or uninformed) stumblings from novices - and, thus, are almost always seriously flawed.

Good to see the inherent risks, and the alternatives, clearly explained!

Herman Roebbers

11/10/2010 5:23 AM EST

Very good article.
I do have some remarks on the register namings, however.
R0-R3 and R14 are temp registers, but not callee-saved.
As the name implies callee-saved registers are the registers saved by the function being called, so that would be R4, R5, R6, R7, R8, R9, R10, and R11. These would only be saved by the callee if it actually needed to use them. And so they are not caller-saved.

David Brown

11/11/2010 8:17 AM EST

I disagree on much of this article.

Intrinsic functions are often a good choice if the compiler happens to define one - typically, they are available as wrappers for single assembly instructions like CLZ. If you need more than one assembly instruction, it is unlikely that there is a matching intrinsic.

I have also found cases where intrinsic functions were implemented inefficiently by the compiler - the compiler did a better job when given inline assembly than when using intrinsics.

It is rare that a long function is best written in assembly - compilers will often do a better job than an assembly programmer because they can (amongst other things) track register and stack usage for optimisations that would be too time consuming to write by hand.

So in most cases, you only need small sections of assembly - perhaps between 1 and 4 instructions. You can't use intrinsics if they don't exist for the code you want. External assembly modules mean a lot of extra effort, and they mean function call overheads - a big waste of time and space. And because the code is a black box as far as the compiler is concerned, it can't use IPA or global optimisation to improve the code.

Inline assembly fixes these issues. I don't know about Green Hills compilers, but gcc will happily inline and optimise inline assembly code, and will optimise the C code around it. The "correct" way to write the count_leading_zeros function with gcc is:

#include <stdint.h>

static inline int count_zeros(uint32_t src) {
    int ret;
    asm(" clz %[ret], %[src] "
        : [ret] "=r" (ret) : [src] "r" (src));
    return ret;
}

It's true that there is a learning curve for the syntax - but there is a learning curve for writing assembly modules too. And while it's also true that the documentation of the syntax in the gcc manuals could be clearer, there are endless examples, tutorials and resources on the web.
