Design Article
Expressive vs. permissive languages: Is that the question?
Yannick Moy
4/7/2010 1:00 AM EDT
Static analysis is becoming mainstream, with mature bug-finding tools for C and Java, including products such as Coverity Prevent, Grammatech CodeSonar, and Fortify SCA. These products limit the level of "noise" (false warnings) inherent to such tools to a minimum. However, by carefully selecting those cases for which they report a problem, these bug-finders hide the fact that they are largely uncertain about the overall correctness of the program.
To see this, it is usually sufficient to lower the ranking threshold below which problems are not reported, which immediately returns an extremely long list of possible problems, representing only a fraction of all the potential problems. By "problem" here, I mean any software bug, whether it's an error possibly detected at runtime (leading to an exception) or a mismatch between the programmer's intent and the obtained behavior.
As a developer and user of such tools, I have tried many times to answer a recurring question that most users have: how can I make my program analyzable? To tell the truth, beyond a few tips on features to avoid and idioms to prefer, usually not much can be done because so many decisions are outside the control of the programmer and instead depend upon the features of the programming language itself.
Therefore, the real question is: which language should I choose to make my program analyzable? This can be rephrased as: what does a given programming language offer me "for free"? By free, I mean information available from the source code without further analysis-information that, although not strictly required to express functional behavior, nonetheless captures part of the programmer's intention. Because software analysis is always short of information about the programmer's intent, the information that comes "for free" in a programming language is the most valuable asset available to an analyzer. By this criterion, C ranks quite below Java, which ranks quite below Ada.
Of course, tools for all languages strive to recover missing information, but as layers of missing information pile up, tools struggle harder to recover higher layers. For example, tools that analyze programs in assembly language are largely limited to recovering only declared types and control-flow, whereas tools that analyze C programs more easily recover higher layers too, such as type-safety and memory-safety properties. Similarly, tools that analyze Java programs can often deduce integer ranges and non-nullity of pointers, in addition to the properties recovered from C programs. Tools that analyze Ada programs are best at also recovering "function contracts," the ultimate goal in software analysis, because so much of this free information is available.
A function contract is made up of a precondition that a caller of the function should satisfy prior to the call and a postcondition that the function should satisfy prior to returning. While user-defined contracts have appeared initially in some pioneering languages (especially Eiffel), they're now available in various widely recognized environments, for example in the .NET platform (Code Contracts) and the GNAT GCC compiler for Ada. Internally, many analyzers are based on generated contracts, also called function summaries, that they compute during the analysis.
I said contracts are the ultimate goal because complete contracts allow analyzing each function in isolation, much like function signatures allow separate compilation. However, only one programming language, SPARK, has gone as far as to require contracts from the programmer. So, in the general case, it's up to the analyzer to generate the contracts.




Comments
Lundin
4/7/2010 11:35 AM EDT
The concept of contracts is definitely interesting, and perhaps something new programmming languages for embedded systems should consider adapting.
Though note that most of the concerns in this article are completely irrelevant to embedded systems. Embedded designers strive to make their products idiot-proof, rather than hacker-proof. So in embedded systems you will typically find the number of attackers == null. The number of Java users are also null. Dynamic allocation is used sparsely. And I'd rather not have my car call the exit(1) function, i.e. "let go of the wheel and cover your eyes with both hands upon error".
For the C example, the article should in all fairness compare apples and apples. The equivalent C snippet to the Ada example would look like this:
In this example written by a C programmer, config is initialized to NULL and there is no risk for integer overflow. If you want it idiot-programmer-proof, you can also add a check if(conf==NULL). I'm also really curious of knowing how a parameter to a function can go out of scope before the function is called.
Edited by: ESD editorial staff: SRambo on Apr 7, 2010 11:54 AM
Sign in to Reply
yannick.moy
4/8/2010 10:47 AM EDT
I think my C example is fairly similar to the Ada version in terms of code, except the Ada version of course uses more precise types.
Your C version is rather more involved, I would say. I do not object that you can write bullet-proof code in any language, rather I wanted to highlight that it is not as easy to write bullet-proof code in different languages. I think your C version confirms this. Plus it compiles despite a crucial bug: your parameter res should be a Proc** and you should be assigning the result of the allocation to *res.
To answer your question on how a parameter can be out-of-scope, here is a simple example:
Sign in to Reply
Lundin
4/9/2010 2:57 AM EDT
For the Ada version, you had set a limit for the maximum size of memory to be allocated, using a language-specific syntax. So does my C version, but your original C code does not. And oops... yes it should be pointer-to-pointer.
As for returning pointers to local variables, it indicates that they are beginner programmers. I really don't hope anyone professional encounter such code when running static analyzis on their soon-to-be production code.
In fact, if you return any kind pointer from a function in any C program, that is in my opinion a sign of bad design. The need of returning a pointer from a function originates from flawed logic. Instead, return data through pointers in one of the function parameters and leave the allocation of said parameters to the caller.
A pointer returned from a function can point at the following kind of variables:
- Local variables. This case is always a bug in the program.
- The same data as one of the parameters of the function. This is inefficient programming, as the same parameter will be used twice in the function, taking up unnecessary stack space when the function is called.
- Dynamically allocated variables. Returning pointers to variables that were dynamically allocated inside the function is one of the most common causes for memory leaks in C programs. Instead, leave allocation to the caller, as in the example (but use pointer-to-pointer :) ).
- Global/static variables. This is a bug in a multi-process environment, as it makes the function unsafe for multiple processes / ISRs. It also makes the function vulnerable to unspecified behavior, in case it is called like this:
print(func(), func()); /* where func is the function altering global resources */
Returning pointers to static file scope variables is also a flaw in the program design, as it breaks the encapsulation of private variables.
Sign in to Reply
willc2010
4/9/2010 5:04 AM EDT
Hello Lundin and others.
In your revised version of the C code, the types Result_t and uint8_t are compatible with each other in expressions, despite their different purposes. Indeed, they are compatible with every other enum and integer type and floats under most circumstances, under various confusing and inconsistent silent promotion rules. If you are lucky then you will sometimes get a warning, but you can't rely on it. Even the MISRA checker allows a Result_t to be assigned to a uint8_t, even though this almost certainly makes no sense. And in any any C-derived language (MISRA or not) you have to use one of a small fixed set of integer types that almost never have the appropriate range for the quantity in question. And as well as having inappropriate ranges, quantities that should never be assignment compatible or mixable in expressions (without explicit conversions) can be silently confused with each other.
C (unlike Ada) basically doesn't have array and enumeration types at all, just pointers, storage allocators and defines. Here is a real bug (with name changes and simplifications):
--------------Editor's note: those are square brackets around the word "NUM_PROCESSORS" above. The comment system for some reason thinks they're links. --ESD magazine.-------------------
This compiles, passes MISRA checking, and makes no sense. The if test is never true (it should say "ActiveState[n] == INACTIVE)"). There isn't a real type tState, just a bunch of constants. INACTIVE, being the first one, has the value 0. ActiveState, used in an expression, is merely a pointer. Pointers can be compared with 0. This is all fundamentally bad.
Also, you cannot just dismiss returning a pointer to a local variable as a beginner's mistake. It can be done in less obvious ways, for one thing. But the main point is that it is obviously dangerous and should simply be forbidden. Ada has rules that are designed to prevent such a mistake from even compiling. They make it less permissive than C in this respect. This is a good thing.
C programming is a minefield of these sorts of bugs, most of which can not occur or can readily be precluded in Ada, in part because Ada has a proper type system. The amount of effort that is required to debug and gain confidence in a C program is much greater, and you end up relying a lot on conventions that are difficult to enforce, rather than rules that are checked mechanically (as your example shows). And on top of that C syntax is much harder to read. In my experience it is typically a lot more difficult to gain confidence in 10_000 line C projects than in 100_000 line Ada ones. I think that this is the main point of the article.
Edited by: ESD editorial staff: SRambo on Apr 9, 2010 9:52 AM
Sign in to Reply
yannick.moy
4/12/2010 4:00 PM EDT
Hello willc2010,
Your example reminds me of a similar one reported by Coverity's team (they build static analyzers) in a recent article (http://mags.acm.org/communications/201002/?folio=66&CFID=85833885&CFTOKEN=18821371)
Here it is:
The error is that the second part of the test is always true, because it compares the address of a function to zero, instead of the result of calling this function and zero. Just missing ().
Because of the security implications of this bug, they say that finding it got them enormous press coverage. Almost funny when you realize how stupid (= easy to find) this bug is!
Sign in to Reply