Design Article
Cryptography for embedded systems - Part 1: Security level categories & hashing
Timothy Stapko
6/7/2010 2:38 PM EDT
12.2 HashingLow Security, High Performance
Ideally, we would not need any security in our applications at all, instead having the luxury to focus all resources on performance. Unfortunately, this is not reality and we have to balance security with performance. Fortunately, the security-performance tradeoff is relatively linearhigher security means some performance loss, but the balance can be adjusted fairly easily by choosing different algorithms, methods, and key sizes.
For the best performance with some level of security, you can't beat the hash algorithms. As far as resources are concerned, they use far less than other types of cryptographic algorithms. Not strictly cryptography in the classic sense, hash algorithms provide integrity without protection from eavesdroppingin many applications this is all the security that is really needed.
How do we actually make hashing work for us in our applications? Probably the most common use of hashing is to provide a guarantee of integrity. The concept behind cryptographically secure hashing is that hash collisions are very difficult to create. Therefore, if you hash some information and compare it to a hash of that information provided to you by a trusted source and the hashes match, you have a fairly high certainty that the information has not changed or been tampered with in transit.
Another mechanism that employs hashing to provide a level of security is called "challenge-response," after the way the mechanism works. In a challenge-response operation, the system or user requesting the information provides a "challenge" to the sender of the requested information. The sender must then provide an appropriate "response" to the requestor, or the information is deemed invalid.
There are many variants of this simple method for verifying the authenticity of information, but for our purposes, the variant we will talk about uses hashing to prove that the requestor and sender (client and server) both know some shared secret only known to those systems. The challenge is typically some cryptographically secure random number that is difficult to predict, and it is sent by the client to the server (or from the server to the client, as is the case with some HTTP applications). Once the server receives the challenge, it calculates a hash of the challenge and the secret, and sends the hash back to the client. The client also performs the same operation on its copy of the secret, so when it receives the hash it will know that the server knows the same secret.
The reason the mechanism works is because the server could not produce the correct hash value unless it knew the secret. The random number for the challenge ensures that every hash sent back to the client is different; otherwise an attacker could simply send that value to the client and there would be no security.
The challenge-response hashing described here does suffer from a man-in-the-middle vulnerability, since an attacker could intercept the message in transit from the server back to the client. This attack requires the attacker to have access to the routing system of the network in order to spoof the address of the server and the client so it can intercept and retransmit the messages. However, for security at this level, it can usually be assumed that breaking the network routing will be sufficiently difficult to provide a decent level of security.
So now we know how hashing can be used to provide different types of security, how do we actually use it in practice? There are several hashing algorithms available, but the most common are MD5 and SHA-1. Unfortunately, the MD5 algorithm has been shown to have some weaknesses, and there is a general feeling of uneasiness about the algorithm. SHA-1 has been the focus of some suspicion as well, but it has not had the same negative press received by MD5. In any case, both algorithms are still heavily used in the absence of anything better. For this reason, we will look at MD5 and SHA-1 for our examples. Though there is some risk that these particular algorithms will become obsolete, the methods described here for their use in cryptographic systems can be applied to any similarly structured hash algorithms.
In practice, many embedded development tool suites provide libraries for MD5, SHA-1, or both hashing algorithms. The algorithms themselves are not too difficult to implement (they actually consist of a number of quite repetitive operations), but the convenience of having a pre-tested implementation to use is quite nice. Hashing algorithms are fairly easy to optimize as well, so it is quite likely that the provided implementations will already be fairly optimal for your target hardware.
Using hash algorithms is quite easy, since they have only three basic operations that are provided in the user API:
1. Initialization, which sets up the state data structure used to actually perform the hash.
2. Hashing, which operates on the incoming data, typically in raw bytes (may also be text).
3. Finalization, which finishes up the hash, and copies the result into an output buffer.
In this chapter, we focus primarily on C-based development tools, so the following examples of hashing in practice directly apply to the C language, but the principles can be applied to other similar languages (such as Java or C++).
The basic operation of hashing algorithms uses a buffer, in C an array of char type, as a workspace for the hashing. The algorithms utilize a structure that provides state information across hashing operations, allowing for the hash to be added to in multiple operations, rather than all at once in a single hashing operation. This feature is the reason that each of the hashing APIs has three distinct operations.
The algorithms are set up so that they can utilize the state structure to keep the in-progress hash intact without requiring separate support from the application designer. With this setup, once the user application is ready, it can provide an output buffer for the finalization operation. This is important for networked environments where several hashes may be in progress simultaneouslythe user need only keep track of the hash state structures.
So what does hashing look like in a real application? In the following C program, the user input is hashed into an output buffer (actual SHA-1 API may vary):
#include < sha1.h >
#include < stdio.h >
main () {
char input_buf[128], output_buf[20];
struct SHA1_state sha_state;
int i, input_len;
// Initialize the state structure, requires only a reference to the struct
SHA1_init(&sha_state);
// Get user input, make sure buffer is cleared first
memset(input_buf, 0, sizeof(input_buf));
scanf("%127s", input_buf);
// Hash the input, with a length equal to the size of the user input. Note that
// the input to the SHA1_hash function can be of any length
// !!! Danger, strlen can overflow, so we terminate the buffer for safety
input_buf[127] = 0;
input_len = strlen(input_buf);
SHA1_hash(&sha_state, input_buf, input_len);
// Finally, finalize the hash and copy it into a buffer and display
SHA1_finish( & sha_state, output_buf);
for(i = 0; i < 20; ++i) {
printf("%X", output_buf[i]);
}
printf("\n");
} // End program
Listing 12.1: Hashing with SHA-1
That's ithashing is a very simple operation in code. Notice that we do some defensive programming when using the strlen function. Unfortunately, the C programming language does not have very good standard library support for protecting against buffer overfl ow.
In our little program example, if the user entered enough data to fill up the buffer to the end (more than 127 characters), we are relying on scanf to be sure that the last element of the array contains a null-terminator. In our program, the scanf "%s" type is used in the format string with the optional width format parameter, so it should not cause any issues for the call to strlen later. However, if someone was to change the scanf to some other form of input, then the assumption may be violated. For this reason, we add an unconditional null-terminator to the end of the array to be sure that strlen will terminate appropriately.
The hashing example above illustrates the use of a cryptographic algorithm to protect data, but it also highlights the fact that anything in the program can become a security issue. The use of standard C library functions such as strlen can lead to unexpected and unintentional behavior. Sometimes this behavior goes unnoticed; sometimes it leads to a crash. All it takes is one malicious attacker to find the flaw in your program and exploit it somehow. It may not be that the attacker gains access to the whole bank, but shutting down a few hundred automated teller machines could do a lot of financial damage. All that the attacker needs is for you, the developer, to stop paying attention.
The example has a trick or two that help to keep the program safe, such as terminating the buffer (a simple operation that could be easily overlooked), but what if the algorithms themselves were the problem. In the next section we will look at some recent developments with two major hash algorithms (MD5 and SHA-1) that cast doubt on their continued utility.

