The debate continues over the best way to solve various security-related problems in silicon. Is a multicore processor that handles multiple functions like encryption, Internet Protocol Security tunnels and the Secure Sockets Layer the best solution? Or does a dedicated processor for a single-layer function provide a better alternative? Despite the vigor with which the debate is carried out, there really is no right or wrong answer.
In fact, developers can arrive at a more comprehensive solution by realistically assessing the problems that need to be solved and applying a dual strategy that includes both hardwired logic and programmable hardware.
Hardwired logic solves the invariant problems, such as translating the Ethernet stream from the network physical interface into the ASIC's internal logic. Programmable hardware, potentially surrounded by custom hardware, addresses the aspects that are highly variable in nature, such as interpreting the client hello received during a Secure Sockets Layer/Transport Layer Security (SSL/TLS) handshake.
IPsec vs. SSL/TLS
Any discussion of potential approaches for solving security problems in silicon must begin with a review of the best-known security protocols applied to data flowing over a network, including Internet Protocol Security (IPsec) and SSL, now standardized as TLS. At present, SSL/TLS has some advantages, for these reasons:
- There is a pre-existing constraint on how IPsec is deployed in a network equipment provider's system. The creation of an IPsec "tunnel," or connection, is usually done by an entity that is mostly independent of the system or device that actually handles the processing of IPsec packets (which involves the encryption and decryption).
- There is a lack of standardized or common clients to ensure interoperability of the various IPsec implementations. This places a huge burden on the vendor, which would have to properly support several schemes in its silicon implementation.
One of the most compelling aspects of SSL/TLS is that the client-side deployment is already handled. SSL/TLS technology is incorporated into every popular operating system environment because of its use by Web browsers. For this reason, focusing on the "other side of the wire," at the aggregation point or the network system provider's endpoint, makes deployment of a solution easier.
The problems associated with putting SSL/TLS in the field are only the first of the roadblocks to be overcome before cryptography can be deployed everywhere. Unlike IPsec, which depends heavily on the performance of symmetric crypto and hashing (e.g., AES, DES, SHA-1), SSL/TLS is heavily dependent on asymmetric, or public-key, cryptography: an RSA operation is performed during every connection a client makes to a server system (for example, a Web browser contacting an e-commerce site such as Amazon.com).
In fact, performing a single 1,024-bit RSA operation is equivalent to taking a 1,024-bit number and raising it to the power of another 1,024-bit number, then reducing the result modulo a third 1,024-bit number. Without the modular reduction, the intermediate result would have more digits than there are atoms in the universe. Thus, standard processing hardware finds it extremely difficult to perform these operations with any speed.
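The cost of the RSA operation described above can be illustrated with a minimal sketch of square-and-multiply modular exponentiation. This is an illustration only, not a description of any particular hardware design: the function name `mod_exp` and the toy key (n = 3233, e = 17, d = 2753) are assumptions chosen for readability, and real implementations add padding, constant-time arithmetic and other protections omitted here.

```python
def mod_exp(base: int, exponent: int, modulus: int) -> int:
    """Left-to-right square-and-multiply modular exponentiation.

    Reducing modulo `modulus` after every step keeps each intermediate
    value smaller than the modulus, so the huge unreduced power is
    never materialized.
    """
    if modulus <= 0:
        raise ValueError("modulus must be positive")
    if exponent < 0:
        raise ValueError("exponent must be non-negative")

    result = 1 % modulus
    base %= modulus
    # Walk the exponent bits from most significant to least significant.
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus      # square for every bit
        if bit == "1":
            result = (result * base) % modulus    # multiply when the bit is set
    return result


if __name__ == "__main__":
    # Toy RSA key, far too small for real use (assumed for illustration).
    n, e, d = 3233, 17, 2753
    message = 65
    ciphertext = mod_exp(message, e, n)
    recovered = mod_exp(ciphertext, d, n)
    print(ciphertext, recovered)
```

For a 1,024-bit exponent the loop performs 1,024 modular squarings plus a modular multiplication for each set bit, every one of them on 1,024-bit operands. That volume of wide multiplication is what makes the operation slow on a general-purpose processor. Python's built-in three-argument `pow(base, exponent, modulus)` computes the same result.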
But hardware that removes every crypto bottleneck to system performance, symmetric and asymmetric alike, provides a fail-safe and comprehensive solution that cannot be duplicated with any combination of other options. Crypto algorithms also do not change once they are accepted. Indeed, a distinctive attribute of crypto is that it is a time-proven quantity: nothing but the passage of time can instill confidence that a particular algorithm is sound. The Advanced Encryption Standard (AES) is one example of a scheme whose algorithm was tested intensely for vulnerabilities. Given this stable nature of crypto, the appropriate instantiation of these functions is nonprogrammable hardwired logic.
Translating the math associated with crypto into hardware is one step; the next is to create a system architecture that enables the feeding of the newly defined crypto subsystem in an efficient manner, in keeping with the context of SSL/TLS and the desire to provide a highly integrated, complete solution.
Given this requirement, the only way to ensure that crypto is not the bottleneck, and that the final silicon offers a complete solution for SSL/TLS processing, is to make sure the ASIC incorporates both SSL/TLS protocol-processing and TCP/IP network protocol-processing capabilities.
Furthermore, the ASIC has to support this capability in a way that is very network-friendly. The convergence of all of these requirements defines a silicon solution that integrates three elements in a single ASIC: native Ethernet networking interfaces (MII/GMII/TBI), the "lingua franca" of every deployed network; fully autonomous TCP/IP protocol processing; and complete SSL/TLS protocol and crypto processing.
Finite state machine
Just as we've discussed discrete hardware blocks as a solution for SSL/TLS-related crypto processing, we need to discuss the appropriate implementation for dynamic, evolving standards-based environments like SSLv2, SSLv3 and TLSv1: a software-programmable finite state machine (FSM). This approach allows for the flexibility needed to adapt quickly to any enhancements or refinements associated with the establishment, use and termination of SSL/TLS connections. To satisfy the performance requirements, the ultimate approach should surround the soft FSM with highly specialized and optimized hardware. This hardware performs all of the "heavy lifting" of the many invariant tasks. For example, the operations and "ingredients" associated with key generation during the handshake phase of SSL/TLS are very specific and unchanging.
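The idea of a software-programmable FSM can be sketched as a table-driven state machine. The sketch below is a simplified, hypothetical server-side model of an RSA-based SSL/TLS handshake. The state names, the `HandshakeFsm` class and the transition table are assumptions made for illustration, and steps such as certificate handling, alerts and session resumption are omitted.

```python
from enum import Enum, auto


class State(Enum):
    WAIT_CLIENT_HELLO = auto()
    WAIT_CLIENT_KEY_EXCHANGE = auto()
    WAIT_CHANGE_CIPHER_SPEC = auto()
    WAIT_FINISHED = auto()
    ESTABLISHED = auto()
    CLOSED = auto()


# The transition table is plain data: (current state, message) -> next state.
# Adapting to a protocol revision means editing this table, not the engine.
DEFAULT_TABLE = {
    (State.WAIT_CLIENT_HELLO, "client_hello"): State.WAIT_CLIENT_KEY_EXCHANGE,
    (State.WAIT_CLIENT_KEY_EXCHANGE, "client_key_exchange"): State.WAIT_CHANGE_CIPHER_SPEC,
    (State.WAIT_CHANGE_CIPHER_SPEC, "change_cipher_spec"): State.WAIT_FINISHED,
    (State.WAIT_FINISHED, "finished"): State.ESTABLISHED,
    (State.ESTABLISHED, "close_notify"): State.CLOSED,
}


class HandshakeFsm:
    """Minimal table-driven engine (hypothetical, for illustration)."""

    def __init__(self, table=None):
        self.table = dict(DEFAULT_TABLE if table is None else table)
        self.state = State.WAIT_CLIENT_HELLO

    def feed(self, message: str) -> State:
        """Advance on one received message; reject anything unexpected."""
        key = (self.state, message)
        if key not in self.table:
            raise ValueError(f"unexpected {message!r} in state {self.state.name}")
        self.state = self.table[key]
        return self.state


if __name__ == "__main__":
    fsm = HandshakeFsm()
    for msg in ("client_hello", "client_key_exchange",
                "change_cipher_spec", "finished"):
        print(msg, "->", fsm.feed(msg).name)
```

Because the engine only looks up transitions, a refinement to the protocol can be absorbed by loading a different table, while fixed work such as key derivation stays in dedicated logic outside the FSM.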
Likewise, another distinctive aspect of the desired silicon implementation is hardware that provides the full range of processing resources needed to perform TCP/IP offload and processing fully and autonomously.
Unlike many implementations that claim to perform TCP/IP processing, our "ideal" implementation does not have the luxury of "punting" the hard aspects of the protocol processing to some other entity. Because the requirement calls for an autonomous implementation of TCP/IP protocol processing, our ultimate design includes a full and complete Request for Comments (RFC) standards-compliant, hardware-based protocol engine. In fact, it's better if two complete TCP/IP protocol engines are incorporated into the silicon.
As is the situation with SSL/TLS protocol processing, the actual TCP/IP implementation includes some very specialized hardware. This hardware performs a portion of the heavy lifting in the areas of packet integrity checking, TCP state retrieval and update, TCP timer evaluation and tracking, and general packet movement and buffering. It also includes the soft FSM that implements the more fluid aspects of TCP/IP protocol processing, such as ACK processing, packet retransmission, round-trip time calculations and out-of-order packet processing.
Partitioning the problem between hardwired logic optimized for the invariant tasks and software-programmable finite state machines for the variable ones yields a balanced and complete solution. It provides the performance needed at an affordable price. The silicon device is also flexible, allowing for changes as market conditions vary.
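As one example of the kind of round-trip logic a soft FSM might carry, the sketch below follows the smoothed round-trip time and retransmission timeout calculation specified in RFC 6298. The article does not say which algorithm the design uses, so this choice is an assumption, as are the class name `RttEstimator` and its parameter names.

```python
class RttEstimator:
    """Smoothed RTT and retransmission timeout (RTO), after RFC 6298."""

    ALPHA = 1 / 8   # gain applied to the smoothed RTT
    BETA = 1 / 4    # gain applied to the RTT variance
    K = 4           # variance multiplier used in the RTO

    def __init__(self, clock_granularity: float = 0.001, min_rto: float = 1.0):
        self.clock_granularity = clock_granularity  # seconds
        self.min_rto = min_rto                      # seconds
        self.srtt = None                            # smoothed RTT
        self.rttvar = None                          # RTT variance estimate
        self.rto = 1.0                              # initial RTO before any sample

    def update(self, sample: float) -> float:
        """Fold in one RTT measurement (seconds) and return the new RTO."""
        if sample < 0:
            raise ValueError("RTT sample must be non-negative")
        if self.srtt is None:
            # First measurement seeds both estimators.
            self.srtt = sample
            self.rttvar = sample / 2
        else:
            # The variance is updated first, using the previous SRTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - sample)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * sample
        self.rto = max(self.min_rto,
                       self.srtt + max(self.clock_granularity, self.K * self.rttvar))
        return self.rto


if __name__ == "__main__":
    est = RttEstimator()
    for rtt in (0.100, 0.200, 0.150):
        print(f"sample={rtt:.3f}s rto={est.update(rtt):.4f}s")
```

Tuning decisions such as the gains and the minimum timeout are the sort of detail that benefits from living in programmable logic, while checksum verification and buffering can stay in fixed hardware.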
Oscar Mitchell (Oscar_Mitchell@britestream.com) is founder and chief technical officer at Britestream Networks Inc. (Austin, Texas).