Design Article
A real-time HPC approach for optimizing Intel multi-core architectures (Part 1 of 3)
Dr. Aljosa Vrancic and Jeff Meisel, National Instruments
6/22/2009 3:27 PM EDT
- Part 1 is a review of real-time concepts that are important for understanding this domain of engineering problems, and a comparison of traditional HPC with real-time HPC.
- Part 2 outlines software architecture approaches for utilizing multi-core processors, along with cache optimizations.
- Part 3 will consider industry examples that employ this particular methodology.
Introduction to Real-Time Concepts
Because tasks that require acceleration are so computationally intensive, a typical HPC problem could not traditionally be solved with a normal desktop computer, let alone an embedded system. However, disruptive technologies such as multi-core processors now enable more and more HPC applications to be solved with off-the-shelf hardware.
Real-time HPC comes into the picture when the number crunching must happen in a deterministic, low-latency environment. Many HPC applications run offline simulations thousands of times and then report the results. This is not a real-time operation because no timing constraint specifies how quickly the results must be returned. The results just need to be calculated as fast as possible.
Previously, these applications have been developed using message passing (for example the MPI standard, as implemented by libraries such as MPICH) to divide tasks across the different nodes in the system. A typical distributed computing scenario looks like the one shown in Figure 1, with one head node that acts as a master and distributes processing to the slave nodes in the system.

Figure 1: Example configuration in a traditional HPC system
By default, this configuration is not real-time friendly because of the latencies associated with networking technologies (like Ethernet). In addition, the synchronization implied by the message passing protocol is not necessarily predictable at millisecond-level timing granularity. Note that such a configuration could potentially be made real-time by replacing the communication layer with a real-time hardware and software layer (such as reflective memory), and by adding manual synchronization to prioritize tasks and ensure that they complete within a bounded timeframe. Generally speaking, though, the standard HPC approach was not designed for real-time systems and presents serious challenges when real-time control is needed.
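The head-node and worker pattern described above can be sketched in a few lines. This is only an illustration under loud assumptions: in-process queues stand in for the network, threads stand in for nodes, and the names `head_node` and `worker` are hypothetical. It is not MPI and does not use any MPI API.

```python
import queue
import threading


def worker(inbox, outbox):
    """Worker node: receive chunks, compute a partial sum of squares, reply."""
    while True:
        msg = inbox.get()        # blocking receive; arrival time is not bounded
        if msg is None:          # sentinel: the head node has no more work
            break
        chunk_id, chunk = msg
        outbox.put((chunk_id, sum(x * x for x in chunk)))


def head_node(data, n_workers=4):
    """Head node: scatter chunks to the workers, then gather the results."""
    inboxes = [queue.Queue() for _ in range(n_workers)]
    outbox = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(inboxes[r], outbox))
        for r in range(n_workers)
    ]
    for t in threads:
        t.start()

    # Scatter: one strided chunk per worker, followed by the stop sentinel.
    for r in range(n_workers):
        inboxes[r].put((r, data[r::n_workers]))
        inboxes[r].put(None)

    # Gather: replies may arrive in any order, so key them by chunk id.
    partials = {}
    for _ in range(n_workers):
        chunk_id, value = outbox.get()
        partials[chunk_id] = value

    for t in threads:
        t.join()
    return sum(partials.values())
```

Every `get()` in this sketch is a point where the caller waits for a message whose arrival time nothing bounds, which is the property that makes the pattern awkward for real-time use.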
An Embedded, Real-Time HPC Approach with Multi-Core Processors
The approach outlined in this article is based on a real-time software stack, as described in Table 1, and off-the-shelf multi-core processors.

Real-time applications also have algorithms that need to be accelerated, but they often involve the control of real-world physical systems, so the traditional HPC approach is not applicable. In a real-time scenario, the result of an operation must be returned in a predictable amount of time. The challenge is that, until recently, it has been very hard to solve an HPC problem while also closing a control loop in under 1 millisecond. Furthermore, a more embedded approach may be required, where physical size and power constraints limit the design of the system. Now consider a multi-core architecture, where today you can find up to 16 processing cores.
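To make the timing requirement concrete, here is a minimal sketch of how loop timing might be checked against a deadline. The function names, the report fields, and the 1000-microsecond default are illustrative assumptions rather than anything prescribed by the article, and timestamps taken from a general-purpose OS are themselves not deterministic.

```python
import time


def measure_loop(step, iterations):
    """Run step() repeatedly; return each iteration's duration in microseconds."""
    durations_us = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        step()
        durations_us.append((time.perf_counter_ns() - start) / 1000.0)
    return durations_us


def check_deadline(durations_us, deadline_us=1000.0):
    """Summarize loop timing against a deadline (assumed here: 1 ms)."""
    worst = max(durations_us)
    return {
        "mean_us": sum(durations_us) / len(durations_us),
        "worst_us": worst,
        "jitter_us": worst - min(durations_us),   # spread between best and worst
        "misses": sum(1 for d in durations_us if d > deadline_us),
    }
```

For the durations `[200.0, 250.0, 1200.0]`, the mean is 550 microseconds, which looks comfortable, yet one iteration misses the deadline. Offline HPC cares about the mean or the total run time; a real-time loop is judged by its worst case.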
From a latency perspective, a multi-core architecture found in off-the-shelf hardware replaces communication over Ethernet with inter-core communication whose speed is determined by the system bus. Round-trip times are therefore much more tightly bounded. Consider the simplified diagram of a quad-core system shown in Figure 2.

Figure 2: Example configuration in a multi-core system. Source: Adapted from Tian and Shih, "Software Techniques for Shared-Cache Multi-Core Systems," Intel Software Network.
In addition, multi-core processors can use symmetric multiprocessing (SMP) operating systems. SMP has been available for years in general-purpose operating systems like Microsoft Windows, Linux, and Apple Mac OS, where it automatically load-balances tasks across the available CPU resources. Now real-time operating systems are offering SMP support as well. This means that a developer can specify timing and priorities for tasks that run across many cores at one time, and the OS handles the thread interactions. This is a tremendous simplification compared with message passing and manual synchronization, and it can all be done in real time.
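As a rough illustration of that simplification, the sketch below splits a computation across threads that share memory and leaves the placement of threads on cores to the operating system. The function name `sum_squares_shared` is hypothetical, and the sketch shows structure only: CPython's global interpreter lock prevents a real speedup for this workload, and nothing here sets the priorities or timing that a real-time OS would let a developer specify.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_squares_shared(data, n_threads=4):
    """Sum of squares computed by threads that share the input and output."""
    partials = [0] * n_threads    # shared result buffer, one slot per thread

    def work(slot):
        # Each thread reads a strided slice of the shared input and writes
        # only its own slot, so no explicit messages or locks are needed.
        partials[slot] = sum(x * x for x in data[slot::n_threads])

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # The OS scheduler decides which core runs each thread.
        list(pool.map(work, range(n_threads)))
    return sum(partials)
```

There are no send or receive calls and no head node: the threads cooperate through shared memory, and the only synchronization point is waiting for the pool to finish.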



