Design Article

Virtual multi-cores simplify real-time system design

Henk Muller

7/27/2009 6:02 PM EDT

It is difficult to implement a combination of multiple real-time tasks on a traditional processor. For this reason, FPGAs and hardware design techniques have typically been used instead. At the same time, multi-core design methods have become familiar to many designers who build real-time systems from multiple microcontrollers and processors.

In this article we discuss the key properties of emerging multi-core systems that make them an appealing and elegant platform for complex real-time tasks.

Real-time hardware
From the design perspective, the single biggest advantage of a hardware implementation is that hardware offers composability. In short, composability means that when there are two functional blocks, F and G, that perform their tasks in times Tf and Tg, there are simple mathematical rules that allow us to predict the time and resources required to perform both F and G, either in series or in parallel.

As a first approximation, when F and G are performed in parallel in hardware, the combination takes time max(Tf, Tg). If F and G are performed in sequence, the time is Tf + Tg. Similar simple rules can be devised for resources: if the two functional blocks are implemented using resources Rf and Rg, then the parallel composition requires resources Rf + Rg, and the sequential composition requires at most Rf + Rg, since some resources may be shared between the two steps.
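
These rules are simple enough to write down directly. The following Python sketch is illustrative only: the `Block` type, the `parallel` and `sequential` helpers and the example figures are hypothetical, and the `share_resources` option assumes that the hardware used by F can be fully reused by G, in which case max(Rf, Rg) resources would suffice instead of the Rf + Rg upper bound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A functional block characterised by its execution time and resource count."""
    time: float      # execution time, e.g. in microseconds (illustrative units)
    resources: int   # e.g. number of FPGA cells (illustrative units)


def parallel(f: Block, g: Block) -> Block:
    # Both blocks run at the same time: the slower one dominates,
    # and each needs its own resources.
    return Block(max(f.time, g.time), f.resources + g.resources)


def sequential(f: Block, g: Block, share_resources: bool = False) -> Block:
    # One block runs after the other, so the times add.
    # Rf + Rg is the upper bound; if the hardware can be fully reused
    # between the two steps (an assumption), max(Rf, Rg) is enough.
    if share_resources:
        resources = max(f.resources, g.resources)
    else:
        resources = f.resources + g.resources
    return Block(f.time + g.time, resources)


# Hypothetical blocks: F takes 3 time units on 100 cells, G takes 5 on 40 cells.
F = Block(3.0, 100)
G = Block(5.0, 40)
print(parallel(F, G))                          # Block(time=5.0, resources=140)
print(sequential(F, G))                        # Block(time=8.0, resources=140)
print(sequential(F, G, share_resources=True))  # Block(time=8.0, resources=100)
```

Because the result of each composition is itself a `Block`, compositions nest, which is exactly what lets hardware designs be built up from basic blocks.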

It is these simple models that allow us to apply sound reasoning to hardware designs, and compose designs out of basic blocks. Real-time behaviour can to a large extent be predicted in advance, provided that the resources are available. Hardware can be scaled up, for example by using a larger FPGA; we size the hardware to suit the task at hand. A consequence of this is that FPGA manufacturers have to sell devices containing different numbers of cells, so that embedded systems designers can use the smallest FPGA that suits their needs, thereby minimising costs.

Composability has its limits, though. Physical constraints and layout tools mean that, when two parallel tasks are laid out, routing and placement may cause task F or G to slow down, or to use more resources than expected.

Real-time software
A single real-time task can easily be written for a software-based system that uses a general-purpose processor. Even without any dedicated hardware support, data can be read or produced in real time by calibrating the speed of the software against the required hardware speed. As an example, one of the first cheap home computers, the ZX-80, used this principle to generate TV output using just a Z80 microprocessor and employing no external hardware.
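
Calibrating software speed against the hardware amounts to padding a loop so that each iteration takes exactly the number of clock cycles between consecutive samples. The Python sketch below shows only that arithmetic; it assumes a processor whose instructions have fixed, known cycle counts, and the `padding_cycles` function and its figures are hypothetical.

```python
def padding_cycles(clock_hz: int, rate_hz: int, body_cycles: int) -> int:
    """Idle cycles to add so one loop iteration takes exactly clock_hz / rate_hz cycles.

    Assumes every instruction has a fixed, known cycle count (no caches or
    interrupts), which is what makes this kind of calibration possible.
    """
    budget, remainder = divmod(clock_hz, rate_hz)
    if remainder:
        # A fractional budget would make the output drift slightly every iteration.
        raise ValueError("sample rate does not divide the clock frequency")
    if body_cycles > budget:
        raise ValueError("loop body is too slow for the required sample rate")
    return budget - body_cycles


# Hypothetical figures: 20 MHz clock, one sample per microsecond (1 MHz),
# and 12 cycles of useful work per sample.
print(padding_cycles(20_000_000, 1_000_000, 12))  # 8 (a 20-cycle budget minus 12)
```

Adding even one instruction for a second task to the loop body changes `body_cycles`, so the calibration has to be redone; this is why composing software tasks is harder than composing hardware blocks.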

However, there is not always a simple model for composing two or more real-time software components. Running two real-time tasks in a software-only environment is comparatively complex: it usually requires some form of RTOS, and it requires the tasks to cooperate with each other. In other words, task F may have to be redefined in the light of the requirements of task G, which means that composing two tasks on a single processor is not as easy as it is on an FPGA.
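
The dependence between tasks can be illustrated with a deliberately simple model. Assuming a cooperative (non-preemptive) loop running just two tasks, a task that becomes ready may have to wait for the whole of the other task's longest non-yielding slice, so one task's timing guarantee depends on how the other is written. The function names, slice lengths and deadline below are hypothetical, and a preemptive RTOS would need a different analysis.

```python
from typing import Sequence


def worst_case_response(own_cost: int, other_slices: Sequence[int]) -> int:
    """Worst-case response time of a task in a two-task cooperative loop.

    The task may become ready just after the other task has started its
    longest non-yielding slice, so it must wait for that slice and then run.
    """
    return max(other_slices, default=0) + own_cost


def meets_deadline(own_cost: int, deadline: int, other_slices: Sequence[int]) -> bool:
    return worst_case_response(own_cost, other_slices) <= deadline


# Hypothetical task F: 10 time units of work, must respond within 50.
F_COST, F_DEADLINE = 10, 50

print(meets_deadline(F_COST, F_DEADLINE, [5, 30]))      # True: 30 + 10 = 40
print(meets_deadline(F_COST, F_DEADLINE, [5, 60]))      # False: 60 + 10 = 70
# Splitting G's long slice with an extra yield restores F's guarantee.
print(meets_deadline(F_COST, F_DEADLINE, [5, 30, 30]))  # True: 30 + 10 = 40
```

In this toy model it is G that is rewritten to protect F; either way, the two tasks can no longer be designed independently.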

The most difficult aspect is predicting the performance of the combined task. If a processor has components that work on statistical principles, for example a cache that usually contains the required data, then performance prediction is virtually impossible without special partitioning hardware.

Indeed, if two jobs with large memory footprints are executed on a dual-core processor with a shared cache, both can run significantly slower, simply because they are thrashing each other's cache.

Alternatively, it may turn out that both run as if each had the cache to itself, because neither of them needs much of it. Similarly, the time taken by the scheduling algorithm is not completely predictable, although the effect is usually not visible unless the program requires many small tasks to be completed. It is this lack of predictability that makes the system non-composable, and together these factors affect the programmability of the system.
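
Both outcomes are easy to reproduce in a toy model. The Python sketch below assumes a fully associative cache with least-recently-used replacement and 8 lines, and models two hypothetical tasks sharing it by simply alternating their memory accesses; real caches are usually set-associative and real interleavings are irregular, so the numbers are only illustrative.

```python
from collections import OrderedDict


def count_misses(trace, capacity):
    """Count misses in a fully associative LRU cache holding `capacity` lines."""
    cache = OrderedDict()
    misses = 0
    for line in trace:
        if line in cache:
            cache.move_to_end(line)        # hit: mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


def loop_trace(lines, iterations):
    """Cache lines touched by a task that sweeps over `lines` repeatedly."""
    return [line for _ in range(iterations) for line in lines]


def interleave(a, b):
    """Crude model of two tasks sharing one cache: alternate their accesses."""
    return [x for pair in zip(a, b) for x in pair]


CAPACITY = 8

# Two hypothetical tasks with 6-line working sets: together they exceed the cache.
f_big = loop_trace(range(0, 6), 10)
g_big = loop_trace(range(100, 106), 10)
print(count_misses(f_big, CAPACITY) + count_misses(g_big, CAPACITY))  # 12 when run alone
print(count_misses(interleave(f_big, g_big), CAPACITY))               # 120: every access misses

# Two hypothetical tasks with 3-line working sets: together they still fit.
f_small = loop_trace(range(0, 3), 10)
g_small = loop_trace(range(100, 103), 10)
print(count_misses(f_small, CAPACITY) + count_misses(g_small, CAPACITY))  # 6 when run alone
print(count_misses(interleave(f_small, g_small), CAPACITY))               # 6 when shared
```

Neither task changed between the two experiments, yet the miss count for the large pair grew tenfold; the combined behaviour cannot be predicted from each task measured in isolation.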

A particular problem when running multiple real-time software tasks is timing I/O signals precisely. Two I/O devices can easily demand the processor's attention at (almost) the same time, and even though the real-time task will know exactly when the first input signal came in, it will not be able to time both signals without additional hardware in the I/O devices.
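
A toy model makes the timing error concrete. The Python sketch below assumes a single processor that timestamps each input in software when it starts handling it; `record_timestamps`, the handler cost and the arrival times are hypothetical. With timestamping hardware in the I/O device, each recorded time would simply equal the arrival time.

```python
def record_timestamps(arrivals, handler_cost):
    """Timestamps recorded by one processor that services input events one at a time.

    Model (an assumption): each event is timestamped when its handler starts,
    and the processor is busy for `handler_cost` time units per event.
    """
    free_at = 0
    recorded = []
    for t in sorted(arrivals):
        start = max(t, free_at)  # wait if a previous event is still being handled
        recorded.append(start)
        free_at = start + handler_cost
    return recorded


# Hypothetical case: two inputs change 2 time units apart, handler takes 10.
arrivals = [100, 102]
recorded = record_timestamps(arrivals, handler_cost=10)
errors = [r - a for r, a in zip(recorded, sorted(arrivals))]
print(recorded)  # [100, 110]
print(errors)    # [0, 8]: the second edge is timestamped 8 units late
```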

Software solutions also scale differently from FPGA solutions. The amount of memory can usually be adjusted to the needs of the task, but the processing power is usually fixed: one buys a 2.4 GHz Pentium, or a 20 MHz PIC.

The trial-and-error nature of composition means that the designer has to iterate through the design cycle many times. The biggest advantage of software solutions is that the design cycle is very short, but this advantage can easily be negated by the need to iterate many times.

JamesTR

7/28/2009 6:32 AM EDT

That's an interesting approach.

But fixing the memory size to avoid the cache problem is NOT a solution to the problem; it's avoiding "the" problem.

The example code looks very much like Occam, and it's so Transputer! I thought it had been dead for 20 years!

Sundar Srinivasan

7/30/2009 2:47 AM EDT

A very good article. But one point was completely missed: how to resolve a fetch deadlock problem that might arise between the multiple cores. Real time in the real world is a little more difficult because of this.

http://sunnyeves.blogspot.com/

TechnoMarketeer

7/30/2009 5:01 AM EDT

@JamesTR - agreed - but you don't say what the problem is - which in hardware terms comes down to contention for resources, whatever those resources are.

@Sundar - is getting to that point in one very specific example.

More generally, as far as I can see this design approach just moves the contention issue (which multiplies up, as per Charlie Hoare's theories on Communicating Sequential Processes, as the number of cores scales) to an inter-processor comms problem of a different nature - instead of shared memory contention, cache or otherwise, we now have contention occurring at the level of inter-processor CPU comms.

Of course we could assume the tin-opener - for example if we are dealing with a small system with a number of orthogonal applications, or one with multiple discrete functions contained within a single processor, then all would be well - but then that would be true in an RTOS based multicore processing system too.

The main value of this approach as I see it from this article is the clean segregation of what are truly discrete sub-system components, which indeed will add to re-use. A sort of poor man's SoC, or prototyping environment, and I don't mean that in a derogatory way. With SoC costs soaring, if XMOS can create libraries of discrete functions for their processors/threads AND clearly define the comms requirements into and out of that processor, in a manner that facilitates rapid system design validation and integration, then it could be very useful.

But let's not pretend we've solved the multicore problem...

JamesTR

7/30/2009 6:07 AM EDT

I've looked in more detail at xmos.com, especially their ISA for using "ports", a.k.a. just I/O pins. Interesting!!

Two things that pop up in my mind:

1. All the wiggling of pins and the relationships between pins are buried inside their proprietary language called XC. It's a neat language, but none of my team members are interested in learning a new "proprietary programming language". VHDL at least adds new skills to their CV, and it's used by everyone.

2. The I/O pin capabilities are so limited for today's use, compared to similarly priced FPGAs. I'm not sure how they can be power/cost effective with this approach, where a software ISA (fetch, decode, execute) controls the I/O pins.

All said, I'm curious and will probably buy one of their dev boards to play with!!
