BERKELEY, Calif. Is multithreading better than multi-core? Is multi-core better than multithreading? The fact is that the best vehicle for a given application might have one, the other or both. Or neither. They are independent (but complementary) design decisions. As multithreaded processors and multi-core chips become the norm, architects and designers of digital systems need to understand their respective attributes, advantages and disadvantages.
Both multithreading and multi-core approaches exploit the concurrency in a computational workload. The cost, in silicon, energy and complexity, of making a CPU run a single instruction stream ever faster goes up nonlinearly and eventually hits a wall imposed by the physical limitations of circuit technology. That wall keeps moving out a little farther every year, but cost- and power-sensitive designs are constrained to follow the bleeding edge from a safe distance. Fortunately, virtually all computer applications have some degree of concurrency: At least some of the time, two or more independent tasks need to be performed simultaneously. Taking advantage of concurrency to improve computing performance and efficiency isn't always trivial, but it's certainly easier than violating the laws of physics.
Multi-processor, or multi-core, systems exploit concurrency to spread work around a system: as many software tasks can run simultaneously as there are processors in the system. This flexibility can be used to improve absolute performance, cost or power/performance. Clearly, once one has built the fastest single processor possible in a given technology, the only way to get more computing power is to use more than one of these processors. More subtly, if a load that would saturate a 1GHz processor could be spread evenly across four processors, those processors could be run at roughly 250MHz each. If each 250MHz processor is less than a quarter the size of the 1GHz processor, or consumes less than a quarter the power (either of which may be the case, because the cost of higher operating frequencies is nonlinear), the multi-core system might be more economical.
Many designers of embedded SoCs are already exploiting concurrency with multiple cores. Unlike general-purpose workstations and servers, whose workloads are variable and unknowable to the system designer, an embedded device's fixed set of functions can often be analyzed and decomposed into specialized tasks. Those tasks can then be assigned across multiple processors, each of which has a specific responsibility and each of which can be specified and configured optimally for that job.
Multithreaded processors also exploit the concurrency of multiple tasks, but in a different way and for a different reason. Rather than a system-level technique for spreading CPU load, multithreading is a processor-level optimization that improves area and energy efficiency. Multithreaded architecture is driven largely by the realization that single-threaded, high-performance processors spend a surprising amount of time doing nothing. When the results of a memory access are required for a program to advance, and that access must go to RAM whose cycle time is tens of times longer than the processor's, a single-threaded processor can do nothing but stall until the data returns.
Multithreading can be described thus: if latencies prevent a single task from keeping a processor pipeline busy, a single pipeline should be able to complete more than one concurrent task in less time than it would take to run those tasks serially. This means running more than one task's instruction stream, or thread, at a time, which in turn means the processor must have more than one program counter and more than one set of programmable registers. Replicating those resources is far less costly than replicating an entire processor. In the MIPS32 34K processor, which implements the MIPS MT multithreading architecture, a 14% increase in area can buy a 60% increase in throughput relative to a comparable single-threaded core (as measured by running the EEMBC PKFLOW and OSPF benchmarks sequentially on a MIPS32 24KE core versus concurrently on a dual-threaded MIPS32 34K core).
In theory, multi-processor architectures are infinitely scalable. No matter how many processors are used, it is always easy to imagine adding another, although only a limited class of problems can make practical use of thousands of CPUs. Each additional processor core on an SoC adds to the area of the chip at least as much as it adds to the performance.
Multithreading a single processor can only improve performance up to the level where the execution units are saturated. However, up to that limit, it can provide a "superlinear" payback for the investment in die size.
Although the means and the motives are different, multi-core systems and multithreaded cores have the common requirement that concurrency in the workload be expressed explicitly by software. If the system has already been coded in terms of multiple tasks running on a multitasking OS, there may be no more work to be done. Monolithic, single-threaded applications need to be reworked and decomposed either into sub-programs or explicit software threads.
This work must be done for both multithreaded and multi-core systems, and once it is completed, either system can exploit the exposed concurrency. That shared requirement is one more reason the two techniques are often confused, but it is also what makes them so highly complementary.