Break Points

Multicore madness

Jack Ganssle

2/13/2012 11:30 AM EST

Here’s one speed limit that’s heavily enforced.

Route 140 here in Finksburg is heavily patrolled – break the speed limit and you’ll likely face a fine.

But some speed limits can’t be exceeded, no matter how much one wishes to. The speed of light comes to mind. As does the speed at which a teenager’s brain matures.

Then there are multiprocessor limits. Amdahl’s Law tells us that the max speedup achievable is:

$$\text{speedup} = \frac{1}{f + \frac{1 - f}{n}}$$

where f is the fraction of a problem that cannot be parallelized, and n is the number of processors. In a system where, say, only 50% of the problem can be executed in parallel, even with an infinite number of CPUs you can only halve the execution time by adding processors.
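
To put numbers on that limit, here is a minimal Python sketch. It assumes the usual statement of Amdahl's Law, speedup = 1 / (f + (1 - f) / n), with f the fraction that cannot be parallelized and n the number of processors. The function name `amdahl_speedup` and the core counts in the loop are illustrative choices, not part of any library.

```python
def amdahl_speedup(f: float, n: float) -> float:
    """Upper bound on speedup for serial fraction f (0..1) on n processors."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    # The serial part f is untouched; only the (1 - f) part is split n ways.
    return 1.0 / (f + (1.0 - f) / n)


if __name__ == "__main__":
    # Half of the work is serial (f = 0.5), as in the 50% example above.
    for n in (1, 2, 4, 8, 64, 1_000_000):
        print(f"{n:>9} cores: speedup {amdahl_speedup(0.5, n):.3f}")
```

With f = 0.5, 64 cores give a speedup of about 1.97, and no core count reaches 2.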

Gustafson's Law suggests that Amdahl is too conservative, and notes that sometimes problems scale faster in the parallel portion than in the sequential portion. Google’s PageRank algorithm is one example. I suspect that in most embedded systems, though, Gustafson won’t apply.
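
For comparison, Gustafson's Law is usually written as scaled speedup = n - f(n - 1), where f is the serial fraction of the run time measured on the n-processor system and the parallel workload is assumed to grow with n. A rough sketch under that assumption follows; the name `gustafson_speedup` and the sample values are made up for illustration.

```python
def gustafson_speedup(f: float, n: float) -> float:
    """Scaled speedup for serial fraction f (0..1) on n processors."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    # The serial share stays fixed while the parallel work grows with n.
    return n - f * (n - 1)


if __name__ == "__main__":
    # Same nominal serial fraction as the Amdahl example, f = 0.5.
    for n in (1, 2, 4, 8, 64):
        print(f"{n:>3} cores: scaled speedup {gustafson_speedup(0.5, n):.2f}")
```

With f = 0.5 and 64 cores the scaled speedup is 32.5. The two laws measure f differently (fixed problem size versus a problem that grows with the machine), so a side-by-side comparison is only indicative.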

However, I believe Amdahl and Gustafson are optimistic in many cases, especially when working with symmetric multicore processors. These have two or more identical cores, each with their own L1 cache. They share L2 and a common memory bus. Executing out of L1 they will scream. But that cache is tiny – often only 32KB. Go to L2 – or worse, main memory – and the brake lights come on. Up to dozens of wait states slow processing, and bus contention will occur if more than one CPU needs memory at the same time. This effect is pretty hard to model since it will be both non-deterministic and very problem-specific.

But Sandia National Labs researchers have come up with some interesting data showing that even on traditional parallel problems multicore’s advantages diminish very quickly. Going from two to four cores nets some serious execution-time reduction. Double down, to 8 cores, and there’s no gain. Each additional doubling slows the system down – by a lot. A 64 core solution slows the system by half an order of magnitude over one with just four.

Multicore, as pushed by the major semi vendors, can in some cases offer significant advantages, both in speed and power. But I think the benefits are being oversold. Memory bandwidth is a hugely limiting factor. Alternatives such as asymmetric multiprocessing are often a better solution, depending, of course, on the nature of the problem being addressed.

A new processor technology from Venray Technology is an interesting twist on the memory bandwidth problem. Instead of adding DRAM to a CPU, they add CPUs to DRAM. Small (20k transistors) processors are tightly integrated with memory.

A typical arrangement marries 4 of these cores with 64 MB of DRAM. That puts the CPU transistor count at 0.01% of the memory. Venray’s web site is long on marketing-speak and short on tech details, but the idea is compelling. (Editor’s Note: For more on the Venray design, go to “Startup proposes processor on DRAM process.”)
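
The transistor ratio can be sanity-checked with a few lines of Python. This is a back-of-the-envelope sketch that assumes one transistor per DRAM bit (a conventional one-transistor cell) and reads 64 MB as 64 × 1024 × 1024 bytes. Neither assumption comes from Venray, and the function name is hypothetical.

```python
def cpu_share_percent(cores: int, transistors_per_core: int,
                      dram_bytes: int, transistors_per_bit: int = 1) -> float:
    """CPU transistor count as a percentage of DRAM cell transistors."""
    cpu_transistors = cores * transistors_per_core
    # Assumption: each stored bit costs `transistors_per_bit` transistors.
    dram_transistors = dram_bytes * 8 * transistors_per_bit
    return 100.0 * cpu_transistors / dram_transistors


if __name__ == "__main__":
    # 4 cores of about 20k transistors each next to 64 MB of DRAM.
    share = cpu_share_percent(4, 20_000, 64 * 1024 * 1024)
    print(f"CPU share of transistors: {share:.4f}%")
```

Under these assumptions the share works out to roughly 0.015%, the same order of magnitude as the 0.01% quoted above.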

Jack G. Ganssle is a lecturer and consultant on embedded development issues. He conducts seminars on embedded systems and helps companies with their embedded challenges. Contact him at jack@ganssle.com. His website is www.ganssle.com.





Amr.Ayoub

2/13/2012 1:01 PM EST

Regardless of these laws, I think every situation has an optimum number of cores. The problem is the time and effort required to find it.




cdhmanning

2/13/2012 5:33 PM EST

I think it can be misleading using Amdahl's Law in typical embedded systems. Amdahl's Law was really derived for Big Iron systems with Big Iron problems.

Multi-cores can be effective where you can break the problem up into independent (or close) execution units.

The Parallax Propeller, for instance, has 8 CPUs that execute independently and at full speed so long as they keep within their local memory. They are terrible at sharing memory though. That is pretty good for many situations like software peripherals, but is pretty useless for breaking down a single problem into sub-problems.




medv4380

2/13/2012 7:07 PM EST

They went to 8 cores and got nothing, but they went from 2 to 4 and got a bump in speed?

Sounds like they used an AMD Bulldozer mixed with bad thread scheduling from an OS that wasn't optimized for it. Windows 7 still needs an update to use it properly, as does Linux. Only Windows 8, which isn't out yet, is supposed to support it properly.

There is certainly a cache issue with multicores because of false sharing. Java doesn't offer any viable method for dealing with it, and neither do most interpreted languages. C and C++ have ways of fixing it that work more reliably, but you have to be aware of the problem in order to fix it.




cdhmanning

2/13/2012 9:02 PM EST

I think perhaps you misunderstand. From what I read, it looks like the step up from 4 to 8 gives no appreciable value.

The reduced payback benefits you get from running problems in parallel come mainly from the relationship between the problem itself and the hardware. A poorly written OS scheduler can make things worse, but cannot magically make the fundamental problems go away.

I don't for a second believe that Windows 8 will provide a solution where Linux does not.

Linux works fine on many multi-core systems and most of the world's supercomputers run Linux.

The 12 core Mac Pros work well with OSX and Linux. I don't know about Windows.

I would not be in a hurry to drink the Windows 8 Kool Aid until Windows 8 has actually shipped and proven itself.




medv4380

2/14/2012 10:49 AM EST

It's not the Windows Kool Aid. Just the AMD press release explaining why Linux and Windows 7 wouldn't utilize all 8 cores.




kalpak

2/14/2012 2:40 AM EST

Though easier to design and manufacture, symmetric multicore is useless beyond a certain limit for most applications.

What will be more useful is Asym cores; smaller cores each specialized to a specific task, e.g. SPI, I2C, MMC, graphics.

Evolutionary biology must be made compulsory for multicore architecture designers.




Steve_B

2/27/2012 2:14 PM EST

Think about this: every time you run "prof" (profile) on a UNIX program, locate the portion(s) of the code that consume the most CPU time, and improve the code to reduce the high points, you change the balance of where in the program time is spent and therefore the algorithm(s) that would most benefit from a specialized processor (or "stunt box" as we used to call them). Asymmetric cores will be most efficient in use of resources iff your application is known and stable. For unknown problems, it's very difficult to figure out what "stunt boxes" to build.

As a prof once said to us: "A general-purpose computer is one that does as few things as possible incredibly badly". Asymmetric processors can find most of their resources wasted, if the problem is not a good match. So use where appropriate, but understand that you aren't building a "general purpose computer" when you do.




v_a_s

2/14/2012 6:16 AM EST

Perfect title, Mr. Jack.

In the microcontroller world, I believe, asymmetrical multicore is the way to go.

With the advancement in process nodes (90nm, 65nm, and smaller...) and ultra cheap (less than a dollar) high performance microcontrollers, I see a huge opportunity for multicore asymmetrical controllers in the short term.

The cores need not be the same. This results in a huge variety of SoC-like microcontroller chips.

I know a few that are worth mentioning:

1) LPC4000 from NXP (Cortex-M4 and Cortex-M0)
2) Concerto™ Series from TI (C2000 and Cortex-M3)




rhfish

2/14/2012 2:40 PM EST

Amdahl's Law limits the acceleration of linear programs converted to parallel. We chose to attack the far more difficult problem of analyzing massively parallel Big Data. This is where INTEL multi-core dies.

Jack correctly points out that cache management and the Memory Wall is what kills multi-core. That is what Sandia found, and that is why we used Sandia's MapReduce-mpi for our benchmarks. You can run your own benchmarks on http://www.venraytechnology.com TEST DRIVE.

Small quibble: TOMI Borealis cores are 22k transistors, not 20k. Eight of those fit in a 1G DRAM.

Russell Fish, CTO Venray




kalpak

2/14/2012 11:06 PM EST

With the falling cost of multilayer boards, what may also work is many specialized or semi-configurable tiny cores in independent devices, all connected by a high speed serial bus (like PCIe).




prabhakar_deosthali

2/15/2012 12:59 AM EST

In a typical control application we could successfully use a 6 core SoC where each core was dedicated to a specific function: sensor input and signal processing, keypad and display handling, actuator output control (analog and digital) and the main control algorithm.
So an asymmetric multicore solution is always preferable.




Sanjib.Acharya

2/15/2012 11:01 AM EST

Good that you have brought this point out, as I was interested to know whether it is really beneficial to move to a multicore processor. In a typical control application that is not as time critical as, say, turbine control, do you really need so much processing speed? I mean, in a control application where the sensor/actuator responses are slow (e.g. thermocouple, RTD, limit switch etc.), even if the processing speed is faster, won’t the idle time for asymmetric multicores be greater? Please advise. Could you please point me to where I could read more on this topic?




Bert22306

2/15/2012 3:34 PM EST

I've also read that going beyond 8 cores, in a symmetrical multicore CPU, is the point of diminishing returns. But that was supposed to be a memory bus problem.

Getting beyond computer hardware, humans working together operate as a massively parallel CPU, right? If people can do this, then I suspect it's only a matter of time before your desktop PC will make better use of multicore CPUs?




Rich Krajewski

2/16/2012 12:23 PM EST

Excellent, excellent point about people serving as an example of parallel processing. The additional level of organization, of people working in a group or society, may introduce "emergent properties" similar to those that occur when individual cells start to cooperate in multi-cellular organisms. In other words, the whole can be greater than the sum of the parts. There is still a limit, I imagine, to the amount of organization that can occur, and the benefits that can be gained by it. After all, our human brains are only dual, roughly symmetric processors (the two halves of a brain). You can argue that there are actually many more subprocessors involved, however, making up a human brain, and that they act (fairly) well in concert.




cshore

2/16/2012 12:35 PM EST

It does seem that many people agree that four cores in an SMP system is the point of maximum return. As ever, it comes down to how you design and decompose your program to take advantage of more than one processor. A good task decomposition, with efficient and well thought-out IPC, makes a huge difference. The HW guys can keep slapping the cores down but (as usual?!) it's up to us SW guys to make best use of them...




rbarraud

2/16/2012 11:31 PM EST

Looks to me more like regurgitation than agreement.




NeznanovicN

2/16/2012 12:50 PM EST

Multicore became all the rage after CPU clock hit the wall. When it became impossible to increase clock anymore, CPU manufacturers started marketing multi core CPUs as the panacea.

I would say replacing faster clock with multicore is as effective as replacing C with C++. Works well on paper (e.g. resumes, marketing material etc.), but not so useful in real life.




Steve_B

2/27/2012 2:23 PM EST

The issue with extracting performance gain from multiprocessing is that the algorithm you're running must change, unlike simply speeding up a clock. If you don't re-program appropriately, or if your program is inherently not parallelizable, you won't get improvement. Most existing programs when modified to run on a multiprocessor are still the old program at heart (e.g., OS's), and the lack of speedup may be more about poor parallelization than an inherently serial task.

The other problem with this is that besides being a *different* skill, writing for parallel execution is inherently harder than writing for serial execution.




cnxsoft

2/16/2012 9:02 PM EST

Multicore should work well in systems where you need to run multiple processes simultaneously such as web servers, but with systems running mainly one process (at a time), the benefits of multicore are more difficult to achieve and require software optimization.




tomkawal1

2/17/2012 3:50 AM EST

Thanks for raising the issue. It reminds me that with multisensory and multithreaded problems, the only way is to dedicate a processor and memory to each specific task (the way a neuron-based brain works), keeping just a sort of central core for coordination and decision-making. Otherwise, by making a symmetric multicore brain, what do we design: a schizophrenic system?




one and zero

2/17/2012 3:54 AM EST

Does Amdahl’s law really help to answer the question of whether multiple cores achieve the speed-up needed?
I don't think so. Is speed-up the only thing we care about? Maybe not (see also http://e2e.ti.com/blogs_/b/multicoremix/archive/2011/05/09/does-amdahl-s-law-really-help.aspx).
There are applications or algorithms out there that are embarrassingly parallel, but also some that are sequential. Of course that makes a big difference in how you can utilize a multicore architecture.
But for sure there's also a mix between these two scenarios. So you might want to keep only 1 core active for the sequential part and wake up the others when you hit the parallel portion of the application.
The other thing is that the clock speed limit is a fact that we cannot deny. So the only way of finding an increase in performance is to go parallel. But is putting a large number of cores on a single chip the answer?
Maybe not. The chip infrastructure needs to be in place to make that effective. A proper memory architecture, high-throughput interconnect and HW aiding inter-processor communication will greatly help achieve the speed-up and also energy-efficiency goals.
I think TI has a good example; please have a look at http://www.ti.com/multicore




sharps_eng

2/17/2012 2:41 PM EST

For some common sense about multiple processors from some of the most experienced people in the field, try XMOS.
Their processors/controllers are real-time and parallelized in hardware and software to start with, and can communicate efficiently in small or large sets to handle tasks of any complexity.
For serious real-time embedded control projects (not corporate SoCs) I can't yet see the downside... single-sourced maybe, but what isn't these days?

(I have no connection with XMOS, by the way).




cdhmanning

2/19/2012 4:05 PM EST

XMOS looks a lot like the Inmos Transputer. Not surprising given that the architect is ex Inmos.

The biggest issue with these off-beat parts is that they require special coding and tend to be used in a way that exploits the architecture. The architecture becomes a major design dependency of the solution. Your future gets fundamentally linked to the future of the silicon vendor. If they go bust then you need to redesign your whole solution.

On the other hand, there are a whole slew of very similar ARM parts from Atmel, ST, NXP and others. While these parts are not footprint compatible, it is easy enough to migrate from one to another. You can even lay out boards to accept parts from different vendors. Sure, the peripherals need different drivers, but quite often they are pretty close and it is even possible to write one set of firmware that runs on different parts from different vendors.

Should one vendor go bust (unlikely) you can still source parts from another vendor and keep production rolling.




Steve_B

2/27/2012 2:32 PM EST

Prof. David Kuck of the University of Illinois used to say in his classes that there are only three known ways to speed up a computation: (1) make operations take less time (e.g., faster clock), (2) do multiple parts of the calculation in parallel, and (3) use an algorithm that takes less work.

He and his students proved many interesting properties about the limits of speedup via parallelization of various computations back in the 1970's - and it is very clear from the theoretical level that some computations cannot benefit linearly from additional parallel processing capability of *any* kind, no matter how clever you are. Many can improve linearly or nearly so, but it takes considerable expertise to coax the best parallel performance from an algorithm sometimes, and some algorithms can't be sped up beyond a certain point anyway. So we shouldn't be too surprised when adding processors doesn't automatically speed something up.

I suspect Prof. Kuck's cash reward for identifying a 4th method (that isn't a linear combination of those 3) is still outstanding, if you can think of one. Good luck.




Bert22306

2/27/2012 3:59 PM EST

I think that most of the time, the reason multicore CPUs are so much faster than old single core CPUs is that they do completely separate tasks simultaneously. For example, virus scans while you're running other applications, instead of that blasted virus scan taking up precious CPU cycles in the single core CPU.

I ran a really simple example just the other day. I wrote an application that took up virtually 100 percent of CPU cycles to run. I tried it in a single core PC and in a quad core PC. The task manager in the single core PC showed almost a constant 100 percent utilization. The task manager in the quad core showed 25 percent utilization.

So it's simple. That app essentially didn't allow me to run anything else on the single core machine, but it had little discernible effect on the quad core.

The way most people work on PCs these days, with virus shields running, e-mail running, perhaps that weather bug, and then also working on the machine at the same time, it's probably not super clever multi-threaded programs that make the real difference, is my bet.




seaEE

2/29/2012 12:14 AM EST

I just read an interesting article today online in ScienceDaily regarding a simulation recently done on the Earth's magnetic field. Per the article: "Solving the problem required a staggering amount of computer power from one of the world's most advanced supercomputers, at the National Institute for Computational Science at Oak Ridge National Laboratory in Tennessee. The computer, called Kraken, has 112,000 processors working in parallel and consumes as much electricity as a small town."

So there are a few problems where having a few extra processors around really helps!



