Question for anyone here. If we're heading for the ever more serious "dark silicon" challenges that munggnum speaks of, is that apt to be good for the leading semi IP developers (because it will lead to more ways for them to differentiate their product) or bad for them (because it will make the way forward grind to a halt as research starts bumping up against theoretical limits)?
I'm thinking, in particular, with regard to Ceva, the leader in DSP cores, though I'd be interested in anyone's thoughts as to the likely effects of the so-called "dark silicon" issue on the semi IP business as a whole.
Thanks in advance.
It's interesting; some folks are having trouble seeing the forest for the trees. UCSD is saying "mobile applications processors should be composed out of hundreds of specialized cores that can suck energy out of relatively small parts of the system but collectively attain 8x energy savings." Then they show how such a system would be constructed. They have reasonable solutions to many of the problems that have been listed above. This is definitely new.
Those who are actually interested in understanding the details can read the actual papers from the authors' website:
I guess dark silicon means a piece of circuitry that is powered off (disconnected from Vdd) through a high-threshold PMOS transistor to reduce the leakage current.
This technique has existed for 10 years now.
Doing this at the operating-system level is very complicated - understandably - since operating systems do not have dedicated functions like FFT, Viterbi decoder, CDMA demodulator, etc.
Operating systems probably need sorting, searching in lookup tables, etc.
The coding style of an OS would have to change a lot when using hardware-based functions, which have to be powered on a few clock cycles before they can be used:
instead of calling a function, you would have to
load data into a buffer, turn on the needed function, wait until it says 'I am ready', and then fetch the result.
These functions, coded into hardware, should be
hardware independent, like a good C library that is portable between different processors and different OSes.
Sounds like a logistical nightmare.
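To make the call sequence above concrete, here is a minimal sketch in C. Everything here is hypothetical: the register names (POWER_CTRL, STATUS_READY) and the doubling "function" are made up, and the hardware is simulated with plain variables so the sketch compiles on its own; a real driver would use volatile pointers into the device's address space.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical memory-mapped accelerator registers. On real hardware these
 * would be volatile pointers into the device's MMIO region; here they are
 * ordinary variables so the sketch is self-contained. */
static uint32_t POWER_CTRL;    /* 1 = sleep transistor released, block on  */
static uint32_t STATUS_READY;  /* set by hardware once the block is usable */
static uint32_t input_buf[8];
static uint32_t result_buf[8];

/* Stand-in for the hardware block waking up and computing (here: doubling). */
static void simulate_hardware(void) {
    if (POWER_CTRL) {
        for (int i = 0; i < 8; i++)
            result_buf[i] = input_buf[i] * 2;
        STATUS_READY = 1;
    }
}

/* The call pattern from the post, instead of a plain function call:
 * load a buffer, power the block on, wait for "I am ready", fetch the result. */
void accel_compute(const uint32_t *in, uint32_t *out, int n) {
    memcpy(input_buf, in, (size_t)n * sizeof *in); /* 1. load data into a buffer */
    STATUS_READY = 0;
    POWER_CTRL = 1;                                /* 2. turn on the function    */
    simulate_hardware();                           /*    (hardware wakes up)     */
    while (!STATUS_READY)                          /* 3. wait until it is ready  */
        ;
    memcpy(out, result_buf, (size_t)n * sizeof *out); /* 4. fetch the result     */
    POWER_CTRL = 0;                                /* power-gate the block again */
}
```

Note how the caller never sees the gating: wrapping the four steps in one function is exactly what would let such accelerators hide behind a portable, C-library-like interface.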
As sols said, this kind of effort has existed before; the real question for me is how generalisable this work is. I mean, if you constrain yourself to a specific application, all sorts of optimisations are possible, but could the results of these optimisations, e.g. application-specific cores, be harnessed for a wider range of applications, or is it a recurring investment we have to make? The key is to strike the right granularity and produce a fully operational system. As the article states, the work of coordinating the MIPS processor's operations with those of the acceleration cores (and the cores' operations with one another) has yet to be done. What would the resulting overheads be, e.g. in terms of area, speed, and power? The devil is indeed in the details.
The innovation that I see is not in the homebrew design tools that carried out high-level synthesis (HLS) but in the idea of taking small pieces of Android OS middleware and accelerating each of them as a small piece of hardware. The small pieces add up, covering the available dark silicon (or some of what would have been dark).
Most C-to-hardware translation has focused on large pieces of C code (e.g., Impulse C) or compiled binaries (e.g., Binachip) to transform into a significant IP block on an FPGA or ASIC.
Indeed "dark silicon" and "utilization wall" are buzzwords. But the real juice here is that hardware accelerators are getting to the level of specific application workloads--and a group of college students can craft the design tools to generate them.
Sols, clearly you have some good insights, but the devil is in the details. Voltage scaling to save power is losing its effectiveness in current and future technologies because of threshold scaling limitations and insufficient Vdd overdrive. The AutoTIE work you mention, while very good, targets a much easier class of code, and does not report results on power reduction.
I guess I have to explain a basic design tradeoff: if performance is improved (or accelerated, if you wish), then the clock frequency and voltage can be dropped, reducing energy/power. So acceleration and conservation are two sides of the same coin.
Regarding the novelty - there has already been a commercial tool on the market for at least several years that automatically generates ISA extensions or accelerators. Many technical papers have been published on this topic, e.g.:
Automatic generation of application specific processors
International Conference on Compilers, Architecture and Synthesis for Embedded Systems, pp. 137-147, 2003