1,000 processors on a Xilinx FPGA
Peter Clarke
1/4/2011 7:55 AM EST
LONDON -- Scientists at the University of Glasgow have created a 1,000-core computer processor based on a Xilinx field programmable gate array.
The researchers created 1,000 mini-circuits within the FPGA chip, with each core working on its own instructions. They then used the chip to run an algorithm that is central to MPEG decoding and were able to process data at five gigabytes per second: around 20 times faster than current top-end desktop computers.
Wim Vanderbauwhede of the University of Glasgow worked with colleagues at the University of Massachusetts Lowell on the project. The key to the speedup was giving each core its own dedicated memory.
"FPGAs are not used within standard computers because they are fairly difficult to program, but their processing power is huge while their energy consumption is very small because they are so much quicker – so they are also a greener option," said Vanderbauwhede, in a statement.
Vanderbauwhede is due to present his research at the International Symposium on Applied Reconfigurable Computing in Belfast, Northern Ireland, in March 2011.
Related links and articles:
www.arc2011.org


Frank Eory
1/4/2011 10:29 AM EST
1,000 cores on an FPGA is impressive, but from an application perspective, what they built was a hardware accelerator for MPEG decoding. Rather than compare its performance to software decoding of MPEG on a top-end desktop computer, it would be nice to see how this 1,000-core solution compares with more traditional hardware accelerators for MPEG decoding.
Etmax
1/5/2011 6:29 PM EST
very valid point
eewiz
1/4/2011 12:27 PM EST
@Frank. Exactly. I thought the comparison with a desktop PC was silly. The least they should do is get a traditional MPEG decoder ASIC and compare performance with it.
BTW, about the 1,000 cores: are they general-purpose CPU cores like ARM/MIPS? To me it sounds more like a limited CPU with dedicated HW for only a few functions.
eryksun
1/4/2011 5:00 PM EST
According to their paper from last year, each RC (reconfigurable cell) in the MORA (multimedia oriented reconfigurable architecture) SIMD processing array is an 8-bit PIM (processor in memory) with 256 bytes of block RAM, two input ports, and two output ports. Each RC has a PE (processing element) that computes 8-bit fixed point arithmetic, as well as logical, shifting, and comparison operations. A controller handles asynchronous handshaking between upstream and downstream RCs.
They created a C++ DSL (domain specific language) to program the RCs at a high level of abstraction. For example, an 18-line implementation written in their C++ DSL expands to 16,803 lines of VHDL.
In their paper from last year they implemented an 8-bit DCT on the Virtex-4 LX200. It used 22 RCs and 3,368 Virtex logic slices and executed in 200 cycles at 100 MHz. They could squeeze 25 copies of this DCT into the LX200, which by my calculations would yield 12.5 MB/s (11.92 MiB/s). I think they've since switched to the Virtex-4 SX (signal processing model) and possibly optimized for the greater number of XtremeDSP slices rather than implementing every PE with logic cells. That said, the reported throughput of 5 GiB/s makes me wonder which algorithm "central to MPEG decoding" they actually implemented.
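The 12.5 MB/s figure above works out if each 200-cycle DCT pass is counted as one byte of data. That per-pass byte count is an assumption, since the comment does not state it. A rough sketch of the arithmetic, with illustrative function and parameter names:

```python
# Back-of-envelope throughput for the DCT figures quoted above.
# Only the clock rate, cycle count, and copy count come from the comment;
# bytes_per_dct is an ASSUMPTION, not a number from the paper.

CLOCK_HZ = 100_000_000   # 100 MHz
CYCLES_PER_DCT = 200     # cycles for one DCT pass
COPIES = 25              # DCT instances that fit on the LX200


def throughput_bytes_per_s(bytes_per_dct, copies=COPIES,
                           clock_hz=CLOCK_HZ, cycles=CYCLES_PER_DCT):
    """Aggregate bytes per second across all parallel DCT copies."""
    dcts_per_second = clock_hz // cycles   # 500,000 passes/s per copy
    return copies * dcts_per_second * bytes_per_dct


def to_mib(bytes_per_s):
    """Convert bytes/s to MiB/s (1 MiB = 2**20 bytes)."""
    return bytes_per_s / 2**20


one_byte = throughput_bytes_per_s(1)     # one byte counted per pass
full_block = throughput_bytes_per_s(64)  # hypothetical: 8x8 block of bytes per pass

print(one_byte / 1e6, "MB/s,", round(to_mib(one_byte), 2), "MiB/s")
print(full_block / 1e6, "MB/s if each pass covered 64 bytes")
```

Under either assumption the result stays well below 5 GB/s, which is consistent with the question raised above about which algorithm was actually benchmarked.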
paul.moody
1/4/2011 4:19 PM EST
Not so silly .... The MPEG algorithm will/should obviously run best on dedicated hardware so comparing to any sort of general purpose CPU whether 1 or 1000 is academic. It is right to compare the performance of the 1000 cpu core to the desktop in this task for comparative analysis of a "CPU intensive" task.
sharps_eng
1/4/2011 5:55 PM EST
The compute-bound part of MPEG is searching the area around a macroblock for matches with a pixel group. The better the search, the more matches, and the more efficient the compression, which is the point of the exercise. That makes it a good benchmark because it is so darned awkward to speed up; nothing short of brute force will do it currently.
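A minimal sketch of the brute-force search described above, assuming the sum of absolute differences (SAD) as the match metric. The comment does not name a metric, and the function names, block size, and search radius here are illustrative only. Frames are plain lists of rows of pixel values.

```python
def sad(cur, ref, cx, cy, rx, ry, n):
    """Sum of absolute differences between the n x n block of `cur`
    at (cx, cy) and the n x n block of `ref` at (rx, ry)."""
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(n)
        for i in range(n)
    )


def full_search(cur, ref, cx, cy, n=4, radius=2):
    """Exhaustively test every offset within `radius` pixels and return
    (cost, dx, dy) for the best match, or None if no candidate fits."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidate blocks that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

Each block costs (2 * radius + 1)^2 candidate positions times n * n pixel differences, and every candidate is independent of the others, which is why this kind of search is usually handed to parallel hardware.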
iniewski
1/5/2011 9:44 AM EST
As most people above pointed out, this comparison between 1,000 cores on an FPGA and a standalone CPU is apples to oranges. I think the title is misleading; not every hardware core is a microprocessor. The title seems to convey the notion that someone implemented 1,000 ARM-like cores on one FPGA. What bothers me here is that titles like this hype the story way beyond its significance... Kris
P.S. I have nothing against FPGAs; they are very useful and command increasing market share, displacing many ASICs. But they don't need more hype.
Sanjib.Acharya
1/10/2011 1:16 PM EST
I do not agree with the statement "FPGAs are not used within standard computers because they are fairly difficult to program". FPGAs might not be used in personal computers, but they are used extensively in industrial computers and embedded systems. I don't think they are difficult for hardware engineers to program.
I am interested to know how much FPGA resource (logic cells or gates) one of these 1,000 cores consumed, and what the total resource consumption of all 1,000 cores was.
Dr DSP
1/11/2011 8:57 PM EST
Seems like it's time to start a definition above LUT for these types of processors. Maybe 4004 equivalents? Remember those?