I do not agree with the statement "FPGAs are not used within standard computers because they are fairly difficult to program". FPGAs might not be used in personal computers, but they are extensively used in industrial computers and embedded systems. I don't think they are difficult for hardware engineers to program.
I am interested to know how much of the FPGA's resources (logic cells or gates) was consumed by one of these 1,000 cores, and what the total resource consumption of all 1,000 cores was.
As most people above pointed out, this comparison between 1,000 cores on an FPGA and a standalone CPU is apples to oranges. I think the title is misleading; not every hardware core is a microprocessor. The title seems to convey the notion that someone implemented 1,000 ARM-like cores on one FPGA. What bothers me here is that titles like this hype the story way beyond its significance... Kris
P.S. I have nothing against FPGAs; they are very useful and command an increasing market share, displacing many ASICs. But they don't need more hype.
The computationally bound part of MPEG is searching the area around a macroblock for matches with a pixel group. The better the search, the more matches, and the more efficient the compression, which is the point of the exercise. That makes it a good benchmark, because it is so darned awkward to speed up: nothing short of brute force will do it currently.
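To make the brute-force point concrete, here is a minimal sketch of an exhaustive block-matching search in Python. The function names, the sum-of-absolute-differences (SAD) cost, and the block and search-window sizes are all assumptions chosen for illustration; nothing here is taken from the article or from any particular MPEG implementation.

```python
def sad(frame_a, frame_b, ax, ay, bx, by, size):
    """Sum of absolute differences between two size x size pixel blocks."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(frame_a[ay + dy][ax + dx] - frame_b[by + dy][bx + dx])
    return total


def full_search(current, reference, x, y, size=4, radius=2):
    """Try every offset within +/- radius of (x, y) in the reference frame.

    Returns (best_dx, best_dy, best_sad), or None if no candidate block
    fits inside the reference frame. Block size and radius are
    illustrative defaults, not values from any standard.
    """
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            rx, ry = x + dx, y + dy
            # Skip candidate blocks that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(current, reference, x, y, rx, ry, size)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best
```

Every block costs (2 * radius + 1)^2 candidate comparisons of size^2 pixels each, so widening the search window grows the work quadratically. That is why this step is the one that gets thrown at parallel hardware.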
According to their paper from last year, each RC (reconfigurable cell) in the MORA (multimedia oriented reconfigurable architecture) SIMD processing array is an 8-bit PIM (processor in memory) with 256 bytes of block RAM, two input ports, and two output ports. Each RC has a PE (processing element) that performs 8-bit fixed-point arithmetic, as well as logical, shifting, and comparison operations. A controller handles asynchronous handshaking between upstream and downstream RCs.
They created a C++ DSL (domain-specific language) to program the RCs at a high level of abstraction. For example, an 18-line implementation written in their C++ DSL expands to 16,803 lines of VHDL.
In their paper from last year they implemented an 8-bit DCT on the Virtex-4 LX200. It used 22 RCs and 3,368 Virtex logic slices and executed in 200 cycles at 100 MHz. They could squeeze 25 copies of this DCT into the LX200, which by my calculations would yield 12.5 MB/s (11.92 MiB/s). I think they've since switched to the Virtex-4 SX (signal processing model) and possibly optimized for the greater number of XtremeDSP slices rather than implementing every PE with logic cells. That said, the reported throughput of 5 GiB/s makes me wonder which algorithm "central to MPEG decoding" they actually implemented.
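For anyone who wants to check the arithmetic, here is a small Python sketch using only the numbers quoted above (200 cycles, 100 MHz, 25 copies, 22 RCs and 3,368 slices per DCT). The function names and the `bytes_per_dct` parameter are my own assumptions: the comment does not say how many bytes each DCT consumes, so that value is left as an input.

```python
CLOCK_HZ = 100_000_000   # 100 MHz, as quoted in the comment
CYCLES_PER_DCT = 200     # cycles for one DCT, as quoted
COPIES = 25              # DCT instances that fit in the LX200, as quoted
RCS_PER_DCT = 22         # reconfigurable cells per DCT, as quoted
SLICES_PER_DCT = 3368    # Virtex logic slices per DCT, as quoted


def dcts_per_second(clock_hz=CLOCK_HZ, cycles=CYCLES_PER_DCT, copies=COPIES):
    """DCT operations completed per second across all parallel copies."""
    return copies * clock_hz // cycles


def bytes_per_second(bytes_per_dct):
    """Throughput in bytes/s; bytes_per_dct is an assumption, not a quoted figure."""
    return dcts_per_second() * bytes_per_dct


def to_mib(byte_count):
    """Convert a byte count to MiB (2**20 bytes)."""
    return byte_count / 2**20


total_rcs = COPIES * RCS_PER_DCT        # RCs used by all 25 copies
total_slices = COPIES * SLICES_PER_DCT  # slices used by all 25 copies
```

With `bytes_per_dct=1` this reproduces the 12.5 MB/s (11.92 MiB/s) figure, since the design completes 12.5 million DCTs per second. If each DCT actually covers a larger block of samples, the byte throughput scales up proportionally.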
Not so silly... The MPEG algorithm will (or should) obviously run best on dedicated hardware, so comparing it to any sort of general-purpose CPU, whether 1 core or 1,000, is academic. It is right to compare the performance of the 1,000-core CPU to the desktop on this task for comparative analysis of a "CPU-intensive" task.
@Frank. Exactly. I thought the comparison with a desktop PC was silly. The least they should do is get a traditional MPEG decoder ASIC and compare performance with it.
BTW, about the 1,000 cores: are they general-purpose CPU cores like ARM/MIPS? To me it sounds more like a limited CPU with dedicated hardware for only a few functions.
1,000 cores on an FPGA is impressive, but from an application perspective, what they built was a hardware accelerator for MPEG decoding. Rather than compare its performance to software decoding of MPEG on a top-end desktop computer, it would be nice to see how this 1,000-core solution compares with more traditional hardware accelerators for MPEG decoding.