Adapteva's $100 Parallella Supercomputer Platform Now Shipping
7/23/2013


The credit-card-sized Parallella supercomputer is based on the combination of a Zynq All Programmable SoC from Xilinx and an Epiphany multi-core processor from Adapteva.

adapteva
User Rank
Manager
Thanks!
adapteva   7/23/2013 2:58:14 PM
NO RATINGS
Hi Max,

Thanks for writing about us again! It's been a looong journey since 2010. Patience has never been a strength of mine either, not sure if it should be?:-) Although this milestone was important, the next one is the one that really matters: shipping 6,300 final product boards. As you know when it comes to HW, talk is cheap. What matters is shipping a great product that works reliably. We are getting close and are working as hard as we possibly can to make it happen as soon as possible.

What are you going to do with your board?

Best wishes,

Andreas

 

Caleb Kraft
User Rank
Blogger
Re: Thanks!
Caleb Kraft   7/23/2013 3:04:52 PM
NO RATINGS
I'm curious if the bitcoin miners will put this to work. It seems like the parallel nature of it would be suitable, but I'm no expert.

adapteva
User Rank
Manager
Re: Thanks!
adapteva   7/23/2013 3:14:29 PM
NO RATINGS
The bitcoin algorithm doesn't map well onto the Epiphany architecture and there are already a few really fast bitcoin mining ASICs out there. However, Parallella should do well on the next coin mining craze called "litecoin".  @solardiz at Openwall (the good guys behind the John the Ripper password cracker) is looking into it.

 

Sanjib.A
User Rank
CEO
Re: Thanks!
Sanjib.A   7/23/2013 11:40:29 PM
NO RATINGS
Congratulations Andreas and team!...A supercomputer running flat out and consuming just 5 watts!! That is amazing!! Will it be available for sale to individuals outside the US (I am from India) at that attractive price? :) I would be very much interested then.

Since it is a low-power, low-cost supercomputer (but little in size :)), I would personally try to explore applications starting with home automation, probably for several homes in the neighborhood (why not start a service? :)). Perhaps it could let users monitor and control things at home using their smartphones? That's my little thought as of now...

But would that be underestimating its capabilities? Maybe it can do more, like running complex algorithms where parallel computing is an advantage, e.g. complex flow algorithms in the O&G industry?

btw, what OS can it run?

 

mcgrathdylan
User Rank
Blogger
Re: Thanks!
mcgrathdylan   7/23/2013 9:12:27 PM
NO RATINGS
Andreas- congratulations on this milestone and here's to continued success. Looking forward to learning how long it takes to ship the 6,300. From the enthusiasm this has generated, I'd venture to guess not that long. Please keep us posted.

adapteva
User Rank
Manager
Re: Thanks!
adapteva   7/24/2013 8:18:49 AM
NO RATINGS
Dylan- Thanks! Shipping 6,300 boards is the one that really matters to us. The expectations of 5,000 KS backers have been a big weight to carry for 9 months now. Fortunately for us they have been incredibly patient and understanding. We'll definitely keep you posted. If you don't hear from us something is wrong:-)

rick merritt
User Rank
Author
Re: Thanks!
rick merritt   7/24/2013 9:06:12 AM
NO RATINGS
Congrats, Andreas. It will be interesting to see what people do with these boards--perhaps we will get a mini-Top 500.

adapteva
User Rank
Manager
Re: Thanks!
adapteva   7/24/2013 9:22:25 AM
NO RATINGS
Rick- Thanks! Once we put together a cluster of 1,000 of these Parallella boards we will definitely be getting up into real "supercomputer" territory, although the definition is a moving target.

Max The Magnificent
User Rank
Blogger
Re: Thanks!
Max The Magnificent   7/24/2013 1:32:19 PM
NO RATINGS
@Adapteva: Once we put together a cluster of 1,000 of these Parallella boards...

Once you do, I want to see pictures!!!

adapteva
User Rank
Manager
Re: Thanks!
adapteva   7/24/2013 1:43:27 PM
NO RATINGS
Definitely! We might even post a construction video.

wilber_xbox
User Rank
Manager
Re: Thanks!
wilber_xbox   7/24/2013 12:43:15 AM
NO RATINGS
Congrats Andreas on the product! People like you are true inspirations, as you believe in something and go for it. Kudos..

DrFPGA
User Rank
Blogger
How About A Chess Playing Super Computer?
DrFPGA   7/25/2013 12:35:27 AM
NO RATINGS
I'd like to see stacks of these implement a distributed chess playing algorithm to create the best chess playing computer in the world. It would crush every existing chess computer out there. After we do that we would move on to predicting the weather...

Anyone interested in helping to make it happen?

Peter Clarke
User Rank
Blogger
Re: How About A Chess Playing Super Computer?
Peter Clarke   7/25/2013 6:52:44 AM
NO RATINGS
Isn't it as much about the solution algorithm as the computational resources?

My understanding is that successful chess playing machines use a mix of deep move calculation and mapping more abstract ideas to successful chess playing.

 

 

DrFPGA
User Rank
Blogger
Re: How About A Chess Playing Super Computer?
DrFPGA   7/25/2013 12:13:44 PM
NO RATINGS
Peter-

Chess algorithms are usually tuned to the targeted hardware and use various 'tricks' (like using 64-bit board representations that can be managed and operated on easily by 64-bit processors). Having ranks of processors could allow new algorithms to emerge that might identify ways to use very massive processor banks for a variety of problems: physical systems that are currently difficult to model (EM fields, turbulent fluid flow, large molecule interactions), encryption/decryption, etc.
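
For readers who haven't seen the 64-bit board trick mentioned above, here is a minimal sketch of the idea. It assumes one bit per square with a1 as bit 0 and h8 as bit 63; that numbering and all the helper names are choices made for this illustration, not taken from any particular chess engine.

```python
# Illustrative sketch of a 64-bit "bitboard": one bit per square, a1 = bit 0, h8 = bit 63.
# The square numbering and helper names are assumptions made for this example.
MASK64 = 0xFFFFFFFFFFFFFFFF
NOT_FILE_A = 0xFEFEFEFEFEFEFEFE  # every square except the a-file
NOT_FILE_H = 0x7F7F7F7F7F7F7F7F  # every square except the h-file


def square(file_idx, rank_idx):
    """Return a bitboard with only the given square set (both indices 0..7)."""
    return 1 << (rank_idx * 8 + file_idx)


def white_pawn_attacks(pawns):
    """Squares attacked by all white pawns at once, using two shifts and masks."""
    left = (pawns & NOT_FILE_A) << 7    # capture toward the a-file
    right = (pawns & NOT_FILE_H) << 9   # capture toward the h-file
    return (left | right) & MASK64


def squares(bitboard):
    """Decode a bitboard into algebraic square names, lowest bit first."""
    return [
        "abcdefgh"[i % 8] + str(i // 8 + 1)
        for i in range(64)
        if (bitboard >> i) & 1
    ]


pawns = square(0, 1) | square(4, 3)        # white pawns on a2 and e4
print(squares(white_pawn_attacks(pawns)))  # ['b3', 'd5', 'f5']
```

The point is that the attacks of every pawn on the board come out of a couple of shifts and ANDs on a single 64-bit word, which is why this representation suits 64-bit processors so well.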

rich.pell
User Rank
Blogger
Re: How About A Chess Playing Super Computer?
rich.pell   7/25/2013 7:28:44 AM
NO RATINGS
Are there any practical advantages to developing a more advanced chess playing algorithm/computer? 

DrFPGA
User Rank
Blogger
Re: How About A Chess Playing Super Computer?
DrFPGA   7/25/2013 12:07:51 PM
NO RATINGS
One practical advantage is a better understanding of how to break algorithms into pieces that can be efficiently executed on a large number of processors. Chess computers use algorithms that are common to other difficult problems (economics, scheduling, FPGA routing, etc.), so this could help identify some new algorithmic approaches to solving other problems.

Tom Murphy
User Rank
Blogger
Cheap Firepower
Tom Murphy   7/23/2013 3:54:42 PM
NO RATINGS
Max: I'm impressed and amazed by this! Fantastic stuff, and well told.

So, you're a VC now....(glad you disclosed that, by the way)

I have so many silly questions:

-- Is there a catch?

-- could this "supercomputer" empower people to do evil things in new ways? (I guess any computer can. Still, the word supercomputer conjures up images of people making weapons of mass destruction.)

--and you asked the best one: what would people use this for?

betajet
User Rank
CEO
What would people use this for?
betajet   7/23/2013 4:44:51 PM
NO RATINGS
1.  Cheapest Zynq platform available -- Dual-core ARM Cortex-A9 plus FPGA fabric.  Even if you haven't had your Epiphany yet, Parallella lets you play with Zynq for US$99 instead of a US$395 ZedBoard.

2.  Open-source hardware GPU.  The open source software community has been locked out of the massive parallelism allowed by GPUs because most of their architectures are closed and you can't write your own code for them.  This is one of the most frequent complaints about Raspberry Pi.  With Parallella you'll be able to run GPU functions on Epiphany and use the FPGA to display the results (at least I think the logic paths are there to do this).

3.  Parallel programming research has been held back for decades because there are so few parallel computers available to play with. Parallella is a game-changer here, and will allow parallel languages and compilation techniques to thrive.  Look what happened when micro-computers let experimenters have their own computers for hundreds instead of tens of thousands of dollars.

daleste
User Rank
CEO
Re: What would people use this for?
daleste   7/23/2013 10:35:08 PM
NO RATINGS
This looks like a really great little computer.  I too would like to know what uses will be found for it.  I always wanted my own super computer, so I just might jump on the band wagon.

rich.pell
User Rank
Blogger
Re: Cheap Firepower
rich.pell   7/25/2013 7:19:48 AM
NO RATINGS
"What would people use this for?"

While it's certainly interesting to speculate on the potential applications, I'm reminded of similar questions about the neverending increase in disk storage capacity.  It seems like a case of "build it and they [applications] will come."

Tom Murphy
User Rank
Blogger
Re: Cheap Firepower
Tom Murphy   7/25/2013 3:34:08 PM
NO RATINGS
I suppose you're right, Rich. And at this price, what's not to like?  I wonder if there are plans to consumerize this...   Seems like it would be a catchy sales pitch: Why settle for an ordinary computer when you could have a supercomputer for less?

MajorTom
User Rank
Rookie
Re: Cheap Firepower
MajorTom   7/27/2013 3:09:02 PM
NO RATINGS
"What would people use this for?"


Hmmm, how about breaking 256-bit AES?


The NSA is planning on using supercomputers, but these would require less power.


selinz
User Rank
CEO
Re: Cheap Firepower
selinz   7/27/2013 5:58:28 PM
NO RATINGS
Here's an idea. Write an x86 interpreter and run Windows!

luting
User Rank
CEO
Qualcomm Snapdragon 800 GFLOPS
luting   7/24/2013 12:07:27 AM
NO RATINGS
Does anyone know the GFLOPS figure for the Snapdragon 800?

eewiz
User Rank
CEO
Awesome
eewiz   7/24/2013 2:12:01 AM
NO RATINGS
Awesome stuff. Congrats Andreas. 

Andreas's LinkedIn page says Adapteva "Reached profitability with less than $2M investment". But products are still on preorder? Am I missing something?

eewiz
User Rank
CEO
Re: Awesome
eewiz   7/24/2013 2:15:36 AM
NO RATINGS
oh ok.. i see that Adapteva cores are already shipping in other products

http://www.adapteva.com/products/system-products/

adapteva
User Rank
Manager
Re: Awesome
adapteva   7/24/2013 7:30:56 AM
NO RATINGS
Thanks! We did reach profitability briefly but were not able to sustain the momentum. As you know, profitability is not a stable state:-)

lorincz_#1
User Rank
Rookie
Backend
lorincz_#1   7/24/2013 11:24:53 AM
NO RATINGS
Does anyone know why they didn't include some type of high speed message passing bus?

The supercomputing capability of such a scalable setup seems limited to the latency/throughput of the ethernet bus. 

adapteva
User Rank
Manager
Re: Backend
adapteva   7/24/2013 11:30:57 AM
@lorincz_#1

The Parallella actually has a ~10Gb/s memory-mapped low-latency link (through the "PEC" connector) that can be used to construct some interesting large-scale topologies. See the specs here: http://www.parallella.org/board

 

mlloyd
User Rank
Rookie
eLink
mlloyd   7/24/2013 12:30:07 PM
NO RATINGS
Congratulations, Andreas, on shipping your product and nearing your large milestones.  I am curious about your eLink interface between the Zynq and Epiphany.  Since you designed the Epiphany from scratch, is it able to make use of the high-speed transceivers offered by the Zynq?  We commonly interface processors to FPGAs in my business, and it is always a challenge to find processors that have high-speed buses that we can use to communicate with our FPGAs.

adapteva
User Rank
Manager
Re: eLink
adapteva   7/24/2013 1:41:31 PM
mlloyd - A few years back we decided to stay away from the high speed SERDES and use a source synchronous LVDS interface instead. This way we could attach to low cost as well as high end FPGAs.  The interface does use a fair amount of pins (8 data lanes) but can provide up to 16Gb/s total bandwidth with a 500MHz clock. This turned out to be a good choice because the low cost Zynq 7010 and 7020 don't currently support high speed SERDES.
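
One way the quoted figures could add up is sketched below. The 8 data lanes and the 500MHz clock are from the comment above; the double-data-rate transfers and the two independent directions are assumptions, since the comment doesn't spell out how the 16Gb/s total is reached.

```python
# Back-of-the-envelope check of the quoted eLink figures.
# DATA_LANES and CLOCK_HZ come from the comment above; the double-data-rate
# transfer and the two independent directions are assumptions.
DATA_LANES = 8
CLOCK_HZ = 500e6
TRANSFERS_PER_CLOCK = 2   # assumed: data on both clock edges (DDR)
DIRECTIONS = 2            # assumed: separate transmit and receive links


def link_bandwidth_gbps(lanes, clock_hz, transfers_per_clock, directions):
    """Aggregate raw bandwidth in Gb/s, ignoring protocol overhead."""
    return lanes * clock_hz * transfers_per_clock * directions / 1e9


per_direction = link_bandwidth_gbps(DATA_LANES, CLOCK_HZ, TRANSFERS_PER_CLOCK, 1)
total = link_bandwidth_gbps(DATA_LANES, CLOCK_HZ, TRANSFERS_PER_CLOCK, DIRECTIONS)
print(per_direction, total)  # 8.0 16.0
```

Under those assumptions each direction carries 8Gb/s raw, which matches the 16Gb/s total once both directions are counted.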

mlloyd
User Rank
Rookie
Re: eLink
mlloyd   7/24/2013 2:28:17 PM
NO RATINGS
Source synchronous LVDS -- that would be great for a robust, versatile interface.  8 data lanes is not too many pins compared to the much lower bandwidth parallel interfaces we have used to communicate with processors currently available on the market.  Thanks for the information!  I hope to have the opportunity to use the Parallella.

zhgreader
User Rank
Rookie
Re: eLink
zhgreader   7/24/2013 10:35:31 PM
NO RATINGS
this may change the current desktop concept. farmwork.

p_g
User Rank
Rookie
amazing...
p_g   7/25/2013 4:51:30 AM
NO RATINGS
50 GFLOPS/watt?? Waiting to see how this core makes it into mobile devices, giving them amazing compute power for gaming....

KB3001
User Rank
CEO
Peak performance and real cost
KB3001   7/31/2013 11:52:41 AM
NO RATINGS
Hi Guys.

I wonder how the $100 cost figure was obtained? In a real-world setup, development time, maintenance, and support costs would all be added to get the market cost. Is that the case with the $100 figure?

 

As for performance, the figure given is peak performance; a real benchmark performance could/would be a tiny/small fraction of that. Have any benchmarks been conducted on this platform?

 

Finally, what is the killer app that would make it likely to be continuously developed for state-of-the-art fabrication nodes?

 

Appreciate your thoughts on the above!

adapteva
User Rank
Manager
Re: Peak performance and real cost
adapteva   7/31/2013 12:08:09 PM
NO RATINGS
@KB3001 The entry per board price is $99. We don't disclose our actual costs.

Here are a couple of the benchmarks that we have run:

http://www.adapteva.com/white-papers/benchmarking-the-raspberry-pi-vs-the-parallella/

http://www.adapteva.com/white-papers/more-evidence-that-the-epiphany-multicore-processor-is-a-proper-cpu/

In my opinion the "killer app" for the Parallella platform is ..."computing".

KB3001
User Rank
CEO
Re: Peak performance and real cost
KB3001   7/31/2013 2:18:08 PM
NO RATINGS
Thanks @adapteva. I will look at these in detail later, but have you calculated the performance per dollar and performance per watt for such benchmarks and compared with competing platforms?

jb0070
User Rank
Rookie
Re: Peak performance and real cost
jb0070   9/5/2013 6:54:58 PM
NO RATINGS
Typical microprocessors are built to allow high levels of branching in the instruction set, including "multi-processing-in-time". Parallel processing arrays are best used as data-processors in a data-flow arrangement - processing data as it arrives, in real time.

Peak performance issues come from 1): branching the processing code; 2): I/O bottlenecks. For a "data-flow" machine, the code does (or should) NOT branch: each processor performs the same calculations "ad infinitum". (The results of any given processor might be ignored, i.e. scaled to zero, but the calculations are constantly done.) If the parallel-processing-machine is I/O limited, then the processors will starve, just like any other processor (i.e. bad design).
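
A minimal sketch of the "compute always, scale unwanted results to zero" idea from the paragraph above. The threshold filter and all names are hypothetical, picked only for illustration, and in Python the interpreter still branches internally, so this shows the shape of the technique rather than its performance.

```python
# Sketch of branch-free "data-flow" style processing: every sample goes through
# the same arithmetic, and unwanted results are scaled to zero rather than skipped.
# The threshold filter and all names here are hypothetical, chosen for illustration.

def branchy(samples, threshold):
    """Conventional version: control flow depends on the data."""
    out = []
    for x in samples:
        if x > threshold:
            out.append(x * x)
        else:
            out.append(0)
    return out


def branch_free(samples, threshold):
    """Data-flow version: compute always, then multiply by a 0/1 mask."""
    out = []
    for x in samples:
        mask = int(x > threshold)   # comparison result used as a 0/1 multiplier
        out.append(mask * (x * x))  # the square is always computed
    return out


data = [3, -1, 7, 0, 5]
assert branchy(data, 2) == branch_free(data, 2)
print(branch_free(data, 2))  # [9, 0, 49, 0, 25]
```

On an array of identical cores the second form means every core executes the same instruction sequence on every sample, which is what keeps them in step.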

In other words, the peak performance mentioned would NOT be a "tiny/small fraction of that", or the system either is 1): overkill; 2): not implemented well; or 3): not suited for a parallel process. For data-rate processing systems, a well engineered parallel array would be humming along at full speed, at or near the peak processing rate possible. [One would likely NOT run a Ferrari on a school bus route, or mail route (with lots of stops and turns); that Ferrari would be best suited to screaming along the Autobahn, pedal "to the metal".]

"Killer app"?: Consider vision/speech/radar/sonar/neural ... etc ... systems where large amounts of data are constantly arriving, being processed, and sent along the processing pathways.

=== OR ===

How about a real-time gaming system tied to a live (American) football broadcast/Madden game (coaches' view), linked to your Kinect, where you get the QB's (or running back's) view, and have to "make the play"? And at NFL or college level, real-time speeds. [Use the Kinect for player movement, and even for judging the throw itself.] Or, if not for the masses, how about making that system for the players? Baseball from the batter's view might work well, too! There is lots of tape of great pitchers (Sandy Koufax, Randy Johnson, etc.) ... could you hit their stuff?

Great product! Just need to "find" the money to buy one, if not the next 6000 (or so!).

KB3001
User Rank
CEO
Re: Peak performance and real cost
KB3001   9/6/2013 4:41:14 AM
NO RATINGS
@jb0070, if only things were that simple....
 


jb0070
User Rank
Rookie
Re: Peak performance and real cost
jb0070   9/6/2013 5:29:09 AM
NO RATINGS
"if only things were that simple" ... what things?

A data-flow system is that simple.

If you want to run large programs that are not data-flow, then you have other issues, and an array of coupled processors is not going to be an appropriate solution for that problem set. For the "typical" complex non-data-flow program scheme, one would need to do the typical threaded software, which is NOT simple. On that I think we all agree. Code that is highly branched, and/or that requires intra-process communication, has synchronization issues, and that leads to processor stalls ... unless one manages to scale the various threads and execution paths to all cycle together. Not a task for the faint-hearted. Code that does a lot of task-switching is not good for array'd processors, either.

But physics and engineering matrix decomposition (and etc) type programs are good array'ed processor problems. Not every tool is appropriate to every problem. Array'd processor sets are good at "high" data-rate, repetitive processing.

For task-switching code, a processor would be better served by larger register files and caches. For huge, unwieldy, non-threaded code to parallelize ... well, there is little hope for that, save maybe for massive re-writing? In the engineering world that would probably include both Verilog/VHDL themselves, as well as the code that they generate, for instance.

The fun of engineering, is to know which tool to use for what problems ... and/or making a new tool to do something that otherwise was not tractable. This board is decently fast, cheap, and capable for data-processing problems. So, for that application set, things are pretty simple, given this new solution.

For what it is worth, I designed a similar arrayed processor chip family, and studied (in some detail) which applications it provided a good solution for, and where it was not going to be useful. The biggest hurdle is getting the software cycle-synchronized across all of the processors, and that issue was addressed through a software simulator. This required an assembly-code level of attention to detail, but once coded, each "program element" could be stitched in from higher levels of simulation, such as the Ptolemy program out of Berkeley. Of the various types of data-flow programs, FFTs presented the worst issues. It was NOT something anyone would ever want to use to run (say) Linux. Nor would it be worthwhile for small SPICE simulations. But work through a large SPICE sim, or another large data-set data-flow program, and it was faster and lower power than uPs.

 

KB3001
User Rank
CEO
Re: Peak performance and real cost
KB3001   9/6/2013 5:39:51 AM
NO RATINGS
@jb0070, I asked about the killer app and you enumerated a list: radar, image, audio etc., which many other technologies are going for. To make it commercially in the long term, you need to sort out issues to do with programmer productivity, maintainability, and cost, and I can't see why this particular technology will succeed where others have flopped.
