That's my understanding of it too, Kris. The configuration is time-multiplexed as you said above. A whole-chip configuration context is loaded at a very high rate to make this possible. I guess your question above was about "partial" dynamic reconfiguration, which I am not sure they support.
Good question. There is no foundry relationship between Apple and Intel. The Reuters article from last year was about Intel execs saying they would be glad to get the Apple business if it was available to them. It is not, at this point. And I know many in the industry say that despite those comments, Intel actually would not want to go down that road.
To @sharps_eng: I think Tabula's concept is quite different from what you are describing...I think it works like this (someone correct me if I am wrong, I am just guessing): you get a piece of data to chew, you do something with it and send it away...next you get another piece, but this time you are supposed to process the data differently, so your FPGA re-configures itself to provide that new function and here you go...and then the next piece of data comes, etc...you can get packet processing done this way quite efficiently, but you need to be damn fast in re-configuring your FPGA...Kris
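A rough way to picture what Kris describes, purely as my own illustrative Python sketch (the contexts and functions are made up, not anything from Tabula's documentation): one physical block, several configuration contexts, swapped in per unit of work.

# Toy model of a time-multiplexed fabric: the same physical block is
# reconfigured between operations on the data stream (hypothetical example).
contexts = {
    "parse": lambda pkt: pkt.split(","),                            # context 0: parse
    "checksum": lambda fields: sum(len(f) for f in fields) % 256,   # context 1: checksum
}

def process(stream):
    for pkt in stream:
        fields = contexts["parse"](pkt)       # fabric configured as a parser
        csum = contexts["checksum"](fields)   # reconfigured; same silicon, next cycle group
        yield fields, csum

for out in process(["a,b,c", "d,e"]):
    print(out)

The point being that the reconfiguration between the two steps has to happen fast enough to keep up with the incoming stream.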
Intel getting into the "foundry" business was foreseen/advised by a former marketing director of TSMC. The same foresight was offered on IDMs providing "foundry/ASIC" services to design houses. It has been coming true since 2008, e.g. Apple using SS to make the A5, QCOMM using SS to get BB chips, etc.
He also predicted the trends and actions those capable companies would take, and how the landscape would change over the next 5-10 years.
TSMC lost him, and he has been doing great in the investment community.
I still struggle with Tabula's concept, although I think I can see why they are going for the fastest (22nm) process they can get.
Serializing (which is time-multiplexing by another name) requires the hardware to have spare speed capacity (your pixel clock runs faster than your frame rate, for an extreme example); if the hardware could run at full speed in parallel mode, it will out-run any multiplexed solution. The only exception is when there is a bottleneck, such as free memory bus cycles becoming available while the CPU is busy, which allows pipelining. But the sidelined hardware must be doing something useful while it is switched out, and it must preserve its state; otherwise it is simply redundant, inefficiently implemented hardware.
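To put a number on the "spare speed" point, here is my own back-of-envelope sketch, assuming k contexts share one physical block (the figures are hypothetical):

# If k contexts time-share one block, that block must clock roughly k times
# faster than the per-context application rate just to match a fully
# parallel implementation, before counting reconfiguration overhead.
k = 8                      # contexts multiplexed onto one physical block
app_rate_mhz = 200         # rate each context must appear to run at
reconfig_overhead = 0.10   # fraction of cycles lost to swapping contexts

required_clock_mhz = k * app_rate_mhz / (1 - reconfig_overhead)
print(f"required fabric clock: {required_clock_mhz:.0f} MHz")   # ~1778 MHz

Which is presumably why they want the fastest process available: the multiplexed block has to be much faster than the logic it replaces.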
You can't re-use registers for multiple tasks unless you preserve state or switch only at stateless or minimum-state nodes in the execution flow, and that is a complex compiler function which is extremely difficult to map efficiently to applications.
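In software terms that is the same problem as a context switch: either every live register gets saved and restored on each swap, or the switch happens at a point where nothing is live. A tiny sketch of the first option, illustrative only (the register-file model is made up):

# Reusing one physical register file for two tasks means spilling and
# restoring live state on every switch -- overhead a parallel implementation
# never pays. Hypothetical model, not how Tabula actually does it.
saved = {}   # per-task snapshots of the shared register file

def switch_to(task, regfile):
    saved[regfile["owner"]] = dict(regfile)                           # spill live state
    regfile.clear()
    regfile.update(saved.get(task, {"owner": task, "acc": 0}))        # restore or init

regs = {"owner": "A", "acc": 41}
switch_to("B", regs)   # task B now owns the registers; A's state is preserved
switch_to("A", regs)   # A resumes with acc == 41
print(regs["acc"])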
The above holds true whatever level of logic granularity you work at; you can only get a net gain from serialization if there is inefficiency to be exploited, and I am not sure Tabula have shown where that inefficiency lies, exactly.
I will have to read further...