Hi Max, yes, if you add the bus system (e.g. AMBA) to the SHP-ed design, you reduce system bottlenecks. SHP has a great impact on the system architecture; this is why I put the word “system” in SHP ;-)
Uhhhh... I'm trying to wade through the gorp. I can't figure out whether your bud has managed to reinvent multithreading or reinvent critical-path optimization. It may be both, but mostly it sounds like fine-grain multithreading.
This isn't a bad thing, and it's cool to leverage IP costs in an FPGA implementation, but multithreading has been done before. Intel's chips do either 2-way or 4-way as a means of "hiding" latencies (context switches or I/O waits) to keep the CPU(s) busy. Tera Computer (renamed "Cray" after buying the Cray Research assets) did 512-way. (The jury is still out there...)
Multithreading is one way of squeezing performance from a fixed set of processing resources. It hides latencies, but pushes thread management into hardware. When a system runs out of thread capacity, then what? At some point the multithreading hardware becomes larger than the CPU. Which is the dog? Which is the tail?
@green_is_now These are up to 16 independent MCUs (ARMs). No “shipping memory states external” involved. They are cycle accurate to the original core.
This is the difference from multithreading. The 16 CPUs work independently of each other. There is also no “thread management in hardware”, as @shikantaza speculates.
@Max, I forgot to mention that you can do this on peripherals and DSPs as well. If you have 8 Ethernet cores, use one SHP-ed core instead to reduce area. Or improve the latency of DSPs with SHP. In general you can say: the bigger, the better (Cray ;-)). By the way, SHP is attractive for ASICs as well.
@shikantaza Wikipedia clearly distinguishes between multithreading and multiprocessing. SHP plays with multiprocessing and has nothing to do with multithreading, critical-path optimization, C-slow retiming, etc. SHP improves performance per area (ASICs) or performance per slice (FPGAs) of any design that is instantiated multiple times, which is not a bad thing. Having worked on it for a few years now, I have realized its impact on system architecture in the multicore era (power, performance).
These techniques are indeed becoming more prevalent. There can be significant advantages both in cost and performance. XMOS multicore microcontrollers use these techniques with multiple cores which share an execution unit, and a single shared high speed memory. Together with Event Driven Processing, this approach works very well for software peripherals. It can also have real benefits in designing deterministic cores where low latency, real-time responses can be guaranteed by the architecture.
Ali, you are right, deterministic EDP comes seamlessly with SHP.
I'm not sure that performance improves (it actually gets a little bit worse), but performance per area improves, since the area shrinks. Most importantly, power drops a lot compared with individual instantiations.
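To make the performance-per-area point concrete, here is a rough back-of-the-envelope sketch. All numbers are made up for illustration (normalized units, not measurements of any SHP-ed design), and the names `perf_per_area`, `SHARED_AREA` and `SHARED_THROUGHPUT` are just placeholders I picked: it assumes 16 separate cores versus one shared design that is somewhat slower in aggregate but much smaller.

```python
def perf_per_area(throughput, area):
    """Return throughput divided by area (any consistent units)."""
    if area <= 0:
        raise ValueError("area must be positive")
    return throughput / area


# Hypothetical, normalized numbers for illustration only.
N_CORES = 16
CORE_AREA = 1.0           # area of one stand-alone core
CORE_THROUGHPUT = 1.0     # throughput of one stand-alone core

SHARED_AREA = 6.0         # assumed area of one shared design serving all 16
SHARED_THROUGHPUT = 14.0  # assumed aggregate throughput, a bit below 16 x 1.0

# 16 individual instantiations: throughput and area both scale with N_CORES.
separate = perf_per_area(N_CORES * CORE_THROUGHPUT, N_CORES * CORE_AREA)

# One shared design: slightly less total throughput, far less area.
shared = perf_per_area(SHARED_THROUGHPUT, SHARED_AREA)

gain = shared / separate
print(f"separate: {separate:.2f}, shared: {shared:.2f}, gain: {gain:.2f}x")
```

With these assumed inputs the raw throughput drops, yet throughput per unit area goes up, which is the trade-off described above. Plug in your own synthesis numbers to see where your design lands.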
From the system perspective you get a lot of positive secondary effects (a leaner system architecture, better system performance, reduced power consumption due to better data sharing, EDP, …).
Hope the world is not going down today, because this is just so much fun.