In our earlier discussions, I outlined the steps required in the adaptation-tuning-finishing-hardening process. Now we have to consider what this means for the design task. You want to optimize for a target application, but you also need to support multiple targets, so you have to defer some configuration details as late as possible; memory and FIFO sizing are two obvious examples.
Changing or adding feed-throughs and late-stage pipelining are examples that come even later in the flow. This means you want to preserve configurability in the ARM subsystem while instrumenting most of the transformations, and you want to build the adaptation wrapper around the subsystem. Our ARM subsystem is already configurable through Verilog "ifdefs" and "for-generates" and the like, but automating transformations on top of this kind of RTL is not easy. Consider the possibilities as follows:
- Code up all your changes by hand using yet more "ifdefs" and "for-generates"? Remember, this is embedded in the ARM RTL. It's difficult to do by hand, a pain to re-create on new RTL drops, and massively difficult to verify across all expected configuration choices. Most groups quickly figure out that this is not a good path. Scripting is even worse: how do you script instantiation of a technology memory you don't yet know, with MBIST controllers and signal hookup, inside a generate loop?
- Use a mainstream EDA tool to perform the transformations? This is only really possible if you can accept a gate-level netlist at the output (not so great for verification). Plus, every mainstream EDA tool, when analyzing a design, first resolves away all of the configurability, leaving just one selected configuration on which to complete analysis. This is not an overlooked limitation; it is fundamental to the way these tools work.
- Use hierarchy transformations to restructure for power management and layout optimization? These are frankly far beyond the capabilities of most scripts unless highly constrained -- don't even think of trying to make it work with legacy or third-party IP or SystemVerilog...
- Script the assembly of the adaptation wrapper? This is workable if similarly constrained, but throw in configurability and hierarchy and third-party and legacy IP and SystemVerilog for design, and things start to get really ugly.
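To make the "resolves away all of the configurability" point concrete, here is a toy Python sketch. It is not how GenSys or any real EDA tool works, and every name in it (`RTL`, `resolve_ifdefs`, `add_feedthrough`, the `HAS_L2` define, the `cpu_wrap` module) is hypothetical. It only understands `ifdef, `else and `endif, and it assumes a very regular module layout, which is exactly the kind of constraint that makes scripts fragile on real RTL.

```python
# Toy RTL: one optional port guarded by a hypothetical HAS_L2 define.
RTL = [
    "module cpu_wrap (",
    "  input  clk,",
    "`ifdef HAS_L2",
    "  output l2_ready,",
    "`endif",
    "  output done",
    ");",
    "endmodule",
]


def resolve_ifdefs(lines, defines):
    """Flatten `ifdef/`else/`endif for ONE fixed set of defines.

    This mimics the 'pick a configuration first' behavior: the
    directives disappear and only the selected branch survives.
    """
    out = []
    stack = []  # one bool per open `ifdef: is the current branch active?
    for line in lines:
        tok = line.split()
        if tok and tok[0] == "`ifdef":
            stack.append(tok[1] in defines)
        elif tok and tok[0] == "`else":
            stack[-1] = not stack[-1]
        elif tok and tok[0] == "`endif":
            stack.pop()
        elif all(stack):  # keep the line only if every enclosing branch is active
            out.append(line)
    return out


def add_feedthrough(lines, name):
    """Add a feed-through (input, output, assign) without touching directives.

    Assumes the port list opens on the 'module' line and the body ends
    with a bare 'endmodule' line; real RTL is rarely this tidy.
    """
    out = []
    for line in lines:
        if line.strip() == "endmodule":
            out.append(f"  assign {name}_out = {name}_in;")
        out.append(line)
        if line.startswith("module "):
            out.append(f"  input  {name}_in,")
            out.append(f"  output {name}_out,")
    return out


if __name__ == "__main__":
    print("\n".join(add_feedthrough(RTL, "ft")))
```

The output of `add_feedthrough` still carries the `ifdef block, so either configuration can be selected later. The output of `resolve_ifdefs` cannot be turned back into a configurable source. Scaling the first approach beyond a toy like this is where the difficulty described above comes in.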
All of which brings us to the last step. GenSys usage has been increasing because it can handle transformations under these constraints. It can preserve configurability in the core subsystem while inserting transformations, it can configurably add adaptation and finishing logic, and it can restructure the RTL hierarchy and add/change feed-throughs. I'm not claiming this is a miracle cure -- this is a hard problem, so there are capability holes; for example, it can't preserve an arbitrary number of configurations and it can't yet surgically replace behavioral logic in critical paths.
There is always room for improvement, but it is already close enough to be adopted into multiple production flows. I know some teams are still going to try for an in-house solution, and I wish them luck, but I know how long it took us to get this close. Making it work on RTL that you control is already very hard. Making it work across IPs that you don't control is eye-crossingly hard.
So, at the end of the day, the reason CPU and GPU subsystems are not just "bigger IP" is that they have to be wrapped and they have to be optimized to the nth degree to be competitive on PPAR. This means they have to be adapted, tuned, finished, and hardened, which means the subsystem team has to be centralized to service multiple products.
But each instantiation of the subsystem is a little different and they all need to be turned out quickly, which means the transformation flow has to be optimized. In turn, this means you need automation that can preserve underlying configurability and also can assemble configurably. Phew! It's quite a bit more complicated than dealing with your average IP.