As we all know, today's increasingly large and complex digital integrated circuit (IC) and system-on-chip (SoC) designs often contain tens of millions of logic gates, and ensuring that these designs will function as planned requires extensive timing analysis.
We've all heard the stories about wonderful new environments that allow us to work on these hairy designs at the full chip level. In reality, however, the majority of chip projects simply aren't realized this way. Call me a crusty old cynic if you will, but most of the RTL design engineers I talk to still battle with their version of a wade-through-molasses design flow that goes something like the following:
- A mythical beast whom one never actually gets to meet called "the system architect" by some (although "master of the known universe" may be more appropriate) partitions the design into functional blocks, each of which may (one day when it grows up) end up containing several hundred thousand logic gates.
- As an RTL designer you are given a specification for your block (what it needs to do, its interfaces to the rest of the world, and so forth) along with some timing constraints.
- You then retire into your cubicle with instructions that you shall not see the light of day nor feel a fresh breeze until the RTL code for your block is completed.
So off you wander with a little tear rolling down your cheek to start furiously coding away. The problem comes several eons later when you actually have some RTL code in your hands. Will this code meet the timing constraints laid down by those who don the undergarments of authority and stride the corridors of power?
Unfortunately, even if you do have any timing analysis tools that work at the RTL level, the results they give are usually so far out as to be essentially meaningless. They might be sort-of useful as a rough cut (a very rough cut that could be 80% or more out of whack), but you wouldn't rely on them to give you anything like enough confidence to hand your block of RTL over to the system integrator and then make a run for the hills.
Generally speaking, the only way to obtain reasonably accurate timing values is to run physically-aware synthesis and then perform in-place optimization (IPO). Now you really have timing values you can sink your teeth into. The problem is that running physically-aware synthesis on, say, a 400K-gate block may take around 7 hours, followed by another 3 hours or so running timing analysis and generating a timing report. So you're talking about an overnight run, after which you might try "tweaking" your architecture to address some of the timing problems, then back to another overnight run ... and so it goes ... this is how we become old and gray!
Introducing RTL Timing Analysis (RTA)
But hold firm my braves, because there's a light at the end of the tunnel. Those cunning little rascals at InTime Software have developed a really cool technology they call RTL Timing Analysis (RTA). We'll talk about how this works in a moment, but the key point is that you can generate a reasonably accurate timing report on a 400K-gate block in around 15 minutes.
What? 15 minutes? Wow! This pretty much revolutionizes the flow, because now you can experiment with lots of different architectural "what-if" scenarios to see which one works the best. And even when you have one that achieves your original design goals, you might say "what the heck, let's try something else just to see if it's better" (hey, once you've made the code changes, you'll know one way or the other in just 15 minutes).
Of course, your eagle eye will no doubt have spotted where I said "reasonably accurate" two paragraphs ago. What does this mean? Well, let me pose another question: "How long is a piece of string?" (The answer being "twice as long as from the middle to the end!").
When it comes to saying how accurate something is, it's first necessary to qualify exactly what you're comparing that thing to. As was previously discussed, the first breakpoint for (relatively) accurate timing analysis that can be achieved with conventional flows occurs at the gate level following physically-aware synthesis and IPO. In this case, the timing reports generated by RTA typically correlate to post-IPO delays with an error of 20% or less (worst case errors may rise to 30%).
Now although this may sound high, almost any modern synthesis tool can achieve the required timing if its initial seed is within this 20% to 30% range. It's the paths that are off by 80%, 150%, 200% and higher that are never going to fly, and it's these paths that can be quickly and easily detected, identified, and corrected while working at the RTL level of abstraction using RTA.
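To make that sorting-out of paths a little more concrete, here's a minimal Python sketch of how one might triage the paths in an RTL-level timing report. Everything in it is an assumption made for illustration: the path names, the delay values, the `PathEstimate` record, and the "borderline" band between 30% and 80% are all invented, and none of this reflects InTime's actual report format or algorithms. It also assumes "off by" means how far a path's estimated delay overshoots its required delay.

```python
from dataclasses import dataclass


@dataclass
class PathEstimate:
    """One timing path from a hypothetical RTL-level timing report."""
    name: str
    required_ns: float   # delay allowed by the timing constraint
    estimated_ns: float  # delay estimated at the RTL level


def overshoot_pct(path: PathEstimate) -> float:
    """How far the estimate exceeds the requirement, as a percentage (0 if it meets timing)."""
    excess = path.estimated_ns - path.required_ns
    return max(0.0, excess / path.required_ns * 100.0)


def triage(path: PathEstimate, closable_pct: float = 30.0, rework_pct: float = 80.0) -> str:
    """Sort a path into a bucket. The thresholds are illustrative assumptions."""
    over = overshoot_pct(path)
    if over == 0.0:
        return "meets timing"
    if over <= closable_pct:
        return "synthesis can likely close"
    if over < rework_pct:
        return "borderline"
    return "rework the RTL"


# Invented example paths, all against a 5 ns requirement.
paths = [
    PathEstimate("ctrl_fsm -> status_reg", 5.0, 4.2),
    PathEstimate("alu_add -> acc_reg", 5.0, 6.0),
    PathEstimate("mult -> result_reg", 5.0, 8.0),
    PathEstimate("crc_tree -> crc_reg", 5.0, 12.5),
]

report = {p.name: triage(p) for p in paths}

for name, verdict in report.items():
    print(f"{name:26s} {verdict}")
```

The point of the buckets is simply that the first two can be left for the synthesis tool to sort out, while the last one tells you to go back and rethink the architecture before anyone wastes an overnight run on it.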
But how does it work?
Hmmm, how did I know you were going to ask me that one? Okay, in order to wrap one's brain around how this works, it's first necessary to understand that there's a related application that takes the logical and physical (LEF and DEF) definition files associated with an ASIC cell library and generates a corresponding design kit database to be used by the RTA utility.
It's important to note that such a design kit is NOT a library of characterized gates, but is instead a database of characterized logical functions (such as counters and XOR trees). The design kit generator captures the behavior of these logical functions, including timing and area estimations.
InTime's RTA engine subsequently accepts the RTL code for the block, the timing constraints associated with this block (in industry-standard SDC format), and the design kit associated with the target cell library. As the RTA engine reads in the RTL it converts it into a netlist of entities called "work functions." Each work function is an abstraction that directly maps onto an equivalent function in the design kit.
Once the RTL has been converted into a netlist of work functions, the RTA engine performs the same logical optimizations that are typically performed at the gate level, including common sub-expression elimination, constant propagation, loop unrolling, the removal of all redundant functional computations, and so forth.
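For anyone who hasn't bumped into these optimizations before, the following Python sketch shows two of them (constant propagation and common sub-expression elimination) applied to a toy netlist. This is a generic textbook-style illustration, not InTime's implementation: the `Node` record, the three operations, and the signal names are all hypothetical, and real work functions would be much richer beasts (counters, XOR trees, and so on).

```python
from typing import NamedTuple, Union

Operand = Union[str, int]  # a signal name or a literal constant


class Node(NamedTuple):
    """A toy stand-in for a work function: out = op(a, b)."""
    out: str
    op: str
    a: Operand
    b: Operand


# All three toy operations are commutative, so operand order never matters.
FOLD = {
    "add": lambda x, y: x + y,
    "mul": lambda x, y: x * y,
    "xor": lambda x, y: x ^ y,
}


def simplify(nodes):
    """Single pass over a topologically ordered netlist.

    Returns (kept, alias): the surviving nodes, plus a map from each
    removed output to the constant or signal that replaces it.
    """
    alias = {}  # removed output -> constant or surviving signal
    seen = {}   # (op, operand, operand) -> output that already computes it
    kept = []
    for n in nodes:
        # Rewrite operands that earlier steps folded or merged away.
        a = alias.get(n.a, n.a)
        b = alias.get(n.b, n.b)
        # Constant propagation: both operands known, so fold the node.
        if isinstance(a, int) and isinstance(b, int):
            alias[n.out] = FOLD[n.op](a, b)
            continue
        # Common sub-expression elimination: canonical operand order
        # makes add(x, y) and add(y, x) share one key.
        key = (n.op, *sorted((a, b), key=repr))
        if key in seen:
            alias[n.out] = seen[key]
            continue
        seen[key] = n.out
        kept.append(Node(n.out, n.op, a, b))
    return kept, alias


netlist = [
    Node("k", "add", 3, 4),       # two constants
    Node("t1", "add", "x", "y"),
    Node("t2", "add", "y", "x"),  # same computation as t1
    Node("p", "mul", "t1", "k"),
    Node("q", "mul", "t2", "k"),  # same as p once t2 and k are resolved
]

kept, alias = simplify(netlist)
print([n.out for n in kept])  # ['t1', 'p']
print(alias)                  # {'k': 7, 't2': 't1', 'q': 'p'}
```

Five nodes go in and only two come out, which is the whole idea: the fewer functions left standing, the less there is to place and time in the steps that follow.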
The RTA engine uses the resulting minimal irredundant network of work functions to perform a "virtual placement" of these functions. This placement is then used to generate reasonably accurate area estimates, which are subsequently used to generate reasonably accurate timing estimates. In conjunction with the design kit, the RTA engine understands how the various synthesis engines weight various factors and modify their implementation strategies (such as swapping counter realizations) in order to meet the specified timing constraints. All of these factors are taken into account when performing the analysis.
The end result is that you can now perform an "accurate enough" timing analysis on your RTL blocks around 40x faster than is possible with conventional flows. This dramatically improves the quality of life of the RTL design engineer, but "what of the full-chip system integrator?" you cry! Well, InTime has this covered as well, but I'm afraid that will have to be the topic of a future column. For the nonce, let's just say that InTime's RTA is so cunning you could stick a tail on it and call it a weasel! So RTA receives a resounding "Cool Beans" from me. Until next time, have a good one!
Clive (Max) Maxfield is president of Techbites Interactive, a marketing consultancy firm specializing in high-tech. Author of Bebop to the Boolean Boogie (An Unconventional Guide to Electronics) and co-author of EDA: Where Electronics Begins, Max was once referred to as a "semiconductor design expert" by someone famous who wasn't prompted, coerced, or remunerated in any way.