United Business Media EE Times


Search

HOMEMARKET INTELLIGENCE UNITFORUMSDESIGNNEW PRODUCTSCAREERSBLOGSCONTACTEVENTSSIGN UP!RSSMost Popular contentTrusted Sources

 

Design Automation

Property Verification Using Theorem Proving and Model Checking

Designers try out two tools on an experimental high-speed 16 x 16 ATM switch fabric containing about 5 million transistors.

by Ying Xu, Eduard Cerny, Allan Silburt, and Roger B. Hughes


We investigated the Lambda formal synthesis and theorem-proving system and the CheckOff-M symbolic model checker, both from Abstract Hardware Inc., to see how easy or difficult they were to use to verify properties of a design. For our investigation, we chose a 16 x 16 ATM switch fabric comprising some 5 million transistors and sought to answer these questions: What has to be sacrificed in modeling the design so that the tools and the users can handle the complexity? and What might be the best way to deploy such formal verification tools in a design process?

The ATM switch fabric is an experimental high-speed design (see Figure 1). The fabric switches 16 incoming cell streams to 16 outgoing streams in a single-cast mode (one input port to one output port) or a multicast mode (single input port to multiple output ports) on a per-connection basis.


Figure 1. The tinted arrow path illustrates a cell entering the switch fabric at input port 0 at time t 0 and exiting at output port 14 at t 1 > t 0 . The latency of the cell is L = t 1 - t 0 .

The architecture is a memory-based time switch with queues on the output ports. Switching is achieved by enqueueing pointers to cells stored in a shared memory into the appropriate output port queue or queues.

More specifically, the switch fabric consists of a shared memory block (SMB) and a shared memory manager block (SMMB), synchronized to a master clock. The SMMB contains 32 pointer queues and a free memory pool (free address queue), for a total of 4,000 cell pointers. There are two queues per output port, one fast queue and one slow. The fast queue has a higher dequeue priority than the slow queue, and each has programmable discard thresholds that are used to determine if the incoming cell can be enqueued or should be discarded.

Both the SMB and SMMB are synchronized to a master frame pulse. A cell is received and transmitted in a frame consisting of 32 clock ticks (so-called cell time). The receive (enqueue) and transmit (dequeue) operations are overlapped (parallel and pipelined) to achieve high performance.

The SMB receives 16 cells per cell time, one from each input port. After resynchronization, the interval between the cell arrivals at two consecutive input ports is two master clock ticks. When the SMB receives a cell, the cell header is sent to the SMMB, which assigns the cell an SMB memory address based on the cell header information and the status of the destination output queues. The SMMB stores the memory address in the corresponding output queue, and the SMB stores the cell at the specified address in the shared memory. This procedure is referred to as the enqueue process. In the dequeue process, the pointer of either the fast or the slow queue for each port is dequeued and provides the read address for the shared memory in the SMB. Up to 16 cells are assigned to 16 output ports (one cell per port) in a cell time. If both the fast and slow queues of a port are empty, the SMMB signals the SMB to emit an empty cell on that port.

Proving the latency property In the course of investigating alternative architectures that could support new features, it became important to demonstrate that the cell latency under some traffic constraints is constant on an input-to-output port path. This property can be informally stated as follows:

Assuming that the high delay priority traffic on one input port (i) is destined to a single output port (o), and that no other input port has high delay priority cells for that output port, prove that the cell latency from i to o is less than 100, regardless of the form of the traffic satisfying the stated constraints.

To prove this property, we must analyze all pairs of input and output ports (0 -> 0, 0 -> 1, . . . , 15 -> 15), for a total of 256 possible combinations. To explore the verification techniques, we chose two I/O pairs with distinctly different estimated latencies: 3->7 with a latency of 60 clock ticks and 4->8 with a latency of 28 clock ticks.

The Lambda system is an integrated suite of software tools implemented in SML (Standard Meta Language). It can be used to specify, synthesize, and prove the correctness of hardware systems. The specification language, L2, is a subset of SML combined with predicate logic. The Lambda system comprises a theorem prover based on higher-order logic, an animator, and a synthesis tool called Dialog, all nicely integrated through a textual and a menu-driven user interface that includes flexible rule selection and application. With the animator, you can simulate an L2 specification to eliminate trivial errors, and with Dialog you can interactively synthesize hardware that meets its L2 specification. The theorem prover automatically validates each step of the synthesis, and it can also be used as an interactive proof assistant. It is the latter function we used for our task.

Switch specification The objective was to produce an abstract specification in L2 of the switch fabric based on some 500 pages of textual descriptions and a behavioral VHDL model comprising some 6,000 lines of code. The L2 specification, although abstract, should exactly reproduce the synchronous behavior of the switch at its interface and at major internal signals. Once we constructed the L2 specifications for the SMB and SMMB, we formed the complete specification by instantiating and interconnecting the two components.

We made several simplifications for the formal model:

  • The cell header includes only the destination, priority, and cell type.

  • The size of the shared memory space is considered as infinite and hence the same for the sizes of the free address queue and the output port queues. Therefore we removed queue thresholds, flow control, cell dropping, and address return to the free address from the model. Doing that simplifies the L2 specification and makes the proof easier, yet the simplified specification preserves the normal timing characteristics of the system.

  • The cell type is restricted to normal ATM and empty.

  • Cells are modeled as record data structures, and their arrival at input ports and their departure from output ports are aligned with the time of the first word of the cell in the actual implementation. Cells are assumed to arrive at consecutive ports in intervals of two clock ticks.

  • No physical ports are modeled.

  • The queues are treated as abstract data objects and are manipulated by access functions.

Formal proof in Lambda Since the objective was to examine how difficult it is to use this verification method rather than to prove the property itself, we simplified the latency property for port pair 3-7 to reduce the amount of cases to analyze by restricting all other incoming traffic to empty cells. The property thus states that:

Assuming that on input port 3 the high-delay-priority traffic (DPRI = true) is destined to output port 7 and that all other traffic consists of empty cells, an ATM cell entering the switch fabric on port 3 will exit on port 7 with a delay of at most 100 master clock ticks.

The restriction on the input cell flow is expressed in L2 as shown in Listing 1.

Since cells arrive at the 16 input ports staggered by 2 clock ticks and each port can receive a new cell every 32 ticks (2 x 16), we use "two*(sixteen*t)" to designate the arrival time of a cell at port 0, "two*(sixteen*t)+2" for the arrival time of a cell at port 1, "two*(sixteen*t)+4" for the arrival time of a cell at port 2, and so on. The symbolic constants "two" and "sixteen" are used for block rewriting (2 x [16 x t]+2) using the successor function in the form of 2'(t + t+. . . + t), which is more difficult to reason about.

val Latency_Prop3to7#(InCell,

OutCell)=

forall t : time. exists t1:
time.

OutCell(two*(sixteen*t) + 6 + t1) ==
InCell(two*(sixteen*t)+6)

/\ (t1<=100) == true;

The complete theorem to prove is:

switch_fabric#(InCell, OutCell,
ComMem) /\
RESTRICT_CONDS#(InCell)

|- Latency_Prop3to7#(InCell,
OutCell).

Working with CheckOff-M As in the case of Lambda, we had to reduce the size of the switch fabric model for the CheckOff-M symbolic model checker without losing the fabric's main behavioral characteristics related to the synchronous timing and queue management. This time, however, the reasons for the simplifications weren't dictated so much by our ability to manage the model as by the limitations on the size of the finite state machines that the model checker can handle within the available memory and reasonable user time.

CheckOff-M performs symbolic model checking of sequential circuits. It accepts digital circuit designs as EDIF netlists or as VHDL or Verilog models (structural, behavioral, or a combination of the two, without real-time delay statements). When the tool detects an error, it generates VHDL or Verilog testbenches for counterexamples so that the error can be identified and confirmed by simulation.

The VHDL accepted by CheckOff-M is restricted, but the subset includes the typical synthesizable synchronous RTL style (for Synopsys 's synthesis tools, for example) in which we developed our model. The checker first transforms the VHDL model into the so-called Micro FSM described using reduced ordered binary decision diagrams (ROBDDs). The model is then optimized in both the vhdl2fsm compiler and the checker itself, once the synchronous mode of operation is indicated.

Properties are specified in CIL (CheckOff Interval Language), a propositional temporal logic. CIL allows an explicit quantitative definition of discrete instants and time intervals. Although CIL is slightly less expressive than CTL (Computational Tree Logic), the CIL formulas are easier to understand. The use of intervals also simplifies the way properties can be written.


Figure 2. The complex model (A) closely follows the actual design of the queues. The simple model (B) reduces each queue to a shift register.

A property specification (called a theorem in CheckOff-M) consists of two parts: an assumption and a commitment. The assumption describes the space of valid input/output sequences (including fairness conditions), and the commitment defines the actual property that the FSM must satisfy given that the assumption holds.

Since the SMB contains a large memory that can't be easily simplified, and the temporal order is determined solely by the queue manager in the SMMB, we only modeled the latter. The inputs are the cell header (HD), and the output is the address information supplied to the SMB. As in L2, frame synchronization is derived relative to time 0 by counting the clock pulses using modulo 32. The simplified VHDL model has the following characteristics:

  • Only single casting is supported (each cell has a unique destination port).

  • Functions related to accounting, parity checking, address mapping, and resetting are abstracted, while the equivalent synchronous delays are maintained.

  • Each queue has a parametrized size for easy scaling of the model. Cell-dropping mechanisms were included to handle situations when a queue is set to overflow.

  • As in L2, the only accepted cell type is ATM or empty.

  • The reset sequence is compressed to one clock cycle.

Queue models Initially we used a model that was close to the actual implementation, consisting of a memory array, two pointers, head and tail, and one variable counting the actual number of elements in the queue (see Figure 2A). We replaced this model with one based on a shift register that has no equivalent states (see Figure 2B). (This model was suggested to us by Ratan Nalumasu of the University of Utah.)

We proved the behavioral equivalence of the two queue models using CheckOff-M and then used the simpler one. Even with this simpler model, however, only a few of the 32 port queues could be included in the SMMB model, again because of state explosion. To simplify the model further, we allowed some queues to have null size; that is, they're permanently full and empty at the same time, and the storage arrays aren't constructed. This means that all cells destined to an output containing a null queue are dropped and no cell is dequeued either.

Finally, the largest model on which we could verify (parts of) the latency property without running out of physical memory (and thus time, because of excessive disk I/O due to paging) consisted of 2 output port queues of size two elements and a free address queue of size four elements, all of them 4 bits wide. All the other 30 port queues had to be null. The exact size of the model depended on the property being verified; that is, the more complex the property was (one containing larger time intervals, involving more outputs, or both), the simpler the model had to be.

Checking the latency property The following properties were verified on a SparcStation 20 with 160 Mbytes of memory.

Property 1: Latency of HD_ADDRW (a write) for cells from port 3 to port 7. The assumption of the property states that the cell headers from port 3 enter on HD bus at cycle 24 in each frame period (see Listing 2). These cells, of type ATM (not of type empty), are destined for the fast queue on port 7. All cells from the other input ports aren't sent to port 7, but they have no restrictions on their type or priority. The assumption also states that the addresses taken from the nondeterministic free memory queue are valid (not null) and the circuit is in normal operation (no reset signal). The commitment states that the write addresses for the cells from port 3 to port 7 are emitted 21 clock ticks later (mod 32) at cycle 13, assuming that the cell headers enter on HD at time t. This commitment doesn't prove, however, that by accident the address isn't permanently stuck at some fixed valid value. A stronger property would be to compare it with the address dequeued from the free memory queue and to verify that the dequeue operation on the queue functions correctly (which we assumed here because of lack of time, the object being to determine the limits on the model and property sizes).

Property 2: Latency of HD_ADDRW for cells from port 4 to port 8. The latency of HD_ADDRW for cells from port 4 to port 8 can be obtained from Property 1 by substituting 8 for 7, 26 for 24, and 15 for 13. The delay from the header entering the SMMB on HD and the write address being emitted is the same, that is, 21 clock ticks.

Property 3: Latency of ADDRW_ADDRR for port 8. The assumption for the latency of ADDRW_ADDRR for port 8 is the same as for Property 2 (see Listing 3). The commitment says that the read address for port 8 is emitted (the cell address is dequeued) on ADDR at cycle 22, which is 7 clock ticks after emitting the write address (writing the cell into memory) at cycle 15. The signal delayedADDR (7) is used to remember the write address shown on the ADDR bus 7 clock ticks earlier.

Property 4: Latency of ADDR(W)_ADDR(R) for port 7. Property 4 is similar to Property 3 in that it tries to prove the delay between the emission of the write and read addresses for port 7. Unfortunately, this separation is now 39 clock ticks, which caused the checker to run out of memory.

Since the latency for the port pair 3-7 could not be proven directly, we tried to determine a lower bound. Properties 5 to 8 state that the write address issued for port 7 at cycle 13 won't appear as the read address within the next nine clock cycles (from cycle 14 to cycle 22). Listing 5 shows the code for property 5.

The removal of the delay line delayedADDR and its replacement by the use of the enumeration of possible address values in the property were essential for the successful verification of this property. In fact, this property had to be broken into four independent parts, each verifying that a specific read address value is not emitted before nine ticks after the emission of the same write address. Note, also, that this property couldn't be verified if the addresses weren't unique in the system, that is, if a nondeterministic source of free addresses were used.

Properties 6 to 8 are similar to Property 5 except that ADDR is checked for values 2 to 4 respectively.

From properties 2 and 3, we can conclude that the latency for cells from port 4 to port 8 is 28 clock ticks. From properties 1, 5, 6, 7, and 8, we can say that the latency for cells from port 3 to port 7 is at least 30 clock ticks. The verification of all the properties took about 240 minutes of CPU time.

Integrating formal verification with design The Lambda theorem prover seems best suited to the synchronous architectural level, with abstract data types. The main synchronous system timing can be easily and clearly specified, and the behavior can then be proven to satisfy certain principal properties. Complex control and protocol features can be checked separately using a model checker (with data abstracted away).

The deployment of model checking is very feasible and profitable at the RT level. Designers can apply the model checker to smaller pieces of the design, ensuring their correctness, then integrating them in more complex subsystems, which, depending on size, could be verified again using model checking (with some abstractions) or simulation (or both).

We found Lambda and CheckOff very easy to use. The Abstract team was also very helpful in getting novice users like us off the ground. Finally, we should note that Lambda provides high-level synthesis features that can be used to refine an L2 specification to the RT level and then output it in VHDL for use by other tools. The level of abstraction we used in our specification, however, was still above the acceptable style for synthesis. *

Ying Xu is a doctoral student at the University of Montreal.

Eduard Cerny is a professor at the University of Montreal.

Allan Silburt is senior manager of the Hardware Modeling Group of Nortel Semiconductors (Ottawa).

Roger B. Hughes is the vice president of engineering at Abstract Inc. (Fremont, Calif.).

To voice an opinion on this or any Integrated System Design article, please e-mail your message to miker@isdmag.com.


integrated system design  November 1997



[ Articles from Integrated System Design Magazine ] [ ICs and uPs ]
[ Custom ICs and Programmable Logic ] [ Vendor Guide ]
[ Design and Development Tools ] [ Home ]



For more information about isdmag.com e-mail cam@isdmag.com
For advertising information e-mail amstjohn@mfi.com
Comments on our editorial are welcome
Copyright © 1997 Integrated System Design Magazine

  Free Subscription to EE Times
First Name Last Name
Company Name Title
Email address
  Click here for your Free Subscription to EETimes Europe
 
CAREER CENTER
Looking for a new job?
SEARCH JOBS
SPONSOR

RECENT JOB POSTINGS
CAREER NEWS
SRC Expands R&D Centers
The Semiconductor Research Corp has added a new center to its university R&D efforts.

For more great jobs, career related news, features and services, please visit EETimes' Career Center.


All White Papers »   

 
Education and
Learning


Learn Now:












Home | About | Editorial Calendar | Feedback | Subscriptions | Newsletter | Media Kit | Contact | Reprints|  RSS|   Digital|  Mobile
Network Websites
International
Network Features




All materials on this site Copyright © 2009 TechInsights, a Division of United Business Media LLC All rights reserved.
Privacy Statement | Terms of Service | About