United Business Media EE Times



Tackling the System Verification of a Network Router

Two IBM labs, half a world apart, used random test generation and test coverage software to automate the verification of a multicore network router.

by Daniel Geist and Giora Biran



As in most other aspects of submicron design, the advent of the system on a chip has left its mark on verification. At the IBM labs in Haifa, Israel, and Burlington, Vt., we needed to develop a new methodology to verify a complex chip containing a PowerPC processor, two buses (CPU and I/O), a memory controller, a sophisticated DMA controller, several communication controller macros (Commacs) on the I/O bus, and other small components such as an interrupt controller and bus arbiters. We considered this verification problem especially challenging because we had neither prior experience with such a chip nor existing solutions. As one of the lead designers put it, "How is it possible to verify this thing?" Fortunately, we did have experience verifying systems built from different components, and we could build our new approach on that knowledge. The end result worked well.

The system verification process lasted about six months and started earlier than usual because of tight project schedules. As a result, bringing up the verification environment was harder than it should have been. Fortunately, because we based our checks on expected results, we could perform basic testing and checking very quickly. In fact, once in place, the expected-results checking mechanism required negligible maintenance. Nonetheless, Monkton, a network router, posed quite a verification challenge because it wasn't a single unit but a complex system of units connected through different bus or channel protocols (see "The Network Router"). Simply put, it required a brand-new verification methodology.

Test goals
The first phase of the methodology was to decide where to focus testing. Our previous system verification experience treated an IC as a system rather than as a collection of functional units. However, the new chip demanded a change in overall strategy; other systems, for example, require much more effort at the unit level than at the system level. In addition, most other systems have a single, well-known configuration, whereas we designed this chip to support several. These multiple configurations increased the verification difficulty, but another feature, the limited functionality of the internal cores, offset it by restricting unit testing to only those functions available in the IC. We therefore devoted a substantial portion of the testing to exercising different system configurations.

Figure 1 Verification methodology

Since we based the verification methodology on automatic random generation, we required feedback to ensure the generation quality.

To achieve system-level verification, we primarily tested end-to-end data transfers and memory access layer (MAL) interrupts. The tests tried to cause contention in time, where two or more operations require the buses and the MAL simultaneously. We also tested contention in space, but only through specific scenarios, such as memory locations shared by the MAL and the CPU. We tested bad machine paths (error handling and recovery, such as timeouts) only partially, because we assigned responsibility for testing them, except for specific error conditions, to unit-level simulation. Similarly, system testing checked unit registers and configurations only partially and randomly.
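Whether a test actually produced contention in time can be checked from its trace. The following Python sketch assumes a hypothetical trace format in which each operation records the shared resource it needed and the inclusive cycle range during which it requested or held that resource; it reports every pair of operations whose ranges overlap on the same resource.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Op:
    name: str      # hypothetical operation label, e.g. "dma_rx_ch0"
    resource: str  # shared resource it needs, e.g. "PLB", "OPB", "MAL"
    start: int     # first cycle the resource is requested (inclusive)
    end: int       # last cycle the resource is held (inclusive)


def contentions_in_time(ops):
    """Return (op_a, op_b, resource) for every pair needing one resource at once."""
    hits = []
    for a, b in combinations(ops, 2):
        # Two inclusive intervals overlap iff each starts before the other ends.
        if a.resource == b.resource and a.start <= b.end and b.start <= a.end:
            hits.append((a.name, b.name, a.resource))
    return hits


ops = [
    Op("dma_rx_ch0", "PLB", 10, 20),
    Op("cpu_load", "PLB", 18, 22),
    Op("dma_tx_ch1", "OPB", 5, 9),
]
print(contentions_in_time(ops))  # [('dma_rx_ch0', 'cpu_load', 'PLB')]
```

Counting such pairs per resource over a regression run is one way to confirm that the biasing really drives operations into collision rather than leaving them serialized.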

The verification methodology
The previous ASIC verification methodology built an extensive reference model to check for errors during simulation. We decided that a reference model was too expensive: building one can be almost as difficult as designing the system itself. Writing test cases by hand was equally unacceptable, because there were far too many cases to cover. Our solution was Sysgen, a biased random test generator for systems. We have used random generation technology for several years to verify a variety of IBM server systems, and the similarities between Monkton and the systems already verified with Sysgen allowed the verification team to adapt the server-system verification methodology for ASIC use.

The Network Router
The specification of Monkton, our network router, wasn't limited to the external pin behavior. We also specified the PowerPC architecture, both internal buses, all the internal controllers (memory, DMA, interrupt), the DMA programming model, and the DMA communication protocol with the Commacs on side-band signals. The figure shows the new, unverified cores in red.

The single most important architectural feature of the chip is the DMA controller, or memory access layer (MAL). The MAL transfers data between system memory, which is connected to the processor local bus (PLB) through the EBIU, and the network communications macros (Commacs). Because the on-chip peripheral bus (OPB) bridge is a slave on the PLB, OPB devices must use an interrupt mechanism to gain access to system memory. To balance interrupt latency against the amount of FIFO buffering required at each communications port, the MAL combines the interrupt service function with a DMA controller. The MAL maintains buffer descriptor structures in system memory for each communication transmit and receive channel. When required, the MAL requests access to the PLB to perform a DMA transfer of data from a protocol handler FIFO to system memory through the OPB bridge and EBIU.
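The descriptor format isn't specified here, so the following is only a minimal Python model under stated assumptions: each hypothetical descriptor holds a buffer address, a capacity, and an ownership flag, and the MAL scatters one received packet across consecutive descriptors of a channel's ring, wrapping at the end. Buffers of different lengths, as in the usage below, are the kind of case a generator should exercise.

```python
from dataclasses import dataclass


@dataclass
class BufferDescriptor:
    addr: int           # start of the data buffer in system memory
    length: int         # buffer capacity in bytes (assumed > 0)
    empty: bool = True  # hypothetical ownership flag: True = free for the MAL
    used: int = 0       # bytes actually written by the MAL


def mal_receive(ring, start, packet, memory):
    """Scatter one received packet across consecutive descriptors of a ring.

    Returns the index of the next descriptor to use. Raises if the ring has
    no empty buffer left (real hardware would signal an interrupt instead).
    """
    idx, offset = start, 0
    while offset < len(packet):
        bd = ring[idx]
        if not bd.empty:
            raise RuntimeError("receive ring overrun")
        chunk = packet[offset:offset + bd.length]
        memory[bd.addr:bd.addr + len(chunk)] = chunk  # DMA write to memory
        bd.used, bd.empty = len(chunk), False         # hand buffer to software
        offset += len(chunk)
        idx = (idx + 1) % len(ring)                   # wrap at end of ring
    return idx


memory = bytearray(64)
ring = [BufferDescriptor(0, 8), BufferDescriptor(16, 4), BufferDescriptor(32, 16)]
next_idx = mal_receive(ring, 0, bytes(range(1, 14)), memory)
```

Here a 13-byte packet fills the 8-byte and 4-byte buffers completely, puts 1 byte in the 16-byte buffer, and returns 0 as the next descriptor index after wrapping.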

The system on a chip contains other small blocks: an interrupt controller, several general-purpose I/O ports, and some glue logic.

The verification methodology described here evolved with the project (see Figure 1). Because it depended on automatic random generation, we needed feedback to ensure the quality of the generation, to cover systemwide functionality, and to direct testing only toward those functions that actual software was likely to use. The methodology included several critical capabilities. First, the test generator produced expected results for each test, and it had specific SoC testing knowledge to achieve quality coverage. Second, we produced most tests by biased random generation with automated testing and, to minimize cost, wrote only a few tests manually. Finally, we performed coverage analysis, recording all test execution paths and analyzing them for untested features.

Instead of writing specific tests, we wrote event tables describing combinations of events that needed to be covered during simulation. The test plan directed three activities. The first was encoding testing knowledge into the test generator as test variants: we specified biasing on events and encoded it into the generator, and instead of writing tests, we wrote test templates to be exercised randomly. The random test generator also gave us control over the frequency of events. The second was writing behavioral models for the simulation environments, because every event described in the test plan had to be supported by the simulation environment's inputs. The third was coverage analysis, which the test plan also directed.
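Sysgen's template syntax isn't described here; the sketch below is a hypothetical Python illustration of a test template exercised randomly. A field can be pinned to a fixed value, given a list of alternatives, or given an integer range, and each instantiation fills in the open fields from a seeded random generator so that any failing test can be regenerated exactly.

```python
import random

# Hypothetical template: a plain value pins a field, a list means "pick one",
# and a (lo, hi) tuple means "pick an integer in [lo, hi]".
TEMPLATE = {
    "transaction": "dma_transfer",
    "channel": ["emac", "scc", "uart1", "uart2"],
    "direction": ["tx", "rx"],
    "length": (1, 1518),
}


def instantiate(template, rng):
    """Fill every open field of a template with a random legal value."""
    test = {}
    for field, spec in template.items():
        if isinstance(spec, list):
            test[field] = rng.choice(spec)
        elif isinstance(spec, tuple):
            lo, hi = spec
            test[field] = rng.randint(lo, hi)  # inclusive on both ends
        else:
            test[field] = spec                 # fixed by the test writer
    return test


def generate(template, n, seed):
    """Generate n tests; the seed makes the whole batch reproducible."""
    rng = random.Random(seed)
    return [instantiate(template, rng) for _ in range(n)]
```

A single template like this stands in for many handwritten tests, and narrowing a list or range is how a test writer would steer generation toward an area the test plan flags.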

The verification was mostly automatic, requiring manual intervention only when tests failed and during the analysis of coverage results. After simulating, we analyzed the results for failures, coverage, bug rate, and other indicators of problems in the chip. Feedback from this review allowed us to modify the test plan, the coverage simulation environment, the test generator, and the test coverage generator. Because the tool generated most of the tests automatically, we needed this feedback to confirm that we were actually generating what we intended. We used a coverage tool to analyze traces from the simulation and to report coverage against the models we had coded from the test plan.

The environmental view
Sysgen, the key to the verification solution, handles system verification using a generic system approach (see Figure 2). Its tests consist of system transactions: data transfers, interrupts, and configuration transactions. Tests can also contain scenarios, which are specific sets of transactions that either build up a more complex transaction or exercise a specific system situation.

During simulation, the tool collected the results of the test from the memory and communication macro (Commac) behaviorals and compared them with the expected state predicted in the test. It also collected coverage information and forwarded it to Comet, the coverage tool that accumulates coverage statistics on the chip simulation in order to measure the overall quality of the test cases.
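The end-of-test comparison can be sketched as follows. This is a hypothetical Python illustration, not Sysgen's code: the predicted and collected states are plain dictionaries keyed by memory address or Commac channel name. It also shows why end-of-simulation checking says little about causes: a mismatch records where the final state diverged, not when or why.

```python
def compare_results(expected, observed):
    """Compare end-of-test state with the generator's predictions.

    Both arguments map a location (a memory address or a Commac channel
    name) to the value predicted or collected there. Only locations the
    test predicted are checked; anything else the simulation touched is
    ignored. Returns (location, expected, observed) mismatches; an empty
    list means the test passed.
    """
    mismatches = []
    for location, exp in expected.items():
        obs = observed.get(location)  # None if the behavioral saw nothing
        if obs != exp:
            mismatches.append((location, exp, obs))
    return mismatches


expected = {0x1000: b"\x01\x02", "emac.tx": [b"pkt0"]}
observed = {0x1000: b"\x01\x02", "emac.tx": [], 0x2000: b"\xff"}
print(compare_results(expected, observed))  # [('emac.tx', [b'pkt0'], [])]
```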

Sysgen's generation driver, guided by a user parameter file, chooses transactions to run on the reference model. The user can specify the test size and very explicit requirements for each generated transaction, such as the data transfer length, the target address space, and so on. For any choice the user doesn't specify, the tool decides randomly, using weights that bias the probability of choosing some possibilities over others. The user can control biasing by turning rules on or off or by assigning weights to them. One method we used to increase the quality of system tests was to force sequences of events that focus testing on areas, such as system resource contention, where system bugs often occur.
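A minimal Python sketch of how weighted, rule-based choices might combine with user parameters, assuming a hypothetical rule table (the field names, values, and weights are illustrative, not Sysgen's). An explicit user parameter always wins; otherwise an enabled rule draws a value with probability proportional to its weight; a disabled or missing rule falls back to a uniform choice.

```python
import random

# Hypothetical biasing rules: each maps a field to weighted candidate values.
DEFAULT_RULES = {
    "length":  {"enabled": True,  "weights": {"short": 1, "boundary": 3, "long": 1}},
    "target":  {"enabled": True,  "weights": {"sdram": 4, "sram": 1}},
    "collide": {"enabled": False, "weights": {"yes": 9, "no": 1}},
}


def choose(field, rules, user_params, rng, fallback):
    """Pick a value for one field of a transaction.

    Precedence: explicit user parameter, then an enabled weighted rule,
    then a uniform choice over `fallback`.
    """
    if field in user_params:
        return user_params[field]
    rule = rules.get(field)
    if rule and rule["enabled"]:
        values = list(rule["weights"])
        weights = [rule["weights"][v] for v in values]
        return rng.choices(values, weights=weights, k=1)[0]
    return rng.choice(fallback)
```

Enabling the hypothetical "collide" rule with a heavy "yes" weight is one way to express "force sequences of events" toward resource contention without writing a directed test.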

We added scenarios to the generation base to exercise system functions that required more than one transaction, or to increase the probability of hitting specific system conditions that resist coverage even with biased generation. The tool treats the transactions in a scenario as partially ordered and interdependent. Scenarios include aborting packets, reconfiguring the TDM in the middle of a test, and resetting a channel while a buffer is being transferred. For example, the channel reset scenario places a packet on the chosen channel, disables both the transmit and receive channels of a Commac to corrupt the packet transfer, and then, after a delay, reactivates the channels and reconfigures the Commac. The tool then sends another packet to confirm that channel operation has been properly restored. It modifies the expected results to reflect the fact that part of the corrupted packet may have been lost.
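The channel reset scenario and its relaxed expected results might be expressed as below. This Python sketch assumes that a packet disrupted by a channel reset either disappears or arrives as a truncated prefix; the actual loss behavior of the Commacs may differ, and the step names are hypothetical.

```python
# Hypothetical step list, issued in this order for the channel reset scenario.
CHANNEL_RESET_SCENARIO = [
    ("send_packet", "p1"),
    ("disable_channels", ("tx", "rx")),
    ("wait_cycles", 200),
    ("enable_channels", ("tx", "rx")),
    ("reconfigure_commac", None),
    ("send_packet", "p2"),
]


def check_channel_reset(sent, received):
    """Relaxed expected-results check for the channel reset scenario.

    `sent` maps packet ids to payload bytes; `received` is the ordered list
    of payloads the receiving behavioral collected. Packet p1 may be lost
    or truncated because the channel was disabled mid-transfer, so any
    prefix of it is accepted; p2 must arrive intact and last.
    """
    if not received or received[-1] != sent["p2"]:
        return False  # the restored channel must deliver p2 unchanged
    leftovers = received[:-1]
    if len(leftovers) > 1:
        return False  # at most one (possibly partial) copy of p1
    return all(sent["p1"].startswith(chunk) for chunk in leftovers)
```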

The coverage analyzer
Coverage analysis played an important role in the methodology. Since most of the test generation was automated, we had to monitor test progress automatically, as well. We used Comet, an IBM internal coverage measurement tool that receives as input many samples of event combinations and returns statistics.

The test plan defined about 30 small coverage models to measure combinations (including bus interactions, DMA interactions, and concurrency of activities) on various functional areas of the chip. One model measured simultaneous arbitration requests in the MAL from two Commacs (including channels on the same Commac). In this model, each Commac was assigned its own arbitration group priority and raised its request priority level when its service was too slow; the goal was for every combination of Commac pairs to occur at the same time.

The table below enumerates the possible values of each column. In this example, we wanted to cover each choice of the first Commac (one of four), each of the two arbitration groups for each Commac, and so forth.

Bus ID (Commac 1)          Arb group (Commac 1)   Arb group (Commac 2)   Arb req level (Commac 1)   Arb req level (Commac 2)
Emac, SCC, UART1, UART2    1, 2                   1, 2                   Low, high, urgent          Low, high, urgent

To cover the entire table, the tool had to hit every element of the Cartesian product of the columns: 4 × 2 × 2 × 3 × 3 = 144 combinations. Note that we didn't specify the second Commac's type, because its specific identity didn't matter. The most important models in the test plan measured back-to-back transactions on the internal buses; they reassured us that our tests were exercising the system well. We also used coverage models to measure how well we were covering random internal register configurations in the chip.
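Comet's input format isn't given here, so this Python sketch only illustrates the cross-product idea behind the arbitration model in the table above: enumerate all 144 attribute combinations, count how often simulation samples hit each one, and list the holes, excluding any that a review has waived as acceptable. The attribute names are hypothetical labels for the table's columns.

```python
from collections import Counter
from itertools import product

# Columns of the arbitration coverage model (hypothetical attribute labels).
MODEL = {
    "commac1": ["Emac", "SCC", "UART1", "UART2"],
    "group1": [1, 2],
    "group2": [1, 2],
    "level1": ["low", "high", "urgent"],
    "level2": ["low", "high", "urgent"],
}


def coverage(model, samples, waived=()):
    """Count hits on each cross-product task and report the uncovered ones.

    `samples` are dicts with one value per attribute, e.g. extracted from
    simulation traces. `waived` lists tasks a review accepted as missing.
    Returns (hit_counts, holes).
    """
    names = list(model)
    space = product(*(model[n] for n in names))  # every combination
    hits = Counter(tuple(s[n] for n in names) for s in samples)
    waived = set(waived)
    holes = [task for task in space if hits[task] == 0 and task not in waived]
    return hits, holes
```

With no samples, all 144 tasks are holes; each distinct sampled combination removes exactly one, which makes the hole list a direct agenda for the periodic coverage review.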

Once the chip was mature enough, we began a review process. Every few days, we discussed the coverage results and took action where coverage was missing. Sometimes we enhanced the test generator, sometimes we added manual tests, and sometimes we simply marked the missing coverage as acceptable.

Figure 2 The verification environment

The verification process was mostly automatic, requiring human intervention only when tests failed and in analyzing the coverage, but not in writing tests.

At the beginning, we saw many bugs because verification started prematurely. Later, as the system stabilized, the bug rate dropped and the number of simulation cycles increased. Because we generated tests automatically, we could run an unlimited number of them. At first, most tests failed; the bottleneck wasn't test writing but debugging and fixing bugs. Because we had decided to perform most of our checks at the end of the simulation, we received very little information about the causes of failures, and given the complexity of the chip, debugging a single problem could take days. Eventually, after a nontrivial resource investment, debugging time dropped sharply, to less than half an hour, for two reasons: we developed the expertise to pinpoint quickly which unit or transaction was at fault, and the failure rate itself dropped.

Ever been experienced?
The generator proved to be a very effective device for creating tests, much more effective than manual testing. We also found very interesting bugs in corner cases. In one case, the MAL sometimes lost data when it processed a packet with varying buffer lengths. We used one full-time programmer to add testing knowledge to the generator, which spared us from dedicating a full-time engineer to writing test cases (apart from a small number of manual tests).

Comet result reviews led us to three missing features in the hardware, which we didn't fix because of time pressure. The reviews gave us confidence in the quality of our tests and confirmed that the generator was doing its job. The steady increase in coverage across all modes of operation showed us that randomization was working well. The results also prompted us to add new tests to the test plan (or to add test knowledge to Sysgen).

Comet did cost us more resources than we expected--largely because it was a first-time use. A lot of infrastructure was missing, but continuing projects can now benefit from our work on Monkton.

Limitations

We also paid a price for using expected results as our checking mechanism. In addition to the cost of debugging, we were limited to generating tests for which we could unambiguously predict the expected results. Checking the system against expected results made generation easy when testing the good machine path (GMP). However, it made it difficult to test events whose timing was hard to predict, such as exceptions and interrupts. In most cases we refrained from testing the bad machine path (BMP), assuming that the unit simulation environment would test it. Because we weren't verifying the entire chip's functionality, we prioritized the BMP features and invested resources in adding testing knowledge for those that interested us.

We found a few hardware problems in the lab after the chip was fabricated. In all but one case, we could circumvent the problems and proceed with software development for the chip. So although the chip wasn't completely bug-free when it was fabricated and tested, the client considered the system verification a success. The verification effort won an IBM award and the methodology is currently being adopted on other chips that use those cores.


The authors wish to thank those who assisted in the writing and research of this article: Tamara Arons, Michael Slavkin, Yvgeny Nustov, Monica Farkas, and Karen Holtz of the Haifa Research Lab in Israel; Andy Long, Dave King, and Steve Barret of the Field Design Center in Burlington, Vt.

Daniel Geist is on the staff of the IBM Haifa Research Lab working in automation of functional verification of VLSI design. Before joining IBM, he was a researcher at the NEC Research Institute.

Giora Biran is a senior engineer at the IBM Haifa Research Lab working in design and automation. He has worked in communication, controller interface, and embedded systems design.



Integrated System Design, June 1999



Copyright © 2000 Integrated System Design Magazine
All materials on this site Copyright © 2009 TechInsights, a Division of United Business Media LLC All rights reserved.