Editor's Note: The EDA Consortium has selected Philips Semiconductors' Nexperia silicon design team as the recipient of its 2002 Design Achievement Award. The team received the award for the Nexperia-based Home Entertainment Engine pnx8500. The pnx8500 and its successor, the pnx8525, are the first instantiations of the Nexperia Digital Video Platform (DVP). These single-chip systems receive, decrypt, decode, convert and display multiple video streams in different data formats.
A team of 60 engineers worked for two years, from 1999 until 2001, to develop both the underlying platform and the pnx8500. The chip has around 8 million gates, with 70 clock domains and an operating speed of 200 MHz. It has MIPS and TriMedia CPU cores along with an array of peripheral devices. It also contains 7 analog PLLs (phase-locked loops), 9 DDSes (direct digital synthesizers), and 237 RAMs.
Augusto de Oliveira, Philips' chief architect of digital consumer systems, and Hans Spanjaart, director of VLSI engineering for digital consumer systems, discuss the challenges they encountered and the design flow they used in the following interview.
EEdesign: This chip is the first instantiation of the Nexperia DVP platform. Which portions of the chip were already pre-determined by the platform, and what new elements were added?
Oliveira: The platform specifies the overall architectural framework, including the on-chip bus architecture and the IP [intellectual property] design guidelines. Some of the IP blocks were pre-existing, like the CPU cores and some of the peripherals. Others were newly designed, such as video-related blocks like scalers.
One of the unique elements of the Philips platform approach is that we have a set of guidelines specific to each domain, including the CPU selection. For digital video, we have selected MIPS for control portions, and TriMedia as the DSP. But one can select what kind of MIPS to use, depending on performance requirements. For this chip, we selected a 32-bit MIPS running at 150 MHz. For TriMedia, we used a 32-bit core running at 200 MHz.
EEdesign: What kind of bus architecture did the platform provide?
Oliveira: One of the main aspects of digital video is to move large chunks of data in and out of the chip, so we had to give a lot of thought to how this is done. Each IP block has an interface that we call "level two." That decouples the IP block from the specifics of the bus that it is utilizing. So we predetermined the basics of the architecture prior to the chip design, but since this was the first instance of the architecture, we did some fine-tuning.
We used the PI bus, which was created in Europe by a consortium of companies. It's primarily used for low-bandwidth DMA and for CPUs to control the different peripherals on the chip. Separately, there is a very high speed bus used by all the heavy-duty video blocks on the chip.
Figure 1 - pnx8500 architecture diagram
EEdesign: What IP blocks were created new for the pnx8500?
Oliveira: A memory-based scaler block. That takes different formats and sizes of video, and scales them to the output resolution you desire. Another block we did for this chip was an image composition processor. It takes different graphic frames and video sources, and composes them for the final output stage.
We have a very rich assortment of peripheral blocks in this chip, UARTs and USB blocks, for example. Since we were also establishing IP guidelines at this time for Nexperia, some of these blocks had to be reworked as part of this project.
EEdesign: How much of the design was reused, and how much was new?
Spanjaart: The whole design is about 8 million gate equivalents, about 30 million transistors or so. I think about half of them were created within the framework of this project, and the rest were supplied as building blocks.
EEdesign: How much time did it take, and how much time was saved with the platform-based approach?
Spanjaart: The whole process took about two years, and there were close to 60 people involved.
Oliveira: This is for the VLSI design team. The total Nexperia team, including hardware, software and systems people, was close to 300 people for the DVP platform and for this chip.
Since this was the first [Nexperia DVP] product, we were doing the up-front investment. What we are doing now is reaping the benefits of platform-based design, now that the IP blocks and architecture have been created. I would say, though, that platform-based design did already help for this chip, since establishing the architectural guidelines for IP creation and connection simplified things quite a bit.
So the 60 people were both for the platform and the chip. There's quite a bit of refinement when you do the design itself. No architecture is complete until we see a product in the end.
EEdesign: What do you think the time savings will be for future Nexperia DVP chips, compared to starting over from scratch?
Oliveira: I think we'll see a reduction of at least 50 percent, not only in time but also in the number of people involved, for both hardware and software.
We do have an escalating software problem. More and more functionality is implemented in software, and design reuse is helping, but we still have long lead times there.
EEdesign: Since many elements of the pnx8500 were pre-determined by the platform, was there any inflexibility, or were there performance tradeoffs?
Spanjaart: Reaching a sufficient level of performance always demands great attention. The restriction was not in the processor itself, but there were a number of iterations on the bus architecture.
Oliveira: One thing we did is to take into consideration that memory technology will keep evolving. That's why we created the level two IP connection. It allows us to evolve, for example, the connection network on the chip, without having to redesign the entire chip, and therefore keep up with performance requirements.
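The decoupling described here can be illustrated with a minimal Python sketch. It is not Philips' "level two" interface, whose details the interview does not give. The names BusAdapter, DictBackedBus and ScalerBlock are hypothetical and exist only for illustration. The point is that the block talks to an abstract interface, so the interconnect behind it can be replaced without touching the block.

```python
from abc import ABC, abstractmethod


class BusAdapter(ABC):
    """Hypothetical bus-independent interface seen by an IP block."""

    @abstractmethod
    def write(self, address: int, data: bytes) -> None: ...

    @abstractmethod
    def read(self, address: int, length: int) -> bytes: ...


class DictBackedBus(BusAdapter):
    """Toy interconnect model; a real bus would differ in protocol and timing."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.memory: dict[int, int] = {}

    def write(self, address: int, data: bytes) -> None:
        for offset, byte in enumerate(data):
            self.memory[address + offset] = byte

    def read(self, address: int, length: int) -> bytes:
        # Unwritten locations read back as zero in this toy model.
        return bytes(self.memory.get(address + i, 0) for i in range(length))


class ScalerBlock:
    """Hypothetical IP block that only knows the abstract interface."""

    def __init__(self, bus: BusAdapter) -> None:
        self.bus = bus

    def copy_line(self, src: int, dst: int, length: int) -> None:
        self.bus.write(dst, self.bus.read(src, length))


# The same block code runs unchanged on two different interconnect models.
old_bus = DictBackedBus("first-generation interconnect")
new_bus = DictBackedBus("later interconnect")
for bus in (old_bus, new_bus):
    bus.write(0x100, b"\x01\x02\x03")
    ScalerBlock(bus).copy_line(0x100, 0x200, 3)
```

Swapping the interconnect means writing a new BusAdapter subclass; ScalerBlock itself stays as it is.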
EEdesign: What were your main challenges from a design perspective?
Spanjaart: The sheer size of the design, which was about 32 million transistors. It was a challenge to run on the existing tools at that point in time. We also had a complicated design, in terms of functionality, with two different processors and over 20 different IP blocks. There were also lower-level challenges in that there were many, many clock domains.
There are also analog modules in there, which require different tool chains, and people with different skills. Also, the processors have issues related to software development, so software people, and tools, have to get involved to get the final product up and running.
EEdesign: Where did the design effort start?
Oliveira: This design started at the system level. There were intensive discussions with our customers with respect to the performance and functionality required. We created a number of models using C-based simulation, spreadsheets, and napkins to keep all the design parameters in check. The C-based simulation was done in-house.
Figure 2 - Design flow diagram
Spanjaart: From that point on, the chip specification was derived from the system specification. We followed Verilog or VHDL design approaches, depending on the different IP blocks delivered. Designs done here were all in Verilog. We conquered the issue of size by building a large compute farm here. Otherwise, we used classical tools and an automated flow for doing verification.
EEdesign: What commercial EDA tools did you use?
Oliveira: For RTL simulation, we used Cadence Verilog and NC-Verilog tools. The next step was emulation, and we used Quickturn and Ikos boxes. That was to create a link back to our software colleagues, so we could give them a model to work with before silicon could arrive.
We have an in-house, graphical tool that works in the synthesis domain. It keeps track of scripts and assembled output from all the different synthesis runs, produces output, and gives rapid feedback to design engineers. Otherwise we used Synopsys Design Compiler, and for static timing, [Synopsys] PrimeTime. Later in the design flow, we used Silicon Perspective [First Encounter] for timing-driven placement. It was only available late in the game.
For formal verification, we used the Verplex equivalency checker. For place and route, we used Avanti Apollo. The parasitic extraction, LVS, and DRC were all based on Avanti's Star-RCXT and Hercules tools.
EEdesign: Aside from the graphical synthesis tool, were there other in-house tools?
Oliveira: Certainly in the floorplanning domain we had to develop our own tools. We're currently looking into third-party tools, but at the time, they weren't available or weren't sufficiently well suited for the requirements here.
The way we address large designs is by splitting the design up. A number of building blocks are glued into a "chiplet," and there were a number of these chiplets, each the size of a decent chip by itself. Our in-house tool supported the assembly of building blocks into chiplets.
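As a rough illustration of grouping building blocks into chiplets, the Python sketch below packs blocks into groups under a gate budget using a first-fit-decreasing heuristic. The interview does not say how the in-house tool decided which blocks went together, so the heuristic, the function name assemble_chiplets, the block names and the gate counts are all assumptions made up for the example.

```python
def assemble_chiplets(block_gates, max_gates_per_chiplet):
    """Group blocks into chiplets so that no chiplet exceeds the gate budget.

    First-fit decreasing: place the largest blocks first, each into the
    first chiplet that still has room, opening a new chiplet if none does.
    """
    chiplets = []  # each chiplet is a dict of block name -> gate count
    ordered = sorted(block_gates.items(), key=lambda item: (-item[1], item[0]))
    for name, gates in ordered:
        if gates > max_gates_per_chiplet:
            raise ValueError(f"{name} alone exceeds the chiplet budget")
        for chiplet in chiplets:
            if sum(chiplet.values()) + gates <= max_gates_per_chiplet:
                chiplet[name] = gates
                break
        else:
            chiplets.append({name: gates})
    return chiplets


# Made-up block sizes, for illustration only.
blocks = {
    "cpu_a": 900_000,
    "cpu_b": 700_000,
    "scaler": 500_000,
    "usb": 150_000,
    "uart": 50_000,
}
chiplets = assemble_chiplets(blocks, max_gates_per_chiplet=1_200_000)
for index, chiplet in enumerate(chiplets):
    print(index, sorted(chiplet), sum(chiplet.values()))
```

A real floorplanning flow would also weigh connectivity, clock domains and timing between blocks, not gate count alone.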
Philips also has a very high priority for testability, so we have a suite of internally developed tools for scan insertion, built-in self test, vector generation, and links to testers on the test floor. With that we are able to achieve very high fault coverage. Our internal tools let us address testability at the block level, and then do integration at a higher level.
EEdesign: What in the pnx8500 design flow needs improvement?
Oliveira: We are looking more at tooling for system design. We have worked with Cadence on VCC. We have our own modeling language, but that's something that needs to be improved as we move forward.
In verification, we're looking into tools that can automatically generate I/O patterns to make sure we cover a lot of interactions with IP blocks. We are looking at tools that could create micro drivers for IP blocks automatically.
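To make the idea of automatically generated I/O patterns concrete, here is a minimal Python sketch of seeded random register traffic spread across several IP blocks. It does not represent any tool named in the interview. The function generate_transactions, the block names and the address ranges are hypothetical.

```python
import random


def generate_transactions(register_ranges, count, seed):
    """Return `count` random register accesses spread over the given blocks.

    register_ranges maps a block name to (base_address, size_in_bytes).
    A fixed seed makes the pattern reproducible for debugging.
    """
    rng = random.Random(seed)
    block_names = sorted(register_ranges)
    transactions = []
    for _ in range(count):
        block = rng.choice(block_names)
        base, size = register_ranges[block]
        offset = rng.randrange(0, size, 4)  # word-aligned offset
        operation = rng.choice(("read", "write"))
        data = rng.getrandbits(32) if operation == "write" else None
        transactions.append((block, operation, base + offset, data))
    return transactions


# Hypothetical address map, for illustration only.
register_ranges = {
    "uart": (0x1000, 0x40),
    "usb": (0x2000, 0x100),
    "scaler": (0x3000, 0x200),
}
pattern = generate_transactions(register_ranges, count=8, seed=1)
for transaction in pattern:
    print(transaction)
```

Interleaving accesses to different blocks in one stream is what exercises the interactions between them, as opposed to testing each block in isolation.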
Spanjaart: One issue we ran into was timing closure at the top level, where we put everything together. That resulted in a changed architecture, so that we can minimize the number of iterations needed to reach timing closure.
Oliveira: We did some adaptation to our bus topology so it meets top-level timing closure. We discontinued the use of the PI bus and are using a combination of synchronous and asynchronous buses to create what some of our people describe as "islands of synchronicity." It lets our bus run faster and is also going to dramatically improve our ability to do timing closure.
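One way to picture "islands of synchronicity" is to sort block-to-block links by whether both ends sit in the same clock island. The Python sketch below does that under assumed inputs. The function classify_links, the island names and the block names are hypothetical, and the interview does not describe the actual bus topology at this level of detail.

```python
def classify_links(domain_of, links):
    """Split links into same-island (synchronous) and cross-island lists."""
    synchronous, crossing = [], []
    for source, destination in links:
        if domain_of[source] == domain_of[destination]:
            synchronous.append((source, destination))
        else:
            crossing.append((source, destination))
    return synchronous, crossing


# Hypothetical assignment of blocks to clock islands.
domain_of = {
    "cpu": "island_a",
    "cache": "island_a",
    "scaler": "island_b",
    "composer": "island_b",
}
links = [("cpu", "cache"), ("cpu", "scaler"), ("scaler", "composer")]

synchronous, crossing = classify_links(domain_of, links)
print("synchronous:", synchronous)
print("needs an asynchronous crossing:", crossing)
```

Only the links in the second list span two islands, so timing inside each island can be closed on its own while those few crossings are handled asynchronously.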
EEdesign: Who manufactured the chip, in what process technology?
Spanjaart: This particular design was manufactured by TSMC, in 0.18 micron technology, which was then state of the art. There was some process variability, but we didn't hit any major process snags.
EEdesign: What did you learn from the pnx8500 design effort?
Oliveira: Certainly we learned quite a bit from this design. We did some tweaks on the chip architecture to help with physical implementation, and also to improve or add performance, memory bandwidth, and clock cycles. But thanks to our thinking up front when we started this design, the majority of the IP blocks were preserved during these transitions.
One thing I'd like to see is better interoperability among all the different tools from different vendors. For state-of-the-art designs, we can't expect state-of-the-art tools to all come from the same vendor.