cover story
Tell Me Again--What Does the "S" in SOC Stand For?
Because sequential steps in designing SOCs often occur simultaneously or with many iterations, codesign can rapidly turn into chaos without a clear flow and the software to keep it all under control.
by Takashi Hasegawa and John McNally
In many organizations, approaches to the design of systems on a chip (SOCs) tend to be heavy on the "C" and light on the "S". In other words, the major semiconductor houses are consumed with the difficulties involved in implementing 5 to 10 million gates in deep-submicron geometries, integrating silicon IP typically at the netlist level, routing it all, and achieving timing closure. Then there's the burgeoning problem of verification at the RT and gate levels and the necessity for on-chip test. All of this is very demanding and important, but it's not enough.
Focus only on the silicon and you miss the fact that, in the product segments already making SOC a reality, the chip is often little more than a platform on which to run the software, where the bulk of the system complexity resides. That's why when the World Wide System LSI Technology (WWSLT) group of Fujitsu Microelectronics, Inc. (San Jose) developed IPsymphony, the next generation of our chip design environment, we recognized a need to go further than conventional design methodology. We couldn't just improve the current RTL-down flows or offer an exceptionally wide range of IP.
We thus decided to implement a system-level design flow that employed hardware-software codesign, ensuring the most effective choice of system architecture and the use of IP. In conjunction with Coware, Inc. in Santa Clara, Calif., we created a new design flow through a pilot project--developing processor support packages (PSPs) for the Sparclite processor and associated peripherals--to simulate an actual design.
Co-obfuscation
When a technology emerges, partial, fringe, or superseded technologies often try to pass themselves off as part of the new wave, masking the true impact of the new technology. A prime example is hardware-software codesign--the concurrent specification, partitioning, implementation, and verification of a system that includes embedded software and application-specific hardware.
The exact wording of definitions varies, but most codesign gurus in industry and academia would agree that all four design components play a role. Yet the terms 'hardware-software codesign' and 'hardware-software coverification' seem at times to be interchangeable. Clearly, coverification is the final quarter of the overall process. It doesn't matter how early or how fast the coverification is; verification isn't design and shouldn't be confused with design. The four phases are never in strict sequence as in a waterfall model; iterations occur between the phases and the various levels of coverification throughout the process.
The codesign flow we built uses tools, methodologies, and IP covering all four of the codesign phases. We could integrate different processors, real-time operating systems, and embedded software tools into the flow as required. The initial project integrated the Fujitsu Sparclite processor, Wind River Vxworks RTOS, and the Tornado Integrated Development Environment (IDE) into the Coware N2C Design System (see Figure 1). Our system-level design flow included the target specification and defined the desired result. This specification provided a working model, emulating the system and analyzing the possible solutions including test bench, hardware, and software implementations. We broke the system-level design task into three key elements: function or algorithm design, performance design (performance budgeting), and architecture design.
Functional and performance design
Creating a functional model in C/C++ at the codesign level produced an executable specification. We specified the real system test bench as well as the portion of the system to be implemented. The tool then split the system into hierarchical blocks known as "blockdefs" (see Figure 2). The coding methodology, which separated behavior from communication, provided two principal benefits. First, the separation allowed easy remapping of functions using interface synthesis. Second, building both kinds of model from the same hierarchical system structure and tool encouraged a continuum of performance and functional modeling.
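To make the separation of behavior and communication concrete, here is a minimal sketch in Python. It is an illustration only, not the N2C coding style (which was based on C/C++); the names `FifoChannel`, `RegisterChannel`, and `scale_block` are hypothetical. The behavior touches its ports only through `put` and `get`, so the communication can be swapped without editing the behavior.

```python
from collections import deque
from typing import Deque, List


class FifoChannel:
    """Communication modeled as a plain software queue (hypothetical)."""

    def __init__(self) -> None:
        self._items: Deque[int] = deque()

    def put(self, value: int) -> None:
        self._items.append(value)

    def get(self) -> int:
        return self._items.popleft()


class RegisterChannel:
    """Communication modeled as a 32-bit register that counts bus writes."""

    def __init__(self) -> None:
        self._pending: Deque[int] = deque()
        self.bus_writes = 0

    def put(self, value: int) -> None:
        self.bus_writes += 1
        self._pending.append(value & 0xFFFFFFFF)  # truncate to register width

    def get(self) -> int:
        return self._pending.popleft()


def scale_block(inp, out, count: int) -> None:
    """Behavior only: read `count` samples, double each, write it out."""
    for _ in range(count):
        out.put(2 * inp.get())


def run(channel_type, samples: List[int]) -> List[int]:
    """Bind the same behavior to whichever channel type is requested."""
    inp, out = channel_type(), channel_type()
    for sample in samples:
        inp.put(sample)
    scale_block(inp, out, len(samples))
    return [out.get() for _ in samples]
```

Calling `run(FifoChannel, data)` and `run(RegisterChannel, data)` exercises the same `scale_block` over two different communication models, which is the kind of remapping the separation is meant to permit.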
Figure 1. System-level design flow
In our system-level design flow, the target specification entered the design core of the flow, triggering the executable specification that emerged into the verification environment.
The creation of an executable specification in C/C++ assisted the performance-design phase by employing a performance model rather than a functional model. The model analyzed processor performance and communication throughput needs before we chose the processing platform components.
Unlike the performance-modeling point tools, which force the user into creating a system performance model that offers no refinement path to either the functional model or to implementation, the design system allowed continuous refinement from the same system block diagram using the desired mix of modeling styles. We explored algorithmic behavior first by designing a full-function executable specification, then refining communication to create the fully executable and implementable specification. Alternatively, we could have first used a performance-modeling style to produce a specification executable in the architecture, then refined the functional behavior to create the implementable and executable specification. More often than not, we had to include behavior in the system performance model purely to analyze performance with even top-level accuracy. In MPEG systems, for example, the computing effort required is a function of image content and frame-to-frame change. To model performance, we had to model the algorithm--a task that many performance-modeling point tools can't accomplish.
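The MPEG point can be shown with a toy model. The sketch below is an assumption-laden illustration in Python, not taken from the project: the cost constants and the idea of charging a fixed cost per changed pixel are invented for the example. It shows only that the cycle estimate cannot be computed without running at least part of the algorithm on the data.

```python
from typing import List, Sequence

# Hypothetical cost figures, chosen only for illustration.
BASE_CYCLES_PER_FRAME = 1000
CYCLES_PER_CHANGED_PIXEL = 50


def changed_pixels(prev: Sequence[int], cur: Sequence[int]) -> int:
    """Behavior: count pixels that differ between two frames."""
    return sum(1 for a, b in zip(prev, cur) if a != b)


def frame_cycles(prev: Sequence[int], cur: Sequence[int]) -> int:
    """Performance: the cost depends on what the behavior finds in the data."""
    return BASE_CYCLES_PER_FRAME + CYCLES_PER_CHANGED_PIXEL * changed_pixels(prev, cur)


def stream_cycles(frames: Sequence[Sequence[int]]) -> List[int]:
    """Estimate cycles for each frame after the first in a sequence."""
    return [frame_cycles(p, c) for p, c in zip(frames, frames[1:])]
```

A static frame costs only the base figure, while a frame with two changed pixels costs `1000 + 2 * 50` cycles in this model; a purely statistical performance model with one fixed cost per frame could not distinguish the two.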
The environment facilitated analysis of the executable specification at any stage. The analysis included a comprehensive database with full user control to set up the block, function, thread, or variable-level data stored in the course of a simulation. A set of graphical displays allowed the presentation of all kinds of system information. These views indicated immediately how the system stacked up against top-level requirements, including throughput and overall system response. A full application programming interface enabled us to specify our own post-processing in C/C++ or Tcl/Tk scripts.
A Gantt view analyzed the system response and other temporal relationships of various components of the system. Bar charts and graphs profiled the system at many levels, starting with function call graphs and throughput analysis of the executable specification at the untimed C/C++ level. The process provided us with an extremely valuable guide for drawing the first-cut hardware-software partition.
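As a rough illustration of how call-level profiling can guide a first-cut partition, the Python sketch below counts function calls during a run and flags the hot functions. Everything here is hypothetical: the `profiled` decorator, the example functions, and the threshold rule are stand-ins, not the analysis database or views of the actual tool.

```python
from collections import Counter
from functools import wraps
from typing import List

call_counts: Counter = Counter()


def profiled(fn):
    """Count every call to `fn` in the shared `call_counts` table."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        call_counts[fn.__name__] += 1
        return fn(*args, **kwargs)
    return wrapper


@profiled
def multiply_accumulate(acc: int, a: int, b: int) -> int:
    return acc + a * b


@profiled
def fir_filter(samples: List[int], taps: List[int]) -> List[int]:
    """Direct-form FIR; the inner loop is the hot spot."""
    out = []
    for i in range(len(samples) - len(taps) + 1):
        acc = 0
        for j, tap in enumerate(taps):
            acc = multiply_accumulate(acc, samples[i + j], tap)
        out.append(acc)
    return out


@profiled
def control_update(gain: int) -> int:
    return gain + 1


def hardware_candidates(counts: Counter, threshold: int) -> List[str]:
    """Functions called at least `threshold` times are flagged for hardware."""
    return sorted(name for name, n in counts.items() if n >= threshold)
```

A real partition would weigh cycles, data movement, and area rather than raw call counts, but even this crude view separates the multiply-accumulate kernel from the occasional control code.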
Architecture design
Once we assessed the basic processing platform needs, we chose one of the available processor support packages (PSPs). Partitioning allowed the executable specification to be mapped to the PSP, during which time we mapped the design block by block to either hardware or software running on it. We selected not only the target implementation, but also the communication scenario--a combination of transaction type, handshaking, and protocol options that the block will use--by choosing from pull-down menus. When the tool had assigned all the blocks, it generated the system's memory map, which we could accept or change. After finalizing the memory map, we began the interface synthesis.
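The step from a finished partition to a memory map can be sketched as follows. This Python example is a simplified assumption about what such a generator might do, not the tool's algorithm: the `Block` record, the base address, and the alignment are all hypothetical. Each hardware block receives an aligned register window, while software blocks need no bus address.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Block:
    name: str
    target: str         # "hw" or "sw", chosen during partitioning
    reg_bytes: int = 0  # register window needed by a hardware block


def align_up(value: int, alignment: int) -> int:
    """Round `value` up to the next multiple of `alignment`."""
    return (value + alignment - 1) // alignment * alignment


def build_memory_map(
    blocks: List[Block], base: int = 0x1000_0000, alignment: int = 0x100
) -> Dict[str, Tuple[int, int]]:
    """Assign each hardware block an inclusive (start, end) address range."""
    mem_map: Dict[str, Tuple[int, int]] = {}
    addr = base
    for block in blocks:
        if block.target != "hw":
            continue  # software blocks live in program memory, not on the bus map
        if block.reg_bytes <= 0:
            raise ValueError(f"hardware block {block.name} needs a register window")
        size = align_up(block.reg_bytes, alignment)
        mem_map[block.name] = (addr, addr + size - 1)
        addr += size
    return mem_map
```

Because the result is an ordinary dictionary, a designer could override an entry before handing the map on, which mirrors the accept-or-change step described above.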
Figure 2. Modeling-style continuum
The design methodology relied on a hierarchical division of defined blocks. Functional and performance blockdefs mapped to distinctions between behavioral and communication parameters in the design.
Interface synthesis included instantiating the processor, the bus, and the memory; building address decoders; synthesizing any necessary bus bridges; and connecting all hardware IP together. Automating the process saved us weeks of effort and guaranteed that the interfaces were correct. Moreover, sharing logic resources wherever possible made for an efficient hardware implementation. On the software side, low-level device drivers were generated and integrated with the RTOS of choice--Vxworks in the initial project--and then linked into an application program with all the functions selected for software implementation and compiled for the chosen processor. The combined tools then profiled the software to verify that it remained within the performance budget available in the processor. If it didn't, we adjusted the partition or changed the PSP. Then we analyzed how much time the software spent on various tasks: writing to, reading from, and so on. This analysis identified throughput bottlenecks and system response problems, which we addressed by repartitioning or by rescheduling in the RTOS.
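The budget check described above amounts to simple arithmetic, sketched here in Python. The task names, cycle counts, clock rate, and period are invented for illustration, and the function names are hypothetical; the task table is assumed to be non-empty.

```python
from typing import Dict, Tuple


def cycle_budget(clock_hz: int, period_us: int) -> int:
    """Processor cycles available in one processing period."""
    return clock_hz * period_us // 1_000_000


def check_budget(
    task_cycles: Dict[str, int], clock_hz: int, period_us: int
) -> Tuple[bool, float, str]:
    """Return (fits, load fraction, heaviest task) for profiled software tasks."""
    budget = cycle_budget(clock_hz, period_us)
    used = sum(task_cycles.values())
    heaviest = max(task_cycles, key=task_cycles.get)
    return used <= budget, used / budget, heaviest
```

With a 100 MHz clock and a 10 ms period the budget is one million cycles. If the profiled tasks exceed it, the heaviest task is the natural first candidate to move into hardware or to run on a faster PSP.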
We could then investigate many bus and memory architectural issues, such as inefficient bus access or memory architecture (cache size, for instance). Since bus access, not the processor, consumed most of the power, we also identified power bottlenecks, which we could resolve by changing the architecture or by remapping inefficient software to more power-efficient hardware.
PSPs captured in N2C allowed other designers to interface to them with ease. They didn't require an intimate knowledge of the processor or bus; they needed only to know how to specify communication to a more abstract and generalized bus connection known as the virtual bus. This bus-independent API for bus-based IP readily allowed peripheral IP blocks to communicate with any processor core/bus combination.
The virtual bus capability encouraged portability of both the peripheral IP blocks and the processors by abstracting the details of processor and bus timing away from the designer. Interface synthesis then automatically generated the necessary interface circuitry. The IP reuse aspect of our design flow is a working implementation that already incorporates the spirit and substance of the VSIA's specifications and guidelines regarding interface design, which have not yet been released. The benefit for our design groups and customers is the ability to plug and play with elements of our IP library. Interface synthesis allows IP to be integrated early in the design flow. IP becomes more portable; designers can explore and quickly assess different architectures and hardware-software partitions.
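The idea of a bus-independent API can be sketched in a few lines of Python. This is not the N2C virtual bus interface; the `VirtualBus` class, the two bus models, their cycle costs, and the timer peripheral are all hypothetical. The peripheral is written once against the abstract interface and runs unchanged on either bus.

```python
from abc import ABC, abstractmethod
from typing import Dict


class VirtualBus(ABC):
    """Abstract read/write interface that a peripheral is written against."""

    @abstractmethod
    def write(self, address: int, value: int) -> None: ...

    @abstractmethod
    def read(self, address: int) -> int: ...


class SimpleBus(VirtualBus):
    """Hypothetical bus: every access costs one cycle."""

    def __init__(self) -> None:
        self.mem: Dict[int, int] = {}
        self.cycles = 0

    def write(self, address: int, value: int) -> None:
        self.cycles += 1
        self.mem[address] = value

    def read(self, address: int) -> int:
        self.cycles += 1
        return self.mem.get(address, 0)


class HandshakeBus(SimpleBus):
    """Hypothetical bus with a request/acknowledge phase: two cycles per access."""

    def write(self, address: int, value: int) -> None:
        self.cycles += 1  # extra handshake cycle
        super().write(address, value)

    def read(self, address: int) -> int:
        self.cycles += 1  # extra handshake cycle
        return super().read(address)


class TimerPeripheral:
    """Down-counter that knows only the virtual bus, not the concrete one."""

    COUNT = 0x4  # register offset, invented for the example

    def __init__(self, bus: VirtualBus, base: int) -> None:
        self.bus, self.base = bus, base

    def load(self, value: int) -> None:
        self.bus.write(self.base + self.COUNT, value)

    def tick(self) -> None:
        count = self.bus.read(self.base + self.COUNT)
        if count > 0:
            self.bus.write(self.base + self.COUNT, count - 1)

    def expired(self) -> bool:
        return self.bus.read(self.base + self.COUNT) == 0
```

Swapping `SimpleBus` for `HandshakeBus` changes the cycle count but not the peripheral code, which is the portability argument in miniature.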
Implementation and processor support
For blocks allocated as application-specific hardware, we first specified the behavior of the block--the encapsulation--at a high level of abstraction, then further refined it as necessary. One block could have multiple encapsulations at different levels of abstraction. The idea of multiple encapsulations compares to the concept of multiple architectures in VHDL: We could choose among untimed C, bus-cycle accurate C (BCA), bus-cycle accurate shell C (Bcash), and HDL encapsulations. Since ANSI-C was augmented with notions (such as clocks and concurrency) that are essential for hardware design at the BCA and Bcash levels, the executable specification became implementable. We refined the BCA level to a synthesizable subset known as register-transfer C (RTC) and could automatically translate to VHDL or Verilog for synthesis.
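The notion of several encapsulations of one block can be illustrated with a small Python sketch. It is only an analogy for the untimed and clocked levels, with invented names (`dot_untimed`, `DotClocked`), and assumes equal-length inputs; the real flow used augmented C and HDL. Both versions compute the same result, but the second advances one multiply-accumulate per clock and so also yields a cycle count.

```python
from typing import Sequence, Tuple


def dot_untimed(a: Sequence[int], b: Sequence[int]) -> int:
    """Untimed encapsulation: the function, with no notion of clocks."""
    return sum(x * y for x, y in zip(a, b))


class DotClocked:
    """Clocked encapsulation: one multiply-accumulate per clock edge."""

    def __init__(self, a: Sequence[int], b: Sequence[int]) -> None:
        self.a, self.b = list(a), list(b)
        self.index = 0
        self.acc = 0
        self.cycles = 0

    @property
    def done(self) -> bool:
        return self.index >= len(self.a)

    def clock(self) -> None:
        """Advance the block by one clock cycle."""
        if not self.done:
            self.acc += self.a[self.index] * self.b[self.index]
            self.index += 1
        self.cycles += 1


def run_clocked(a: Sequence[int], b: Sequence[int]) -> Tuple[int, int]:
    """Clock the block until it finishes; return (result, cycles used)."""
    block = DotClocked(a, b)
    while not block.done:
        block.clock()
    return block.acc, block.cycles
```

Checking that `run_clocked` returns the same value as `dot_untimed` is a miniature of verifying a refined encapsulation against the more abstract one.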
Meanwhile, the tool's analysis capability further optimized the software, identifying blocks or functions that would benefit from additional review. The processor model verified the software at every stage, with the hardware modeled in untimed C, BCA C, or HDL. The availability of a near cycle-accurate processor model connected to a hardware model (in untimed C) created a very early "virtual prototype" for software development and provided an attractive combination of accuracy and fast simulation early in the design cycle. The Bcash layer that served to interface between the two was a bus functional model (BFM) that appeared "inside out" compared with more conventional BFMs, connecting instruction-accurate ISSs to timing-accurate hardware in an HDL simulator.
Figure 3. Integrated design environment
The Coware PSP contained sequential components that handled the hardware-implemented system blocks and interfaced with other segments of the integrated design environment, including the Tornado-Vxworks tools that took care of the software implementations.
We used the Coware processor kit option to capture processor models, associated software development tools, RTOS, and bus architectures for use in N2C. The tool formed a complete PSP, capturing all the details regarding processor-bus communication--including transaction types, bus protocol, memory access, timing diagrams, software compiler strategies, and links (see Figure 3). The tool refined the PSP to different levels of accuracy based on details captured in the associated timing diagram. Armed only with the knowledge of how to interface to the much more abstract virtual bus, we could use the PSP and design peripheral hardware that interfaced to it.
The PSP consisted of three components, built up sequentially. The ISS support package (ISP) captured the instruction set simulator (ISS) for a particular processor, and described how that processor interfaced to its bus. For the initial project, we chose the Sparclite processor and integrated the ISS into the ISP, first at an instruction-accurate level with a bus-functional model, then with a full cycle-accurate model. The ISP also detailed how the processor communicated with its local bus. The bus, the bus interface, and the memory architecture were captured in the bus support package (BSP). If the design had included a separate system bus, the BSP would also have captured that information and defined the bridge between the system bus and the processor bus. The BSP was first described at Bcash, then at the full BCA level. The software tool set--including compiler, linker, IDE, and RTOS--was integrated into a software support package (SSP). The launch project described here integrated the Wind River Tornado IDE and the Vxworks RTOS into the SSP.
We created IPsymphony in response to SOC complexities and the need for high-level design phase abstractions in a global environment. SOC designs now more commonly use embedded processors; determining hardware and software trade-offs and solutions is a major requirement. N2C provided the system-level design environment that enabled us to analyze constraints and trade-offs in a controlled environment that allowed worldwide and concurrent access. IPsymphony maintained the logistics of the system, while N2C forwarded results to other tools in the design flow. Together, the two allowed us to employ an optimized codesign methodology for creating systems on a chip.
Takashi Hasegawa is the manager of system-level design methodology in the Worldwide System LSI Technologies unit of Fujitsu Ltd. in Japan.
John McNally joined the Worldwide Customer Applications unit at Coware, Inc. in Santa Clara, Calif. as a customer applications architect in January 1999. He has 20 years of experience in numerous positions in the design, EDA, and CAD industries, most recently serving as group director of R&D and operations in the Systems Design Division at Cadence Design Systems, Inc. in San Jose.
To voice an opinion on this or any Integrated System Design article, please email your message to jeff@isdmag.com.