Existing system-on-chip design methods often create bottlenecks in block-level integration and verification, hardware/software co-development and semiconductor-process portability. Designers are also experiencing issues stemming from rigid processor cores designed for desktop architectures of the 1970s and 1980s. Core designs like these are troublesome to integrate with other system blocks and difficult to adapt to today's high-volume, high-performance embedded applications.
Application-specific configurability is now emerging as a way to deal with these co-design barriers. For the first time, the embedded designer can use a configurable and extensible microprocessor architecture and development environment. Armed with this capability, a designer has the greatest design latitude for building highly differentiated and optimized synthesizable processor cores for use in ASIC-based products.
Processor architectures such as Tensilica's Xtensa give designers the crucial configurability to tailor and adapt a processor implementation to precisely match specific application requirements. Here, one can build any variety of processor around the base instruction-set architecture to optimize application performance, code size and die size, as well as power dissipation.
In this design scheme, the embedded-system engineer uses a processor generator to create one or more application-specific embedded processors. The generator draws on state-of-the-art processor, EDA and embedded-software development technologies to create all the hardware and software databases and tools for the complete processor.
Thus, the generator lets designers use the unique characteristics of their applications. For example, they can select and define such processor attributes as DSP coprocessors, cache and memory hierarchy, exception configurations, interrupt-level settings and processor interface bus width, and even create their own design-specific instructions.
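To make that list of attributes concrete, here is a minimal sketch of how such a configuration might be captured and sanity-checked before it is handed to a generator. Everything in it is an assumption for illustration: the `ProcessorConfig` class, its field names, the allowed bus widths and the power-of-two cache rule are hypothetical and do not reflect the input format of Tensilica's or any other vendor's generator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessorConfig:
    """Hypothetical configuration record; all field names are illustrative."""
    dsp_coprocessor: bool = False
    icache_kb: int = 4            # 0 means "no instruction cache"
    dcache_kb: int = 4            # 0 means "no data cache"
    interrupt_levels: int = 1
    bus_width_bits: int = 32
    custom_instructions: tuple = ()

    def validate(self):
        """Return a list of problems; an empty list means the config looks sane."""
        errors = []
        for name in ("icache_kb", "dcache_kb"):
            size = getattr(self, name)
            # Assumed rule: cache sizes are zero or a positive power of two.
            if size != 0 and (size < 0 or size & (size - 1)):
                errors.append(f"{name} must be 0 or a power of two, got {size}")
        # Assumed set of legal interface widths.
        if self.bus_width_bits not in (32, 64, 128):
            errors.append(f"unsupported bus width: {self.bus_width_bits}")
        if self.interrupt_levels < 1:
            errors.append("at least one interrupt level is required")
        return errors


cfg = ProcessorConfig(dsp_coprocessor=True, icache_kb=16, dcache_kb=8,
                      interrupt_levels=3, bus_width_bits=64,
                      custom_instructions=("sad4",))
print(cfg.validate())
```

Checking the configuration up front is a design choice of this sketch; a real generator would enforce its own, more detailed constraints.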
These new configurable-processor technologies come at a time when hardware and software co-design plays an increasing role in embedded-system design. The embedded-system designer gradually evolves the system concept from an abstract representation, often in the form of a fast system simulation model, to a detailed implementation, generally including simulations of each hardware and software component. This sequence of more-refined models allows increasingly accurate modeling of cycle-by-cycle behavior of the implementation. It relies on creation of several interlocking models to address different forms of verification, especially validation of code, logic, timing, power and interface protocols. Moreover, the designer may need all of these models working together to permit exploration of different partition options and different implementations of the hardware and software blocks. This new flexibility in partitioning also helps the designer in the timely and correct integration of all the systems as they undergo detailed design.
A hierarchy of partitioning is central to co-design. Part of this involves identifying the major subsystems within the I/O and the computations that are required. Partitioning separates the portions of the system implemented in hardware, as various special-purpose function units, from those implemented in software running on one or more processors. Then, within each of those partitions, the embedded-system designer needs to refine the design further, so that the software is divided into particular threads and procedures. Hardware blocks must ultimately be broken down to the final gate and transistor-level implementation.
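The top level of that partitioning can be pictured as a small search problem. The sketch below is a toy model, not a description of any co-design tool: the block names and the cycle and gate estimates are made-up placeholders, and it assumes the blocks run one after another so that their cycle counts simply add. It tries every hardware/software assignment and keeps the one that meets a cycle budget with the fewest gates.

```python
from itertools import product

# Hypothetical estimates per block: (software cycles, hardware cycles, hardware gates).
BLOCKS = {
    "protocol": (4000, 500, 30000),
    "filter": (9000, 800, 45000),
    "ui": (2000, 1500, 20000),
}


def best_partition(blocks, cycle_budget):
    """Exhaustively search hw/sw assignments.

    Returns (gates, cycles, assignment) for the cheapest assignment that
    meets the cycle budget, or None if no assignment fits.
    """
    names = sorted(blocks)
    best = None
    for choice in product(("sw", "hw"), repeat=len(names)):
        cycles = 0
        gates = 0
        for name, side in zip(names, choice):
            sw_cycles, hw_cycles, hw_gates = blocks[name]
            if side == "sw":
                cycles += sw_cycles      # software costs time but no extra gates
            else:
                cycles += hw_cycles
                gates += hw_gates        # dedicated hardware costs area
        if cycles > cycle_budget:
            continue
        if best is None or gates < best[0]:
            best = (gates, cycles, dict(zip(names, choice)))
    return best


print(best_partition(BLOCKS, cycle_budget=8000))
```

Exhaustive search is only practical for a handful of blocks; it is used here because it keeps the trade-off between cycles and gates easy to see.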
Current EDA tools provide a certain level of embedded design productivity. Designers perform most of the hardware design using hardware-description languages (HDLs) like Verilog or VHDL, or they reuse blocks, either developed in-house or acquired earlier, that are represented in those HDLs. Similarly, software engineers develop code in C and C++ and build on third-party software infrastructure like real-time operating systems (RTOSes).
However, the industry is undergoing significant changes that affect embedded-system design and the requirements placed on co-design. More designs combine advanced user interfaces, computation and protocol processing. There is greater system complexity, measured by the number of gates and bytes of code used in next generations of embedded systems.
These emerging co-design issues are best handled by improving the range of choices in the partitioning and implementation of the system's hardware and software design. More latitude is needed by the designer to verify that all the hardware pieces operate efficiently with one another, the software pieces work with one another, and the hardware works with the software.
Application fit
One way to address these issues is to give the embedded designer the ability to configure the system processor to exactly fit the application. By doing so, one can dramatically improve the efficiency of a design by extending the instruction-set architecture and by specifying the memory hierarchy characteristics (instruction RAM and cache, data RAM and cache), the bus interfaces and the suite of closely coupled peripherals required for a given application, like sophisticated interrupt controllers, timers and debug support.
Using a configured application-specific processor, the designer achieves significant improvements in application-level performance and cost reduction. Tensilica, for example, has demonstrated performance improvements of five to 20 times over a broad range of software algorithms for content encryption, motion estimation for video conferencing, Viterbi decoding for wireless communications and image compression for digital cameras. Configurability allows the designer to include all the features required to make a particular application run extremely well and to omit the extraneous features of a generic one-size-fits-all processor. The result is a much leaner, faster processor subsystem compared with the conventional processor.
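A rough feel for where such gains come from can be had from a toy model of motion estimation's inner loop, the sum of absolute differences (SAD). The sketch below assumes a hypothetical design-specific instruction, modeled by the function `sad4`, that accumulates the absolute differences of four pixels at once. The instruction, its name and its width are illustrative assumptions, and the step counts are loop iterations in the model, not measured cycles or a reproduction of the figures quoted above.

```python
def sad_scalar(a, b):
    """Baseline model: one subtract/absolute-value/accumulate step per pixel."""
    total = 0
    steps = 0
    for x, y in zip(a, b):
        total += abs(x - y)
        steps += 1
    return total, steps


def sad4(acc, a4, b4):
    """Model of a hypothetical instruction that handles four pixels per issue."""
    return acc + sum(abs(x - y) for x, y in zip(a4, b4))


def sad_extended(a, b):
    """Same computation, issued as one hypothetical sad4 per group of four pixels."""
    total = 0
    steps = 0
    for i in range(0, len(a), 4):
        total = sad4(total, a[i:i + 4], b[i:i + 4])
        steps += 1
    return total, steps


current = [10, 20, 30, 40, 50, 60, 70, 80]
reference = [12, 18, 33, 40, 45, 66, 70, 79]
print(sad_scalar(current, reference))    # (19, 8)
print(sad_extended(current, reference))  # (19, 2)
```

Both versions produce the same result; only the number of issued steps changes, which is the effect a well-chosen instruction extension is meant to deliver.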
With such configurability, processor performance targeted at implementing a given set of functions can meet or exceed the performance of dedicated hardware for the same functions. Also, embedded designers can avoid being locked into a rigid configuration and paying the price for design or standard changes.
Matching the processor to the tools and hardware languages used by the other hardware blocks is key to improving co-design. However, until now, the processor has usually been a black box within the integration process and represented only by approximate models of detailed behavior at the gate and RTL levels. Accuracy limitations and unavailability of low-level RTL models for the processor thus make the processor a foreign object within an ASIC flow.
This issue adds to the complexity of performing static timing verification, test insertion, floor planning, and back-end place and route. Conversely, a processor that is represented in exactly the same form as other blocks in a design allows an expeditious and efficient integration into hardware.
Tool kits count
Likewise, the embedded designer must have world-class software tools for efficient co-design. Modern compilers, libraries, cycle-accurate simulators and debuggers provide superb code quality and extensive support for developing, profiling and validating the code so that the software engineer can quickly design and integrate the software system. This leads to tighter and more-seamless software integration than has generally been available to embedded designers who rely on generic, processor-independent third-party tools.
Co-design success also depends on a smooth continuum of verification models, which allow one to easily build high-level system-simulation models that may be independent of a particular processor. These verification models are also used to model high-level interactions among system-level components, including the support of C-level peripheral, multiprocessor and interface modeling.
The designer then refines this model to implement the software running on the processor or processors in an instruction-set-level simulation. Performance modeling is performed on that model using cycle accuracy to verify that system throughput is adequate. Next, a tool like Synopsys Eaglei or Mentor's Seamless CVE supports the RTL-to-processor interaction, and ultimately one can go down to full RTL-level simulation of the entire system.
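The instruction-set-level step can be illustrated with a very small simulator that executes a program and tallies cycles, so that the total can be compared against a throughput budget. This is only a sketch under stated assumptions: the three-opcode instruction set, the tuple encoding and the per-opcode cycle costs are invented for the example and bear no relation to Xtensa or to the commercial co-verification tools named above.

```python
# Hypothetical per-opcode cycle costs; real figures would come from the
# processor vendor's cycle-accurate model.
CYCLES = {"li": 1, "add": 1, "bnez": 2}


def run(program, max_steps=10000):
    """Execute a list of instruction tuples; return (registers, cycle count)."""
    regs = {}
    pc = 0
    cycles = 0
    steps = 0
    while pc < len(program):
        steps += 1
        if steps > max_steps:
            raise RuntimeError("step limit exceeded (possible infinite loop)")
        op, *args = program[pc]
        cycles += CYCLES[op]
        if op == "li":                    # load immediate: li rd, value
            regs[args[0]] = args[1]
        elif op == "add":                 # add rd, rs, rt
            rd, rs, rt = args
            regs[rd] = regs[rs] + regs[rt]
        elif op == "bnez":                # branch to index if register != 0
            reg, target = args
            if regs[reg] != 0:
                pc = target
                continue
        pc += 1
    return regs, cycles


# Sums 3 + 2 + 1 by counting n down to zero.
PROGRAM = [
    ("li", "acc", 0),
    ("li", "n", 3),
    ("li", "m1", -1),
    ("add", "acc", "acc", "n"),
    ("add", "n", "n", "m1"),
    ("bnez", "n", 3),
]

regs, cycles = run(PROGRAM)
CYCLE_BUDGET = 20                         # assumed budget for this example
print(regs["acc"], cycles, cycles <= CYCLE_BUDGET)
```

With the assumed costs, the program finishes in 15 cycles and leaves 6 in `acc`, so it fits the example budget. The same check, run against a real cycle-accurate model, is what tells the designer whether throughput is adequate before moving to RTL.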