United Business Media EE Times



design tools

Hardware-Software Codesign of an Image Processing Unit

A codesign tool suite helped handle the demands of hardware-software codesign by facilitating architectural exploration, partitioning, scheduling, and interface synthesis.

by Clarisse Adida, Michel Boubal, Xavier Granger, Philippe Lamaty, and Jean-Pierre Moreau



In the busy electronic equipment market, as in numerous others, increasing pressure on development time and costs heightens the importance of using the most appropriate design tools. At the Aerospatiale Missile Division, we used an early version of the hardware-software codesign and architectural exploration tools from Arexsys to develop an image processing system.

Because Arexsys's tools provide a continuous flow from abstract specification down to the traditional EDA tool domain, we expected them to resolve some of the problems raised by developing complex applications. We developed a customized methodology to meet our goals (see Figure 1), and it delivered the benefits we had expected from the tools. First and foremost, we developed a customized hardware-interface-software arrangement that precisely fit the needs of our project. Second, we explored the available architectures thoroughly and quickly. Finally, the tools handled the otherwise tedious work of partitioning the design into its hardware and software parts.

At the beginning of the image-processing project we identified several requirements. The specification language had to be expressive enough to describe both the continuous and the discrete parts of the system. The tools needed to facilitate architectural exploration and intermodule communication synthesis. Architecture implementation required both C (for software) and VHDL (for hardware). We had to be able to cosimulate hardware and software subsections. Finally, the test bench had to remain unchanged during the entire design cycle, to avoid additional work and delay during development.

Figure 1 A customized codesign methodology

Our customized methodology began with the articulation of the system model in C and SDL; then moved through the communication synthesis, architectural exploration, and partitioning stages; and ended in the cosimulation of the full system.

Arexsys's tools use SDL as their main entry language, so we used SDL for the top-level specification and added C functions to compensate for SDL's limitations. The tools provided the features needed for our other requirements. C-VHDL cosimulation, a way to functionally verify a hardware-software interface, was essential for us because the hardware-software interface is the source of integration bugs, and such bugs could have led to considerable cost overruns.

Software interface synthesis
We used Arexsys's tools to automatically generate RT-level HDL code that could be handled by commercial logic synthesis tools and RT-level C ("scheduled C") that could be handled by a processor-specific C compiler. To make the environment work on our processor, we had to provide a very simple set of memory drivers covering basic memory writing and reading operations, a VHDL macro describing the processor external interface (the bus model), and a cycle-accurate C model for a more precise cosimulation.
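The article doesn't reproduce these drivers. To give an idea of how small they can be, here is a minimal C sketch of a word-level read/write pair. The names `mem_write` and `mem_read`, the 16-bit word width, the 1,024-word address range, and the error convention are all assumptions. So that the sketch runs on a host, an array stands in for the processor's memory-mapped bus, which on a real target would be reached through a `volatile` pointer to the bus base address.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address space: 1,024 16-bit words. On the target this would
 * be the processor's memory-mapped bus, reached through a volatile pointer
 * to its base address; an array stands in so the sketch runs on a host. */
#define MEM_WORDS 1024u

static volatile uint16_t mem_space[MEM_WORDS];

/* Basic write: store one word. Returns 0 on success, -1 if addr is out of range. */
int mem_write(uint32_t addr, uint16_t value)
{
    if (addr >= MEM_WORDS)
        return -1;
    mem_space[addr] = value;
    return 0;
}

/* Basic read: fetch one word into *value. Returns 0 on success, -1 on a bad
 * address or a null output pointer. */
int mem_read(uint32_t addr, uint16_t *value)
{
    if (addr >= MEM_WORDS || value == NULL)
        return -1;
    *value = mem_space[addr];
    return 0;
}
```

Keeping the drivers this thin means the generated C only ever depends on two entry points, so retargeting to another processor touches just these functions.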

Figure 2 Hardware-software cogeneration

The tools provided the hardware-software cogeneration engine that created low-level C code, I/O hardware, and RT-level HDL code.

We used the tools to automatically convert the functional code generated by the architecture generation tool, together with the functional C code in the initial system specification, into low-level, fully scheduled C code that runs without any RTOS. The C code included the specific drivers for the hardware interface. From the behavioral VHDL code generated by the architectural exploration tool and the VHDL macro describing the embedded processor's interface, the tools generated clean, readable RT-level code ready for logic synthesis (see Figure 2). Being able to view detailed implementations quickly reduced the impact of in-depth partitioning exploration on development time.

High-level description in SDL
At the system level, current practice is to express specifications in a high-level language such as C or C++ (from the software world), the ITU-standard SDL (from the system world), or a mix of both. Mixing is practical: SDL allows embedded C functions, and these languages come with good software development environments. Although none of them meets the entire set of system designers' requirements, SDL is one of the best candidates (see "The Pros and Cons of SDL").

We used SDL and C to describe both the application and the test bench, allowing behavioral simulation at the specification level. In the initial steps, we had a functional view of what the partitioning would be. We used the functional view to validate the overall specification, formalize the requirements of the application, and elaborate and tune the test bench.

The model was a pixel data flow, with all algorithmic parts described in SDL. We used several SDL concepts, including abstract data types (ADTs) and procedures with parameters, labels, and arrays. (Arrays are mainly used to model delay lines.) Simulating in the Objectgeode SDL development environment from CS-Verilog SA, we first reduced the initial 256 x 256-pixel image (the background landscape) to 128 x 128 pixels. Even so, processing a single 128 x 128 image corresponds to simulating a formidable 70,000 states, so we simulated only a reduced image sequence under Objectgeode and ran the complete sequence by executing C code generated by the environment.
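Arrays that model delay lines have a direct counterpart in C. The following minimal sketch shows a circular-buffer line delay for a pixel stream: each call pushes one pixel and returns the pixel that entered one image line earlier. The 128-pixel line width matches the reduced image, but the 8-bit pixel type and the `delay_line` interface are assumptions, not the article's SDL model.

```c
#include <stdint.h>
#include <string.h>

#define LINE_WIDTH 128u  /* one line of the reduced 128 x 128 image */

/* Hypothetical one-line delay for 8-bit pixels. */
typedef struct {
    uint8_t buf[LINE_WIDTH];
    unsigned pos;   /* index of the oldest stored pixel */
} delay_line;

void delay_init(delay_line *d)
{
    memset(d->buf, 0, sizeof d->buf);
    d->pos = 0;
}

/* Push one pixel in and return the pixel pushed LINE_WIDTH calls earlier
 * (0 until the line has filled). */
uint8_t delay_step(delay_line *d, uint8_t in)
{
    uint8_t out = d->buf[d->pos];
    d->buf[d->pos] = in;
    d->pos = (d->pos + 1u) % LINE_WIDTH;
    return out;
}
```

A circular index avoids shifting the whole array on every pixel, which matters when a single line holds 128 samples.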

We used Arexsys's tools to explore several architecture and communication solutions. The interactive process let us choose different functional clusterings and communication protocols. The tool then automatically translated the hardware and software clusters into VHDL and C, respectively. We cosimulated these modules using the functions generated for data exchange between C and VHDL.

Communication synthesis
For each function in our design, we had to decide whether to implement it as a dedicated piece of logic--the hardware or ASIC route--or as a piece of code executed by an embedded processor--the software route. It's a multiple-choice exercise: one processor or many, identical or different, DSP or microcontroller (or some mix of both), real-time OS or "nanokernel" with full software synthesis. The greater the number of processors, the more complex the communication between them and with the dedicated logic parts. Designing efficient means of communication between system-on-a-chip subparts, such as buses and shared memories, is critical to the success of a design.

Figure 3 Architectural exploration

Architectural exploration is typically a multiple-choice exercise involving processor, bus and communication modules, and silicon implementation--with each choice causing a ripple effect. The automated codesign tools can predict such effects.

The SDL communication model is very abstract, relying on message passing and infinite queues. The Objectgeode simulator treats this as an intrinsic property of the model, so refining an SDL specification toward an implementation required a communication synthesis mechanism. Objectgeode supports several communication protocols: rendezvous (RDV), remote procedure call (RPC), FIFO queue, and shared variables. Except for the FIFO queue, however, the current SDL standard doesn't define these mechanisms; they are specific extensions of the Objectgeode simulator. Similarly, Arexsys provides a library of basic communication protocols, such as FIFO and RDV. To obtain better-optimized communication, we decided to develop a set of specific communication mechanisms that extend the basic library.
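Because SDL queues are unbounded, any implementation has to pick a finite depth and decide what happens when the queue fills. The sketch below is a minimal bounded FIFO in C with non-blocking put and get. The depth of 8 and the 32-bit payload are arbitrary assumptions, and this is not the protocol from Arexsys's library.

```c
#include <stdbool.h>
#include <stdint.h>

#define FIFO_DEPTH 8u  /* assumed finite depth; SDL queues are unbounded */

typedef struct {
    int32_t data[FIFO_DEPTH];
    unsigned head;   /* index of the next item to read */
    unsigned count;  /* number of items currently stored */
} fifo;

void fifo_init(fifo *f)
{
    f->head = 0;
    f->count = 0;
}

/* Non-blocking put: returns false when full, so the sender must retry later. */
bool fifo_put(fifo *f, int32_t v)
{
    if (f->count == FIFO_DEPTH)
        return false;
    f->data[(f->head + f->count) % FIFO_DEPTH] = v;
    f->count++;
    return true;
}

/* Non-blocking get: returns false when empty. */
bool fifo_get(fifo *f, int32_t *v)
{
    if (f->count == 0)
        return false;
    *v = f->data[f->head];
    f->head = (f->head + 1u) % FIFO_DEPTH;
    f->count--;
    return true;
}
```

The full/empty return values are exactly the information an infinite SDL queue never has to expose, which is why refining it requires a communication synthesis step.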

Scheduling
Scheduling is, of course, closely tied to the implementation target, so there are two different approaches: one for modules to be implemented in hardware and one for modules to be implemented in software. For hardware, the VHDL produced by scheduling can be synthesized by an RT-level synthesizer. For software, procedure calls are removed, which avoids task swapping and improves both execution time and software safety.

Each SDL array access is treated as a memory access. If more than one element must be read or modified within a state, the state is split in two, a step that is unnecessary in C. In VHDL, arrays can represent a memory. However, if the array has only two or three elements, the best implementation isn't a memory but a set of registers. Since all VHDL synthesis tools understand this convention, there's no need to distinguish the two cases explicitly. It would, however, make sense to offer the user a choice on a per-variable basis.
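To make the state-splitting rule concrete, here is a small C model of a hardware-style state machine that adds two array elements while touching memory at most once per state, as a single-port memory would require. Each call to `add_step` stands for one clock cycle; the state names and the one-access-per-state constraint are illustrative assumptions. In plain C both reads could sit in a single statement, which is why the split is unnecessary for software.

```c
#include <stdint.h>

/* Hypothetical states: at most one memory access is allowed per state. */
typedef enum { S_READ_A, S_READ_B, S_DONE } add_state;

typedef struct {
    add_state state;
    int32_t a;      /* register holding the first operand between states */
    int32_t sum;
} add_fsm;

/* One call = one state transition (one clock cycle). Reading mem[i] and
 * mem[j] "in one state" is split into two states, one access each. */
void add_step(add_fsm *f, const int32_t *mem, unsigned i, unsigned j)
{
    switch (f->state) {
    case S_READ_A:
        f->a = mem[i];              /* first memory access */
        f->state = S_READ_B;
        break;
    case S_READ_B:
        f->sum = f->a + mem[j];     /* second memory access, next cycle */
        f->state = S_DONE;
        break;
    case S_DONE:
        break;                      /* result is held in f->sum */
    }
}
```

The extra state and the holding register `a` are the "additional states and variables" a hardware-oriented scheduler introduces and a software scheduler can omit.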

The original scheduler has since evolved into two tools: one for software and one for hardware. The initial scheduler generated many additional states and variables that software doesn't need, because it had been developed mainly to produce synthesizable VHDL and was fully tuned to hardware constraints. The new software scheduler removes the need for manual C-code optimization and no longer generates variables tied to hardware optimization. Likewise, array accesses are no longer spread over additional states, as memory access requires in VHDL.

Once the architecture is fully defined, the next step is to produce detailed descriptions of both the dedicated hardware and the software that suit design entry tools such as logic synthesizers and processor-specific C compilers. In the software world, the counterpart of a clock-accurate hardware description is a program in which all scheduling has been done: all the elementary tasks are time-ordered and, when several processors are present, each is allocated to a particular one. To reach this point, software developers traditionally write each task in C and rely on a multitasking real-time OS to order the tasks dynamically at execution time. Even then, the allocation of tasks to processors, when several are present, must be decided beforehand.

The alternative applies techniques similar to those used in behavioral synthesis, which requires tools that automatically generate fully scheduled code. In the Arexsys hardware-software codesign tools, for example, scheduling is static and hence fully predictable; the RTOS reduces to a much simpler nanokernel for I/O and memory access. Although it's too early to compare the two approaches and estimate which will serve a given application best, the synthesis approach should offer a significant advantage in overhead for applications that use very specific processors (such as ASIPs) or several cooperating processors.
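Static scheduling means the execution order is fixed before run time. One common way to derive such an order for a single processor is a topological sort of the task dependency graph. The C sketch below uses Kahn's algorithm, always picking the lowest-numbered ready task so the result is reproducible. The article doesn't say which algorithm the Arexsys scheduler uses; the edge-list representation, the `MAX_TASKS` limit, and the task numbering are assumptions.

```c
#include <stdbool.h>

#define MAX_TASKS 16  /* assumed upper bound on tasks */

/* One precedence constraint: task `before` must finish before `after` starts.
 * Task indices are assumed to lie in [0, n). */
typedef struct { int before, after; } dep_edge;

/* Computes a static execution order for n tasks (Kahn's algorithm) and
 * writes it to order[]. Returns the number of tasks placed: n on success,
 * fewer if the dependencies contain a cycle, -1 if n is out of range. */
int static_order(int n, const dep_edge *edges, int nedges, int order[])
{
    int indeg[MAX_TASKS] = {0};
    bool placed[MAX_TASKS] = {false};

    if (n < 0 || n > MAX_TASKS)
        return -1;
    for (int e = 0; e < nedges; e++)
        indeg[edges[e].after]++;

    int count = 0;
    while (count < n) {
        int pick = -1;
        for (int t = 0; t < n; t++)          /* lowest-numbered ready task */
            if (!placed[t] && indeg[t] == 0) {
                pick = t;
                break;
            }
        if (pick < 0)
            break;                           /* no ready task: cycle */
        placed[pick] = true;
        order[count++] = pick;
        for (int e = 0; e < nedges; e++)     /* release its successors */
            if (edges[e].before == pick)
                indeg[edges[e].after]--;
    }
    return count;
}
```

For example, with dependencies 3 before 2, 2 before 0, 2 before 1, and 0 before 1, the function fixes the order 3, 2, 0, 1 once, at generation time, so no run-time kernel has to decide it.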

Partitioning and flattening
The aim of architectural exploration is to transform an abstract specification model with real communication protocols into a physical model that consists of a set of hardware and software processors (see Figure 3). The design space exploration and architectural-generation tools use SDL code and the system configuration information to provide an interactive graphical environment to the designer for the following: functional partitioning and merging (to allocate functions to a specific hardware or software functional unit), communication synthesis (mapping abstract channels to real protocols), and hardware/software partitioning (virtual processor allocation). In the background of the exploration and partitioning process, the tools keep a textual representation of the C and VHDL code, with a cross reference to the initial SDL code. With just a mouse click, we could highlight the line of SDL that corresponded to a particular line of C or VHDL. Communication protocols were taken from an extendable library. We could run cosimulation at any time to validate the behavior and evaluate the model's performance.

The flatten operation let us remove hierarchy at two levels, structural and functional. The structural operation removes the SDL hierarchy, flattening, for example, a structural entity (a block) composed of other structural entities (processes). After this manipulation, the top structural entity is removed and replaced by its contents. The functional operation concerns hierarchical state machines, as introduced in the new version of Objectgeode.
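Structural flattening amounts to a depth-first walk that replaces every block by its contents until only processes remain. A minimal C sketch follows; the `entity` type, its field names, and the example hierarchy are assumptions made for illustration, not Arexsys's internal format.

```c
#include <stddef.h>

/* Hypothetical representation of SDL structure. */
typedef enum { SDL_BLOCK, SDL_PROCESS } entity_kind;

typedef struct entity {
    const char *name;
    entity_kind kind;
    size_t nchildren;               /* 0 for processes */
    const struct entity *children;  /* array of nchildren sub-entities */
} entity;

/* Structural flatten: every block is replaced by its contents, recursively,
 * and the surviving processes are collected in depth-first order.
 * Writes at most cap pointers to out and returns how many were written. */
size_t flatten(const entity *e, const entity **out, size_t cap)
{
    if (e->kind == SDL_PROCESS) {
        if (cap == 0)
            return 0;
        out[0] = e;
        return 1;
    }
    size_t n = 0;
    for (size_t i = 0; i < e->nchildren; i++)
        n += flatten(&e->children[i], out + n, cap - n);
    return n;
}
```

For a hypothetical block `tracker` that contains a block `front` (processes `filter` and `detect`) and a process `track`, `flatten` returns `filter`, `detect`, `track`, with both block levels gone.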

The cluster operation added hierarchical levels to structural units by grouping several structural entities, so we could reorganize the design as we wished. We used clustering to reorganize the hardware by grouping different units in a top component and to prepare for structural merging of the software.

The structural merge function merged several structural entities, eliminating initial entities but leaving intact their state tables, which it combined into the upper structural unit. If the model wasn't hierarchical, we first had to create a cluster to function as the "upper structural unit." The structural split operation, the opposite of the merge function, associated each state table of an entity with a new entity.

Then came partitioning, a simple operation that finalizes the use of architectural primitives. At this stage, the designer assigns each entity to hardware or software, and the tool automatically generates the corresponding code, tagged with a simple hardware or software label.

Generating C
To manage several concurrently running state machines, the traditional solution uses a real-time kernel to schedule their execution. However, task swapping wastes time, so Arexsys's tools use an alternative: the coroutine, which performs software scheduling in a way that avoids communication deadlocks. Each state machine executes one atomic state (without a procedure call) before another state machine runs. The coroutine approach comes close to synchronous execution, which makes the software safer. At this stage, the generated C code doesn't target a specific microcore, so designers can choose the C implementation platform later. The communication protocol, however, largely determines the execution time of the generated C code. To optimize communication time, we developed a specific software protocol based on RDV.
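The coroutine idea can be illustrated with two hand-written state machines sharing a one-slot mailbox. Each step function performs at most one atomic transition and always returns; a machine that can't proceed simply waits for its next turn, so no machine ever blocks the loop that drives them. The mailbox, the producer/consumer pair, and the `run` loop are hypothetical stand-ins for generated code, sketched in C under those assumptions.

```c
#include <stdbool.h>

/* Hypothetical one-slot mailbox shared by two state machines. */
typedef struct { bool full; int value; } mailbox;

typedef struct { int next; int last; mailbox *out; } producer;
typedef struct { int sum; mailbox *in; } consumer;

/* Each step performs at most one atomic transition and always returns:
 * if it can't proceed, it just waits for its next turn instead of blocking. */
void producer_step(producer *p)
{
    if (p->next <= p->last && !p->out->full) {
        p->out->value = p->next++;
        p->out->full = true;
    }
}

void consumer_step(consumer *c)
{
    if (c->in->full) {
        c->sum += c->in->value;
        c->in->full = false;
    }
}

/* Static round-robin loop standing in for the nanokernel: every machine
 * gets one transition per cycle, in a fixed order, with no preemption
 * and no context switch. */
void run(producer *p, consumer *c, unsigned cycles)
{
    for (unsigned i = 0; i < cycles; i++) {
        producer_step(p);
        consumer_step(c);
    }
}
```

With a producer configured for the values 1 through 10, ten cycles transfer one value per cycle and the consumer's sum reaches 55; extra cycles change nothing, because both machines simply find nothing to do.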

The Pros and Cons of SDL
SDL is a well-proven standard used in most large system houses, particularly in telecom. Its formal semantics facilitate verification and test of system-level specifications, a very important feature for hardware system design. It also addresses some critical characteristics of hardware systems. Almost all hardware systems include concurrency, and SDL lets the user specify parallel tasks. Hardware systems are often distributed; that is, they consist of a set of basic computation units that may process data at different speeds and use different clock rates. SDL's asynchronous communication eases the description of such behavior. SDL also allows high-level specification of timing, though it leaves detailed behavior to the HDLs.

The current SDL version suffers from several limitations, however. It offers only a few constructs for computation and lacks conveniences such as nested loops for array manipulation and specific arithmetic operators like shift. Designers can use C to extend SDL, but that option restricts the capabilities of test and formal verification tools, as does the use of shared variables.

In some cases, the use of a single communication model (asynchronous queues) may lead to inefficient specification. The SDL queue model is very sophisticated, and hence very expensive when implemented in hardware using naive approaches. A hardware generation tool must thus contain a sophisticated communication synthesis step to optimize the mapping of the SDL queues. The latest SDL versions handle this problem by introducing the Remote Procedure Call (RPC). A smart use of RPC may work around the limitations of SDL queues.

Despite these restrictions, SDL remains the best existing specification language for hardware modeling at the system level, because in practice all other languages suffer more restrictions.

Our first RDV, scheduled with the initial hardware-dedicated scheduler, took seven simple states to write an integer and three simple states to read it, for an execution time of x. The optimized RDV, scheduled with the new software-dedicated scheduler, took just two simple states to write an integer and two to read it, for an execution time of x/2. We didn't optimize the protocol itself, but rather its internal representation in the Arexsys intermediate format. The new software scheduler cut the execution time in half. The resulting C code is very stable and deterministic, thanks to the absence of interrupts; if interrupts were allowed, the code would no longer be fully deterministic.
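The article gives state counts but not the optimized RDV itself. One way a rendezvous can fit in two writer states and two reader states is a two-phase (toggle) handshake, sketched in C below: the writer drives the data and toggles `req`, then waits until the reader has toggled `ack` to match. The signal names, the toggle encoding, and the state split are assumptions, not Arexsys's intermediate representation.

```c
#include <stdbool.h>

/* Hypothetical shared signals between writer and reader. */
typedef struct {
    bool req;   /* toggled by the writer when new data is valid */
    bool ack;   /* toggled by the reader once the data has been taken */
    int data;
} rdv_channel;

typedef enum { W_SEND, W_WAIT } wr_state;
typedef enum { R_WAIT, R_ACK } rd_state;

typedef struct { wr_state st; const int *src; int n; int sent; } rdv_writer;
typedef struct { rd_state st; int sum; int received; } rdv_reader;

void writer_step(rdv_writer *w, rdv_channel *ch)
{
    switch (w->st) {
    case W_SEND:                        /* state 1: drive data, raise request */
        if (w->sent < w->n) {
            ch->data = w->src[w->sent];
            ch->req = !ch->req;
            w->st = W_WAIT;
        }
        break;
    case W_WAIT:                        /* state 2: wait for the reader's ack */
        if (ch->ack == ch->req) {
            w->sent++;
            w->st = W_SEND;
        }
        break;
    }
}

void reader_step(rdv_reader *r, rdv_channel *ch)
{
    switch (r->st) {
    case R_WAIT:                        /* state 1: take data on a new request */
        if (ch->req != ch->ack) {
            r->sum += ch->data;
            r->st = R_ACK;
        }
        break;
    case R_ACK:                         /* state 2: acknowledge the transfer */
        ch->ack = ch->req;
        r->received++;
        r->st = R_WAIT;
        break;
    }
}
```

Because the writer can't start a new transfer until `ack` matches `req`, the reader never misses or double-reads a value, which is the synchronous behavior a rendezvous requires.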

Generating VHDL
We generated a complete VHDL version of the tracker and simulated it with Cadence Design Systems, Inc.'s Leapfrog. We mapped the first version of the model with a FIFO communication protocol. In the generated code, each state machine performs state transitions on the rising edge of the clock, merging the communication into the global execution thread. We also converted SDL's implicit parallelism into real process parallelism, as VHDL allows.
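Clocked VHDL processes update all their signals together at the clock edge, so a C model of the same state machines must compute every next value from the values held before the edge and only then commit them. The minimal sketch below shows this for a hypothetical two-stage pixel pipeline; the register names are assumptions. Assigning `a` first and then copying it into `b` in place would let a new input pass through both stages in one cycle, which is exactly the error the two-phase update avoids.

```c
/* Hypothetical two-stage pipeline registers. */
typedef struct { int a; int b; } pipe_regs;

/* Models one rising clock edge: every next value is computed from the
 * register values held before the edge, then all are returned together,
 * like signal assignments in a clocked VHDL process. */
pipe_regs clock_edge(pipe_regs cur, int in)
{
    pipe_regs next;
    next.a = in;     /* a <= in */
    next.b = cur.a;  /* b <= a, using a's value from before the edge */
    return next;
}
```

Passing the current state by value and returning the next state keeps the "read old, write new" rule explicit in the function signature.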

We can't overestimate the importance of the communication architecture. To limit traffic, we merged some blocks at the SDL level, an operation we had to use sparingly because it breaks task parallelism and can increase response time as well as data exchange. So we adopted a mixed solution, choosing a protocol for each process according to its features. A future solution could use shared variables with specific memory significance.

We compiled the generated code with Synopsys, Inc.'s RTL Design Compiler. The design passed, with the data types as the only modification needed. In the future, IEEE-standardized data types will replace user-defined data types (resolved_integer, for instance). VHDL generation with the tools proved very useful for deriving implementation models from extended state machines such as SDL's. In particular, we could identify response-time problems and bottlenecks, then solve them through a fast redesign loop.

The codesign tools aim to fill the gap between high-level specifications, expressed with resources from the software engineering world (CASE tools), and traditional EDA tools that take their input at the RT level. The strategy relies on a standardized specification language (SDL) and on interactive design space exploration, in which the designer decides which type of solution to evaluate and the tools automatically generate the corresponding code for the hardware, the software, and the interfaces between them. It also relies on a user-expandable communication protocol library, a flexible mapping technique able to accommodate all types of microcores, and results delivered as HDL and C code that commercial logic synthesis tools and processor-specific C compilers can use directly.


Clarisse Adida is an electronics research engineer at Aerospatiale Missiles in Bourges, France. She's in charge of system modeling and VHDL code generation efficiency and is currently working on embedded software and system design methodologies.

Michel Boubal, also at Aerospatiale, is a senior research engineer in the electronics and software departments.

Xavier Granger is a fundamental computer science engineer at Aerospatiale. He has five years of experience in integration systems and ten years of experience in embedded software (design, test, and integration).

Philippe Lamaty, a Ph.D. candidate at the University of Cergy-Pontoise in Paris, works on image processing at Aerospatiale.

Jean-Pierre Moreau is a technical advisor for Arexsys in Meylan, France. He previously worked at the Thomson Group and STMicroelectronics in design, design management, microprocessor-based system architecture, and CAD development management.

To voice an opinion on this or any Integrated System Design article, please email your message to jeff@isdmag.com.

All materials on this site Copyright © 2009 TechInsights, a Division of United Business Media LLC All rights reserved.