United Business Media EE Times



The Art of Architectural Exploration

System-level design is requiring enhanced modeling techniques to lend needed efficiencies to the development cycle.

By Dale E. Hocevar


Ask your fellow engineers how they analyzed the architecture or system-level performance of their latest design and what do you get? Too often you're shown notes with cryptic estimations, tables of computation requirements, and, at best, spreadsheets. Inquire about system simulation and the response is, "We don't know how to do that," or, "That takes too much time." Why? The approaches and tools for analyzing an architecture, or simulating at the system level, aren't well known or frequently used. However, several methods and EDA tools do exist. Some are directed toward specific aspects of design, while others are more general. Often simple models can be developed quickly.

In the architectural design phase, the system architect experiments with different architectures by modeling and interconnecting various functional elements, and then analyzing their performance. The goal is to develop an architecture that will meet performance requirements without exceeding cost requirements. In the digital systems context, an architecture comprises hardware (a collection of application-specific analog and digital hardware, processors, processor bus, memory, and special-purpose processors) and hardware-specific software (a kernel or real-time operating system, and device drivers). In short, this is the embedded system platform on which the system's functionality will be implemented as applications software and perhaps additional dedicated hardware.

Compare and contrast

In contrast, the functional design process consists of defining the system's response to specific stimuli. This functionality can be captured using any number of representations, such as C/C++ models, math-oriented tools, state machines, data-flow diagrams, and HDL models.

But what happens when no higher-level modeling and simulation is performed? Invariably time is spent fixing problems that could have been seen and corrected with early system-level simulation. At times, when using detailed lower-level simulation, it may take many hours—or even days—of execution just to see the design problem and equivalent amounts of simulation time to retest the fix. Alternatively, the problems may not be seen until a full fabrication turn of silicon is available. At worst, the architecture or system design may have serious or even fatal flaws requiring a major redesign.

Additionally, ignoring architectural modeling results in a legacy of flawed architecture for a product family. The system problems may not have been serious enough in the first product to be noticed or cause concern, but later product versions are stuck with these initial architectural flaws for years down the road. This happens whenever an architecture can't be altered significantly due to constraints imposed by the customer's end-use of the product.

Those making decisions, however, may not accept an engineer's architectural concept. Therefore, the project may not get off the ground. Back-of-the-envelope estimations and hardware/software diagrams don't seem to convince others to move forward with these ideas. An architecture model that produces simulated results and can demonstrate the system in various manners provides a powerful tool for influencing these decisions.

Tools to the rescue

For electronic products, system designers already have an effective tool: VHDL with simulation. Some already build valuable behavioral models of their systems, often at a full functional level where the computations are performed or the behavior of the system is actually modeled. VHDL can also be used by engineers focusing on more abstract, higher-level forms of modeling.

One approach is based on transactions or tokens; little or no real computation is done in this model. Real data isn't passed through the system, but tokens (short messages) are used between modules. The tokens can reflect almost anything: data items, data blocks, synchronization signals, status information, instructions, or commands. The modules can be hardware or software. To be useful for designers, however, these facilities must be available as packaged utilities within the VHDL. (The University of Virginia has done some work in this area.)
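To make the token idea concrete, here is a minimal Python sketch of token passing through a chain of modules. It is only an illustration under stated assumptions: the module names, latencies, and the `simulate` helper are hypothetical and are not taken from any VHDL package or commercial tool. Each module is characterized solely by a fixed delay per token, and no real data is computed.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Event:
    time: int                      # arrival time of the token at the module
    seq: int                       # tie-breaker so ordering is deterministic
    module: str = field(compare=False)
    token: dict = field(compare=False)

def simulate(pipeline, latencies, tokens):
    """Push tokens through a linear chain of modules.

    pipeline  -- ordered list of module names (hypothetical)
    latencies -- module name -> processing time per token
    tokens    -- list of (arrival_time, token) pairs
    Returns a list of (token id, completion time) pairs.
    """
    events, seq = [], 0
    for arrival, tok in tokens:
        heapq.heappush(events, Event(arrival, seq, pipeline[0], tok))
        seq += 1
    free_at = {m: 0 for m in pipeline}   # time at which each module is idle again
    done = []
    while events:
        ev = heapq.heappop(events)
        start = max(ev.time, free_at[ev.module])   # wait while the module is busy
        finish = start + latencies[ev.module]
        free_at[ev.module] = finish
        idx = pipeline.index(ev.module)
        if idx + 1 < len(pipeline):
            # Forward the token (a short message, not real data) downstream.
            heapq.heappush(events, Event(finish, seq, pipeline[idx + 1], ev.token))
            seq += 1
        else:
            done.append((ev.token["id"], finish))
    return done

# Hypothetical three-stage chain with made-up latencies.
result = simulate(
    pipeline=["dma", "fir", "dsp"],
    latencies={"dma": 2, "fir": 5, "dsp": 3},
    tokens=[(0, {"id": 0, "kind": "data_block"}),
            (1, {"id": 1, "kind": "data_block"})],
)
```

Because a module only adds delay, a run like this reports when each token leaves the system and where tokens queue up, which is the kind of question an architecture model is meant to answer.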

Moving a step further, some tools incorporate token-based simulation and also provide other facilities for building, simulating, and analyzing architecture models. One such tool is eArchitect from Innoveda, Inc. (Marlboro, MA). With this tool, system architects don't need to write VHDL to produce a model; eArchitect provides the ability to quickly construct hardware models in a block-diagram format. The block diagram is constructed using elements from the eArchitect library. The functional elements are modeled only with respect to their throughput, response time, or latency—not their actual behavior. This abstraction allows for a huge improvement in simulation speeds, to such a degree that it's no longer a bottleneck in this phase of system design. Thus, the system architect can concentrate on architectural concepts, such as data flow, network topologies, and processor selection.

The tool also provides the ability to capture software at the architectural level. Software is organized as block diagrams representing tasks that may execute on any of the processors in a system design. Software details are captured as a textual description or flow chart, enabling the designer to capture basic software-control structures, data flow, and cycle-count budgets. Since the hardware and software are captured at a very high level, time-consuming and implementation-specific details don't complicate the issues at the architectural level.

QAM modem modeling

Originally, the purpose of building an architectural model was to demonstrate the viability of an overall architectural concept. The concept consists of using flexible co-processors (FCPs) with a DSP. The target applications are those that can be partitioned into computational pieces, particularly where only limited data transfer is required relative to the computational load. Ideally, these are cases in which the data transfers can be block-data transfers. Digital communication applications often fall into this category.

Figure 1 - The QAM modem
The operational stages of the QAM modem used for this case study are shown. The sample rate of the data starts at 20MHz and down-sampling occurs at two places in the data-flow.

As a means to analyze and demonstrate this approach from the system point of view, we chose to do a system-architecture simulation. A QAM modem example, simpler than our target application, became the focus of our modeling tasks. Previously, we had built these system models directly in software (C/C++) with only limited support utilities. This strategy presented limitations. The developer became the model's code guru, which made it difficult to pass the model on to others. There were no means (or only makeshift ones) for viewing results such as component usage over time and time-related interactions. There were no facilities for version management. We decided to use a supported tool instead and chose eArchitect. Additionally, this effort was intended to help us understand architecture-level CAD tools in this form of system analysis.

Timing recovery, the first operation, includes an FIR part and a core algorithmic part for setting the filter coefficients (see Figure 1). Demodulation follows, consisting of quadrature multiplication (with cos() and sin()) and pulse-shape FIR filtering. Adaptive equalization FIR filtering is next, coupled with the carrier recovery and the symbol-to-byte mapping block. The last three stages are de-interleaving, Reed-Solomon decoding, and derandomizing.

The QAM modem implementation operates, for the most part, using block-data transfers between the DSP memory and the FCP accelerators. Usually these transfers are facilitated by the DMA so that the DSP is free to continue computation. Various synchronization signal lines exist from the co-processors to the DMA and DSP, and also between the DSP and DMA.

Each FCP contains a short command queue as well as buffer memory for data. The commands specify actions, such as synchronization, with respect to incoming or outgoing block-data transfers, and various types of computation depending on the particular co-processor. A typical command sequence might be one with three commands: receive data sync, compute FIR, send data sync. The co-processor would then expect an incoming data transfer and hold, if necessary, until completion. Next, it would perform the desired computation. Finally, it would send the completed synchronization signal. Often the data transfers can be overlapped with computations operating on a different data block.
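The three-command sequence described above can be sketched in Python as follows. This is a hedged illustration only: the command names (`recv_sync`, `compute`, `send_sync`), the `run_fcp` helper, and the timings are assumptions made for the example, not the actual FCP command set.

```python
from collections import deque

def run_fcp(commands, data_arrivals):
    """Walk a co-processor command queue and return (sync log, end time).

    commands      -- list of (opcode, argument) pairs, executed in order
    data_arrivals -- completion times of incoming block transfers, in order
    """
    queue = deque(commands)
    arrivals = deque(data_arrivals)
    now = 0
    log = []
    while queue:
        op, arg = queue.popleft()
        if op == "recv_sync":
            # Hold, if necessary, until the incoming transfer has completed.
            now = max(now, arrivals.popleft())
        elif op == "compute":
            # Only the duration of the computation is modeled.
            now += arg
        elif op == "send_sync":
            # Signal completion to the DMA or DSP.
            log.append(("sync", now))
        else:
            raise ValueError(f"unknown command: {op}")
    return log, now

# Hypothetical sequence: receive data sync, compute FIR, send data sync.
log, end_time = run_fcp(
    commands=[("recv_sync", None), ("compute", 40), ("send_sync", None)],
    data_arrivals=[25],
)
```

In this made-up run the co-processor holds until the transfer completes at time 25, computes for 40 time units, and then raises its completion signal.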

Architecture exploration process

The architecture exploration process boils down to mapping the system behavior onto a candidate architecture, evaluating it, and using the results of the evaluation to change the architecture if needed. Some behavior will be mapped to dedicated hardware and some to processors. Communication is mapped to hardware buses or software constructs such as mailboxes.

Once the behavior has been mapped into the architecture, the performance can then be analyzed. To evaluate performance, the user must look at delays as the tokens move through the system. A behavior mapped to software has two different types of delays: its own, as it takes up clock cycles on the processor; and the delay caused by the overhead of the operating system or kernel. Similarly, hardware entities have associated delays. A bus system has delays associated with data flows and the complexities of handling various data sources. A simulation run on a system model provides a feel for the performance of a particular architecture, including how the system meets resource contention, throughput, utilization, reaction time, and latency requirements. At this stage, changes can be made to the architecture, and it can then be re-evaluated and compared with other models.
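The quantities mentioned above reduce to simple arithmetic once a simulation has produced timestamps. The following Python sketch shows one way to express them; the function names and all numbers are hypothetical and serve only as an illustration.

```python
def task_delay(cycles, clock_hz, kernel_overhead_s):
    """Delay of a software behavior: its own cycles plus OS/kernel overhead."""
    return cycles / clock_hz + kernel_overhead_s

def utilization(busy_intervals, window):
    """Fraction of the observation window during which a resource was active."""
    busy = sum(end - start for start, end in busy_intervals)
    return busy / window

def latency(entered_at, left_at):
    """Time a token spent between entering and leaving the system."""
    return left_at - entered_at

# Hypothetical numbers, for illustration only.
delay = task_delay(cycles=2000, clock_hz=200e6, kernel_overhead_s=1e-6)
load = utilization([(0, 30), (50, 90)], window=100)
token_latency = latency(entered_at=12, left_at=47)
```

Comparing such figures across candidate architectures is what turns a simulation run into an architectural decision.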

Modeling hardware/software

Within the eArchitect hardware framework, each primary hardware piece was modeled with generic processors from the model library. The DSP, the DMA, and the two FCPs were modeled this way. This type of modeling allowed simple behavioral descriptions to be written and used for each processor. The DMA, though part of the TI TMS320C6201 DSP, performs its operations in parallel with the DSP core and was, therefore, modeled with a generic processor. Alternatively, we could have modeled the DMA with a HW state machine modeler. The bus structure connecting the DSP and DMA to the two co-processors was modeled with bus elements from the eArchitect hardware library. Various parameters are available through which these bus models can be adjusted to match the target hardware.

Figure 2 - Target architecture
The eArchitect model of the target architecture for implementing operations consists of a TI TMS320C6201 DSP with multichannel DMA, an FCP for general FIR filtering (multimac), and a Reed-Solomon decoder FCP. The mapping of operations to this architecture is shown in Figure 1.

In eArchitect, the generic processors use software procedures specified by the user. These are organized as task objects, and multiple tasks can run on a processor, although only one is actively executing at any one point in time (see Figure 3). Each FCP has a single task to describe its system behavior. The DMA has three tasks, one for each DMA channel used. The DSP also has three tasks: one for system control, one for the algorithm computation, and one for auxiliary operations or idle time.

Several of the message/data queues emanate from the main DSP task, Task_C60, and carry commands or setup information to the FCPs or to the DMA tasks. Several of these also emanate from the Task_C60 or the DMA tasks and carry data to the FCPs. All of the message connections ending on the co-processors actually operate on the hardware bus. Each FCP also has two message connections back to one or more DMA tasks. One is for data; it operates on the bus. The other is for synchronization; it operates on a dedicated signal line. In addition, there are a few connections from the DMA and FCP tasks feeding to Task_C60. These represent interrupts to the DSP.

Note that we modeled only the connections between the tasks that were needed within the QAM modem model. The hardware allows for many more connection paths.

The DMA tasks all use essentially the same software description. These simple tasks process two types of commands from the DSP: one for sending data to a designated FCP, the other for reading data. When a block-data transfer is modeled, the task sends data through the associated message queue. The tool performs the arbitration through the bus model so that only one such transfer occurs on the bus at a time. Synchronization mechanisms are included in the action of these commands. For instance, a DMA task can't receive data from an FCP until that FCP has triggered that DMA channel.
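The effect of bus arbitration on block transfers can be illustrated with a small Python sketch. The first-come, first-served policy used here is an assumption made for the example; the article does not state which arbitration scheme the tool's bus model applies, and the channel names and timings are hypothetical.

```python
def arbitrate(requests):
    """Serialize block transfers on a shared bus, first come, first served.

    requests -- list of (ready_time, duration, name) tuples
    Returns a dict mapping each name to its (start, end) time on the bus.
    """
    bus_free = 0
    schedule = {}
    for ready, duration, name in sorted(requests):
        start = max(ready, bus_free)   # wait until the bus is released
        end = start + duration
        schedule[name] = (start, end)
        bus_free = end
    return schedule

# Hypothetical transfers requested by three DMA channels.
schedule = arbitrate([(0, 10, "ch0"), (4, 5, "ch1"), (30, 5, "ch2")])
```

In this made-up case the second channel is ready at time 4 but must wait until the first transfer releases the bus at time 10, while the third finds the bus idle and starts immediately.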

Similarly, the tasks for the Reed-Solomon decoder and the multimac are simple. Each processes the commands from the DSP for data-block transfers to and from its buffer memory, usually via the DMA. In addition, computational commands cause these tasks to remain active (but doing nothing) for the specified time, thus modeling the computation of the co-processor.

Figure 3 - Connections
The structure of tasks used for this QAM modem model and the message/data queues connecting the tasks together.

The algorithm task for the DSP is likewise simple. When told to do so by Task_C60, the algorithm task becomes active (doing nothing) for the specified length of time, which models some piece of computational load. The system control task, Task_C60, contains the bulk of the behavioral description for operation of the QAM model. This task must proceed through a sequence of steps that direct the DMA, the two FCPs, and the other DSP tasks to perform various necessary operations.

A general approach for implementing this procedure in Task_C60 provided a great deal of flexibility for developing and updating this control program. In addition, the mechanisms developed can be reused for later system models. We have already gained benefit from this aspect with the adaptation of this model for analysis of an entirely different application. This general approach was realized by writing Task_C60 such that it executes a sequence of instructions contained in a data array. These instructions, which represent the control program, can be easily edited and reloaded into the data array.

Thus, Task_C60 simply reads an instruction, performs the necessary actions, and then repeats this process. The instruction set has capabilities for sending all the defined commands to the DMA channels and to either FCP. Parameters can be set which are sent with these commands to reflect items such as length of data transfer and computation time required for the command. In addition, there are synchronization instructions for waiting on a particular interrupt, or checking a message queue for a completion message. A small number of variables are available for storing flags and numerical values. There are instructions for setting these variables and for performing simple arithmetic operations upon them, as well as for doing conditional branching.
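The read-execute-repeat structure described for Task_C60 is essentially a small interpreter. The Python sketch below illustrates the idea under loud assumptions: the opcodes (`SET`, `ADD`, `SEND`, `WAIT`, `BLT`), the program contents, and the `run_control_program` helper are hypothetical, and the real instruction set used in the model is not given in the article.

```python
def run_control_program(program, max_steps=1000):
    """Execute a control program stored in a list (the 'data array').

    Returns (trace, variables). The trace records the commands sent and the
    synchronization points reached; a real model would block at each WAIT.
    """
    variables = {}
    trace = []
    pc = 0
    steps = 0
    while pc < len(program) and steps < max_steps:
        op, *args = program[pc]
        pc += 1
        steps += 1
        if op == "SET":                 # SET name value
            variables[args[0]] = args[1]
        elif op == "ADD":               # ADD name increment
            variables[args[0]] += args[1]
        elif op == "SEND":              # SEND target command params
            target, command, params = args
            trace.append(("send", target, command, dict(params)))
        elif op == "WAIT":              # WAIT interrupt_or_message
            trace.append(("wait", args[0]))
        elif op == "BLT":               # branch to target if name < limit
            name, limit, target = args
            if variables[name] < limit:
                pc = target
        else:
            raise ValueError(f"unknown instruction: {op}")
    return trace, variables

# Hypothetical control program: process three data blocks.
program = [
    ("SET", "block", 0),                                            # 0
    ("SEND", "dma0", "write", {"to": "multimac", "length": 256}),   # 1
    ("SEND", "multimac", "compute_fir", {"time": 40}),              # 2
    ("WAIT", "multimac_done"),                                      # 3
    ("ADD", "block", 1),                                            # 4
    ("BLT", "block", 3, 1),                                         # 5
]
trace, variables = run_control_program(program)
```

Keeping the control program as data means the sequencing can be edited and reloaded without touching the interpreter, which is the flexibility the text attributes to this approach.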

Figure 4 - Tracking critical paths
An activity chart from a system simulation for the main activities in this model depicts when certain elements in the model are active (via the horizontal bars) and the bus activity. The critical path in this sequence of operations (depicted by the arrows between the activity bars) involves only the DSP and multimac. As indicated, compressing the path to increase throughput would prove difficult.

Simulation and analysis

Developing this control and sequencing program, which causes the modem to operate correctly and efficiently, is no trivial task. Before any system analysis can be performed via model simulation, a first version of this program must be developed; this is, in effect, a design step at the system level within the architectural-analysis process. The first steps of this design involved deciding on an initial sequence of operations, including the necessary data transfer operations and synchronization points. Eventually, through trial and error, the correct functional operation was achieved. At that point, the analysis results were used to further improve this sequencing and scheduling program.

At the end of this initial development phase, a fairly good system design was achieved (see Figure 4). Three main computational elements dominate the activity: Task_Alg_C60, Multi_MAC, and Reed_Solomon. Task_C60 requires minimal time as it performs very little computation; it simply directs all the other activity. The total bus utilization is only 11.2 percent, as shown by the bus monitor element. Note that in this model the DMA channels only show activity at the trigger points for the data transfers, even though they would be active throughout the transfer in the actual hardware.

The DSP algorithm task realized a utilization of 87.8 percent; the multimac realized 88.7 percent. Required less frequently relative to its throughput, the Reed-Solomon only operates at 51.8 percent utilization. In terms of our architectural concept, these results demonstrated the validity of our algorithm partitioning and hardware mapping.

Loose change

In the modeling thus far, the demodulation FIR and equalization FIR steps are treated as one combined computation. This step is enabled after all necessary data has arrived in the multimac. This consists of two separate blocks of data: D1, the data for demodulation FIR that then flows to the EQ FIR; and D2, the data required to update coefficients in the adaptive equalization filter. This computation starts just after the D2 transfer finishes. The D1 transfer occurred a short time after the start of the previous S2B-CR operation in the DSP.

Thus, by changing the sequencing and synchronization points controlling the demod FIR and EQ FIR, improvement can be obtained. Specifically, after TR finishes in the multimac, the demod FIR portion is allowed to start. Immediately after it finishes, EQ FIR can start, provided the D2 transfer has occurred, which it should. At the end of these operations, the data is sent to the DSP as the D transfer. The S2B-CR operation can now start immediately after this transfer finishes. The result of these changes significantly closes the gaps in activity for the two tasks at hand. Final simulation shows 94.0 percent utilization for the DSP algorithm task and 96.8 percent for the multimac.
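The reasoning behind this change can be checked with a few lines of Python. The sketch below compares the two sequencing rules using made-up timings; the function names and every number are hypothetical and are not taken from the simulation results reported here.

```python
def combined_finish(tr_done, d1_arrival, d2_arrival, t_demod, t_eq):
    """Original rule: demod FIR and EQ FIR run as one step that starts only
    after timing recovery is done and both D1 and D2 have arrived."""
    start = max(tr_done, d1_arrival, d2_arrival)
    return start + t_demod + t_eq

def split_finish(tr_done, d1_arrival, d2_arrival, t_demod, t_eq):
    """Revised rule: demod FIR starts once TR is done and D1 is present;
    EQ FIR follows as soon as demod is done and D2 has arrived."""
    demod_done = max(tr_done, d1_arrival) + t_demod
    eq_start = max(demod_done, d2_arrival)
    return eq_start + t_eq

# Hypothetical timings in arbitrary units.
timing = dict(tr_done=20, d1_arrival=10, d2_arrival=60, t_demod=30, t_eq=25)
before = combined_finish(**timing)
after = split_finish(**timing)
saved = before - after
```

With these assumed timings the demod FIR runs while the D2 transfer is still pending, so the multimac sits idle for less time and the result reaches the DSP earlier.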

Clearly, architecture exploration is no longer a luxury—it's a necessity. Design cycles are shrinking dramatically, and there's more and more software content in today's electronic products. To keep up with these challenges, the architecture requires consideration early in the design cycle, which aids in selecting the optimal configuration. In addition, architecture exploration helps to meet performance requirements for a design and helps to minimize nasty surprises during integration—or in the marketplace. Most importantly, for high-volume designs, engineers can get the architecture just right without needing to pad the design with unnecessary and costly margins.


Dale E. Hocevar is a senior member of the technical staff at the DSPS Research & Development Center, Texas Instruments (Dallas, TX).

To voice an opinion on this or any other article in Integrated System Design, please e-mail your comments to mikem@isdmag.com.


Copyright © 2000 Integrated System Design Magazine

All materials on this site Copyright © 2009 TechInsights, a Division of United Business Media LLC All rights reserved.