Early system modeling
If the tasks are represented as algorithms in a programming language such as C, early system modeling can verify the functionality and measure the data transfers between tasks. At this stage, tasks have not been allocated to processors, and communication among tasks is still expressed abstractly, through either a message-passing or a shared-memory programming paradigm.
An early abstract system simulation model serves as the basis for sizing the computational demands of each task. This information is not exact, but can yield important insights into both computational and communications hot spots.
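As a rough sketch of what such a model can look like (the channel structure, task names, and data sizes below are illustrative assumptions, not taken from any particular design), two tasks can be connected through an abstract channel that also counts the traffic crossing it:

#include <stdio.h>
#include <string.h>

/* Abstract channel: a simple byte buffer that also counts traffic,
 * standing in for whatever bus or interconnect is chosen later. */
typedef struct {
    unsigned char buf[1024];
    size_t        len;
    unsigned long bytes_sent;   /* instrumentation for hot-spot analysis */
    unsigned long transfers;
} channel_t;

static void ch_send(channel_t *ch, const void *data, size_t n)
{
    memcpy(ch->buf, data, n);
    ch->len = n;
    ch->bytes_sent += n;
    ch->transfers++;
}

static size_t ch_recv(channel_t *ch, void *data, size_t n)
{
    size_t m = (n < ch->len) ? n : ch->len;
    memcpy(data, ch->buf, m);
    ch->len = 0;
    return m;
}

/* Producer task: e.g. a framing stage emitting fixed-size blocks. */
static void task_producer(channel_t *out)
{
    short block[64] = {0};
    ch_send(out, block, sizeof block);
}

/* Consumer task: e.g. a filter stage reading those blocks. */
static void task_consumer(channel_t *in)
{
    short block[64];
    ch_recv(in, block, sizeof block);
}

int main(void)
{
    channel_t ch = {0};
    for (int frame = 0; frame < 100; frame++) {
        task_producer(&ch);
        task_consumer(&ch);
    }
    printf("channel traffic: %lu bytes in %lu transfers\n",
           ch.bytes_sent, ch.transfers);
    return 0;
}

Because the channel is only an abstraction, the same task code can later be re-targeted to shared memory, an on-chip bus, or a network-on-chip without rewriting the algorithms.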
Using system simulation throughout the design process has two advantages: 1) an early start to simulation provides insight into bottlenecks, and 2) the model’s role as a performance predictor gradually evolves into a role as a verification test bench. To test a subsystem, a designer replaces the subsystem’s high-level model with a lower-level implementation model.
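One way to make that substitution painless, sketched here with assumed names rather than any specific tool's API, is to hide each subsystem behind a small function-table interface so that a fast behavioral model and a slower, more detailed implementation model are interchangeable from the test bench's point of view:

#include <stdio.h>

/* Common interface both models implement; the test bench sees only this. */
typedef struct {
    const char *name;
    int (*process)(const short *in, short *out, int n);   /* returns cycle estimate */
} subsys_model_t;

/* High-level behavioral model: pure C, fast, approximate timing. */
static int behavioral_process(const short *in, short *out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = in[i] >> 1;           /* placeholder algorithm */
    return n;                          /* notional cycle count */
}

/* Lower-level implementation model would wrap an instruction-set simulator
 * or RTL co-simulation; here it is only a stub to show the swap. */
static int implementation_process(const short *in, short *out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = in[i] >> 1;
    return 4 * n;                      /* more detailed (slower) timing */
}

int main(void)
{
    subsys_model_t models[2] = {
        { "behavioral",     behavioral_process },
        { "implementation", implementation_process },
    };
    short in[8] = {2, 4, 6, 8, 10, 12, 14, 16}, out[8];

    for (int m = 0; m < 2; m++) {
        int cycles = models[m].process(in, out, 8);
        printf("%s model: %d cycles\n", models[m].name, cycles);
    }
    return 0;
}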
Assigning tasks to processors
The mapping of tasks to a SoC implementation raises some complex issues. Deciding whether a specific task is implemented in dedicated hardware logic or in software running on a processor is a critical decision. There are two guidelines for mapping tasks to processors:
1. The processor must have sufficient computational capacity to handle the task.
2. Tasks with similar requirements should be allocated to the same processor as long as the processor has the computational capacity to accommodate all of the tasks.
The process of determining the right number of processors cannot be separated from determining the right processor type and configuration. Traditionally, a real-time computation task is characterized by a "MIPS requirement": how many millions of execution cycles per second it needs. Figure 4 shows a set of tasks with initial rough estimates of the MIPS requirements for a 3G wireless SoC platform.
Figure 4 - Baseline task performance requirements.
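The two guidelines above reduce to a simple budgeting exercise. The sketch below uses invented task names, MIPS estimates, and a single assumed processor capacity (not the Figure 4 numbers) to show how tasks of the same class are grouped onto one processor until its capacity is exhausted:

#include <stdio.h>

typedef struct {
    const char *name;
    int mips;        /* millions of execution cycles per second required */
    int kind;        /* 0 = control-oriented, 1 = numeric/DSP-oriented */
} task_t;

int main(void)
{
    /* Rough, illustrative per-task estimates (placeholders only). */
    task_t tasks[] = {
        { "protocol control", 40, 0 },
        { "channel decode",  220, 1 },
        { "audio codec",     110, 1 },
        { "user interface",   25, 0 },
    };
    const int capacity_mips = 300;   /* assumed capacity of one processor */
    int n = sizeof tasks / sizeof tasks[0];

    /* Guideline 2: group tasks of the same class onto one processor,
     * as long as guideline 1 (sufficient capacity) still holds. */
    for (int cls = 0; cls <= 1; cls++) {
        int load = 0;
        printf("processor for class %d:\n", cls);
        for (int i = 0; i < n; i++) {
            if (tasks[i].kind != cls)
                continue;
            if (load + tasks[i].mips <= capacity_mips) {
                load += tasks[i].mips;
                printf("  %-18s %4d MIPS\n", tasks[i].name, tasks[i].mips);
            } else {
                printf("  %-18s does not fit; needs another processor\n",
                       tasks[i].name);
            }
        }
        printf("  total load: %d / %d MIPS\n", load, capacity_mips);
    }
    return 0;
}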
A control task needs substantially more cycles running on a simple DSP than on a RISC processor; a numerical task usually needs more cycles on a RISC CPU than on a DSP. However, most designs contain no more than two different types of processors, because mixing RISC processors and DSPs requires working with multiple sets of software development tools.
Configurable processors can be modified to provide 10 to 50 times higher performance than general-purpose RISC processors, which often allows them to take on tasks that previously were implemented in hardware using Verilog or VHDL. Figure 5 shows the acceleration possible with a configurable processor, which reduces the MIPS requirements. Staying with a single configurable-processor family lets all the processors share the same software development tools.
Figure 5 - Task requirements after processor configuration.
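To make the effect of configuration concrete, the fragment below contrasts a generic C inner loop with the same loop rewritten around a hypothetical designer-defined 4-way multiply-accumulate instruction exposed as an intrinsic; the MAC4 name and its behavior are assumptions for illustration, not an actual instruction from any processor family:

#include <stdio.h>

/* Generic C: one multiply-accumulate per iteration on a standard RISC core. */
static long dot_generic(const short *a, const short *b, int n)
{
    long acc = 0;
    for (int i = 0; i < n; i++)
        acc += (long)a[i] * b[i];
    return acc;
}

/* Hypothetical designer-defined instruction: a 4-way MAC added to a
 * configurable core and exposed as an intrinsic.  Modeled here in plain C
 * so the example compiles anywhere; on the real core one such instruction
 * would retire per iteration. */
static long MAC4(const short *a, const short *b)
{
    return (long)a[0] * b[0] + (long)a[1] * b[1]
         + (long)a[2] * b[2] + (long)a[3] * b[3];
}

static long dot_configured(const short *a, const short *b, int n)
{
    long acc = 0;
    for (int i = 0; i < n; i += 4)      /* 4x fewer loop iterations */
        acc += MAC4(&a[i], &b[i]);
    return acc;
}

int main(void)
{
    short a[16], b[16];
    for (int i = 0; i < 16; i++) { a[i] = (short)i; b[i] = (short)(i + 1); }
    printf("generic:    %ld\n", dot_generic(a, b, 16));
    printf("configured: %ld\n", dot_configured(a, b, 16));
    return 0;
}

Wider data paths, deeper operation fusion, and added register files compound this effect, which is where gains toward the 10x-50x range cited above come from.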
Application acceleration: a common problem
Many standard, general-purpose 32-bit RISC processors aren’t fast enough to handle critical parts of some applications. The standard approach partitions the application between software running on a processor and a hardware accelerator block, but has serious limitations:
1. Methods for designing and verifying large hardware blocks are labor-intensive, error-prone and slow.
2. Requirements for the portion of the application running on the accelerator may change late in the design or after the SoC is built, mandating a new silicon design.
3. Adaptation of the application software to hardware and verification of the combined hardware/software solution may be awkward and difficult.
4. Moving data back and forth among the processor, accelerator, and memory may slow total application throughput, offsetting much or all of the benefit derived from hardware acceleration.
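The fourth limitation can be seen in the typical offload pattern sketched below; the accelerator interface here is a placeholder (plain arrays and a function call standing in for device registers and DMA), but the copy-in, kick, copy-out structure is the part that costs the cycles:

#include <stdio.h>
#include <string.h>

#define BLOCK 256

/* Placeholder for a memory-mapped accelerator: in a real SoC these would
 * be device registers and a DMA region, not ordinary arrays. */
static short accel_in[BLOCK], accel_out[BLOCK];

static void accel_run(int n)              /* stand-in for "kick and poll" */
{
    for (int i = 0; i < n; i++)
        accel_out[i] = accel_in[i] * 3;
}

int main(void)
{
    short src[BLOCK], dst[BLOCK];
    for (int i = 0; i < BLOCK; i++) src[i] = (short)i;

    /* Every call pays copy-in, kick/wait, and copy-out overhead.  If the
     * accelerated kernel itself is short, these transfers can consume as
     * many cycles as the computation they were meant to save. */
    memcpy(accel_in, src, sizeof src);      /* copy-in  */
    accel_run(BLOCK);                       /* compute  */
    memcpy(dst, accel_out, sizeof dst);     /* copy-out */

    printf("dst[10] = %d\n", dst[10]);
    return 0;
}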