Until recently, SoC technology was considered the exclusive domain of high-tech multinational companies. That's because of the high entrance barriers: the need for diverse and complex expertise, the high cost of hardware and software development tools and the shipment volumes required to justify the cost of ASIC development. But until well-integrated, user-friendly and affordable hardware and software design tools become available, the broader market will not embrace FPGA intellectual-property technology.
To compile and assemble software code for reconfigurable IP cores requires a modular framework that facilitates the construction of C/C++ compilers for a wide range of architectures. When a compiler for a particular target is built, individual code generation and optimization phases have to be added, removed or replaced by a different implementation, depending on the characteristics of the core. For example, "software pipelining" is an important optimization feature in a compiler for VLIW-DSP architectures, but it is useless for most 8- and 16-bit microcontroller architectures.
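The phase-swapping idea above can be sketched in a few lines. This is a minimal, hypothetical illustration, not a real compiler framework: the class names, the tuple-based IR and the phases themselves are invented for the example. It shows how one target (here, an 8-bit MCU) can drop a phase such as software pipelining that another target (a VLIW DSP) would keep.

```python
# Hypothetical sketch of a modular compiler framework: each optimization is a
# named, self-contained pass that can be added, removed or replaced per target
# without touching the others. The IR is a toy list of operation tuples.
class Phase:
    name = "base"
    def run(self, ir):
        return ir

class ConstantFolding(Phase):
    name = "const-fold"
    def run(self, ir):
        # Fold operations like ("add", 2, 3) into ("const", 5).
        out = []
        for op in ir:
            if isinstance(op, tuple) and op[0] == "add" \
                    and all(isinstance(x, int) for x in op[1:]):
                out.append(("const", op[1] + op[2]))
            else:
                out.append(op)
        return out

class SoftwarePipelining(Phase):
    name = "sw-pipeline"
    def run(self, ir):
        return ir  # valuable on VLIW DSPs; a no-op in this sketch

class Compiler:
    def __init__(self, phases):
        self.phases = list(phases)

    def remove(self, name):
        # Deleting one phase leaves all other phases untouched.
        self.phases = [p for p in self.phases if p.name != name]

    def compile(self, ir):
        for phase in self.phases:
            ir = phase.run(ir)
        return ir

# A VLIW-DSP build would keep both phases; an 8-bit MCU build drops one.
mcu = Compiler([ConstantFolding(), SoftwarePipelining()])
mcu.remove("sw-pipeline")
```

The key design point is that phases share only the IR interface, so removing or replacing one cannot break the others.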
Contemporary high-performance core designs typically use VLIW cores that can be reconfigured to support a number of changes. These include instruction set and availability of hardware accelerators; addressing modes and instruction-addressing-mode combinations; available number of registers; number and mix of functional units; pipeline configuration, and availability and size of instruction and data caches.
To support this level of configurability, these cores require compilers and assemblers whose behavior can be changed at run-time. Major architectural changes such as the introduction of new instructions, new addressing modes or a new pipeline architecture require the compiler to be rebuilt. However, subtractive changes such as the deletion of instructions, registers and addressing modes, and changes in number and mix of functional units, can be supported at run-time.
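The split between rebuild-time and run-time changes can be made concrete with a small sketch. Everything here is hypothetical (the core description, the field names, the `configure` helper): the tool ships with the maximal core description, and a run-time configuration can only subtract from it, so no rebuild is needed.

```python
# Hypothetical sketch: subtractive reconfiguration at tool run-time. The tool
# is built against the full (maximal) core description; a configuration only
# deletes instructions and registers. Additive changes (new instructions, new
# addressing modes, a new pipeline) would require rebuilding the compiler.
FULL_CORE = {
    "instructions": {"add", "sub", "mul", "div", "mac"},
    "registers": {f"r{i}" for i in range(16)},
}

def configure(core, removed_insns=(), removed_regs=()):
    """Apply subtractive changes to a core description at run-time."""
    return {
        "instructions": core["instructions"] - set(removed_insns),
        "registers": core["registers"] - set(removed_regs),
    }

# A cost-reduced variant without divider/MAC hardware and only 8 registers.
variant = configure(FULL_CORE,
                    removed_insns={"div", "mac"},
                    removed_regs={f"r{i}" for i in range(8, 16)})
```

The compiler's register allocator and instruction selector would then consult `variant` instead of `FULL_CORE`.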
An SoC compiler framework that supports developing code and hardware together in a configurable-IP-core context must have several important characteristics. First, it should allow the developer to remove, modify or replace existing compilation phases without affecting other components of the compiler. Second, it must use a description-based approach for retargeting and optimization, and offer plug-and-play architecture-specific optimizations. Third, it should support subtractive changes to the core's architecture at compiler run-time. Fourth, it needs to provide target-specific extensions to the ISO-C language and apply state-of-the-art local and application-wide optimizations. And fifth, it should use VLIW-specific instruction scheduling and single- and multiple-instruction, multiple-data optimization techniques when applicable.
Unlike traditional compilers that run code-generation phases sequentially, modern compilers use phase-order control to run a particular optimization at the appropriate time and to undo the optimization if it subsequently appears to have negatively influenced final code quality. At the same time, assemblers are generated from a processor's instruction-set database. In this context, support for reconfigurable cores is implemented by attributes that specify which instructions are supported by which variant of the core, enabling the assembler to adapt its behavior at run-time instead of at assembler-build time.
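The variant-attribute mechanism described above can be sketched as follows. The database entries, mnemonics, encodings and variant names are all invented for illustration; the point is that the assembler consults the attributes at run-time rather than being rebuilt per core variant.

```python
# Hypothetical sketch: an assembler driven by an instruction-set database in
# which each entry carries a "variants" attribute naming the core variants
# that support it. The assembler adapts at run-time, not at build time.
ISA_DB = {
    "add": {"encoding": 0x01, "variants": {"base", "dsp"}},
    "sub": {"encoding": 0x02, "variants": {"base", "dsp"}},
    "mac": {"encoding": 0x2A, "variants": {"dsp"}},  # DSP variant only
}

def assemble(mnemonic, variant):
    """Encode one instruction, rejecting it if this variant lacks it."""
    entry = ISA_DB.get(mnemonic)
    if entry is None or variant not in entry["variants"]:
        raise ValueError(f"{mnemonic!r} not supported on variant {variant!r}")
    return entry["encoding"]
```

With this scheme, stripping an instruction from a core variant is a one-line database edit rather than an assembler rebuild.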
However, VLIW architectures tend to place restrictions on which instructions can execute in parallel. Since no formal specification method describes these restrictions, restriction checkers have traditionally been handcrafted; modern assemblers must automate this feature. Fortunately, the reconfigurability of a core's instruction set does not significantly affect the linker. For the linker, subtractive changes are fully transparent, but the introduction of new addressing modes may require the introduction of new relocation functions.
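One way to automate such checking is to replace a handcrafted checker with a declarative restriction table. The rules below (a shared multiplier, a single memory port) are hypothetical examples for an imagined core, not restrictions of any real architecture.

```python
# Hypothetical sketch of a data-driven restriction checker for VLIW bundles:
# each rule is a set of functional units that cannot appear in the same
# parallel bundle. Adding a restriction means adding a table entry, not code.
RESTRICTIONS = [
    {"mul", "mac"},     # imagined core: mul and mac share one multiplier
    {"load", "store"},  # imagined core: single memory port per bundle
]

def check_bundle(units):
    """Return True if the parallel bundle violates no restriction rule."""
    used = set(units)
    return not any(rule <= used for rule in RESTRICTIONS)
```

Because the rules live in data rather than code, a reconfigured core variant can supply its own table at run-time.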
But an advanced SoC design can contain multiple disparate processing cores connected via system buses that also provide access to physical on- or off-chip memory devices, peripherals or both. And this can cause problems. Today's linkers can build applications for such environments, but the process is tedious and error prone. Since most linkers are target specific, the application code for each core is linked independently, and the linker cannot locate the application automatically without user intervention.
The effort to implement a target-independent linker depends on the object formats used. Tool sets for 8- and 16-bit microcontrollers typically use the target-independent IEEE-695 object format, while the 32-bit RISC-MCU and DSP markets favor the executable and linkable format (ELF), which is not target independent but offers superior provisions for passing C++ debug information. Since both formats are widely used in their respective market niches, a multicore linker designed for a heterogeneous environment must fully support both object formats.
For the linker to operate properly in an SoC architecture, the software elements must be described in terms not only of the cores within the system, but also of each core's logical address spaces and the address translations between those spaces. The description must also cover the mappings from logical address spaces to internal data and address buses, all buses within the system and the address translations between buses. Finally, it should describe the memory cache or caches and the buses to which they are connected, the physical memory devices and the buses to which they are connected, and the relative access times of each memory device.
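A minimal sketch of such a system description follows. The structure, field names and the bridge-offset translation model are all assumptions made for illustration; a real linker script would carry far more detail.

```python
# Hypothetical sketch of the system description a multicore linker needs:
# cores with logical address spaces, buses with inter-bus translations, and
# memories with relative access times. All names and values are illustrative.
SYSTEM = {
    "cores": {
        "dsp": {"spaces": {"code": "dsp_bus", "data": "dsp_bus"}},
        "mcu": {"spaces": {"code": "sys_bus", "data": "sys_bus"}},
    },
    "buses": {
        # Each bus optionally bridges to another bus at a fixed offset.
        "dsp_bus": {"bridge_to": "sys_bus", "offset": 0x1000_0000},
        "sys_bus": {"bridge_to": None, "offset": 0x0},
    },
    "memories": {
        "sram":  {"bus": "sys_bus", "base": 0x0000_0000, "access_cycles": 1},
        "sdram": {"bus": "sys_bus", "base": 0x2000_0000, "access_cycles": 8},
    },
}

def physical_address(bus, addr):
    """Translate a bus-local address down to the system bus."""
    while SYSTEM["buses"][bus]["bridge_to"] is not None:
        addr += SYSTEM["buses"][bus]["offset"]
        bus = SYSTEM["buses"][bus]["bridge_to"]
    return addr
```

Given such a model, the linker can resolve every logical address down to a physical device without user intervention.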
Given this information, the linker can link and locate the application software for multiple heterogeneous processing cores in one pass, eliminating the need for any user intervention. With annotations supplied by the compiler, the linker can optimize the entire application's memory layout, locate performance critical code and data in the fastest memory devices and optimize cache access.
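The memory-layout optimization mentioned above can be sketched as a greedy placement pass. The section names, sizes and criticality flags are invented stand-ins for compiler-supplied annotations.

```python
# Hypothetical sketch: with compiler annotations marking critical sections,
# the locator greedily places the most performance-critical code and data
# into the fastest memories first. Sizes (in KB) and names are illustrative.
MEMORIES = [  # sorted fastest first
    {"name": "sram",  "access_cycles": 1, "free": 16},
    {"name": "sdram", "access_cycles": 8, "free": 1024},
]
SECTIONS = [
    {"name": ".isr",     "size": 4,  "critical": True},
    {"name": ".dsploop", "size": 8,  "critical": True},
    {"name": ".init",    "size": 64, "critical": False},
]

def locate(sections, memories):
    """Place sections into memories, critical sections first."""
    placement = {}
    for sec in sorted(sections, key=lambda s: not s["critical"]):
        for mem in memories:  # memories are ordered fastest first
            if mem["free"] >= sec["size"]:
                mem["free"] -= sec["size"]
                placement[sec["name"]] = mem["name"]
                break
    return placement

placement = locate(SECTIONS, MEMORIES)
```

A production locator would weigh access frequency, alignment and cache behavior as well; the sketch shows only the criticality-driven ordering.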
This target architecture description is hard-coded in a linker built for a single core. A linker designed to support heterogeneous multicore systems obtains information about the system architecture from the linker script file, which can be modified.
When EDA and software development tools become more tightly integrated, this information can be extracted automatically from the system's hardware-design HDL code and schematics.
It should come as no surprise that the software debugger is also affected by reconfigurability. Although supporting subtractive changes is trivial, supporting multiple disparate processing cores is a challenge both technologically (synchronizing independent processors) and economically (only a small number of developer seats use any specific combination of cores). For these reasons, one of the first-generation multicore debug architectures is based on the single-core debugger technology already available.
To manage the activities of multiple single-core debuggers, a middleware mechanism, or mediator, needs to be put in place. For each debugger, the mediator creates the illusion that the debugger has total control over the processor to which it is attached, something that was taken for granted when single-core debuggers were designed. The mediator manages three tasks: initializing the execution environment, launching debugger instances, and defining and managing the debug topology, that is, the start/stop relations between processing cores.
The mediator initializes the execution environment, whether that environment involves an SoC, discrete processors, simulation models or some combination of these. It then starts the software processes that initialize simulation engines or hardware versions of processing cores. Next, it sets up communication channels among these software processes. These steps are the only aspects of the mediator that are execution-environment specific.
After initializing the execution environment, the mediator displays a list of available processing cores and, if it is aware of real-time operating systems within the system, a list of processes. It then allows the user to attach a debugger to each processing core and, optionally, to each RTOS process.
Defining and maintaining the debug topology is the most innovative and complex part of the mediator. The debug topology allows user-defined scenarios such as "when Processor A starts execution, Processor B starts execution as well" or "when Processor B stops execution because of a breakpoint, Processor C stops as well, but Processor A continues to execute."
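The scenarios above can be expressed as declarative propagation rules, sketched here with invented core names and a toy rule table; a real mediator would drive actual run-control channels rather than a state dictionary.

```python
# Hypothetical sketch of a mediator's debug topology: each rule maps a
# (core, event) pair to the run-control actions propagated to other cores.
TOPOLOGY = {
    ("A", "start"): [("B", "start")],  # A starting also starts B
    ("B", "stop"):  [("C", "stop")],   # B stopping also stops C; A keeps running
}

def apply_event(states, core, event):
    """Update per-core run states and recursively propagate topology rules."""
    states[core] = "running" if event == "start" else "halted"
    for target, action in TOPOLOGY.get((core, event), []):
        apply_event(states, target, action)
    return states
```

Because the rules are data, the user can redefine the start/stop relations between cores without changing any debugger.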
Although at first glance this approach is similar to other multicore debug solutions, the mediator's debug-topology management features are a key difference. The mediator allows the user to employ in a multicore environment all the debug features available in single-core environments while synchronizing between cores and debuggers. For example, every single-core debugger can run breakpoint scripts, and those scripts may restart the core and other cores within the debug topology.
The mediator's fundamental challenge is to return the "correct" exit status of processing cores to their associated debugger instances. The exit status defines whether the core is running or halted and why: a code breakpoint was hit, a single step completed or a given number of cycles was consumed. The correct exit status is not necessarily equivalent to the exit status of the execution environment. In this implementation, the debugger that receives control after a core has halted tells the mediator what exit status should be reported to the debuggers that are within the same topology as the core triggering the running-to-halted state change.
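This reporting path can be sketched in a few lines. The core names, topology groups and status strings below are hypothetical; the point is that only debuggers sharing a topology with the halted core see the chosen status.

```python
# Hypothetical sketch: when a core halts, the debugger that takes control
# chooses an exit status, and the mediator forwards it only to debuggers
# attached to cores in the same topology group.
TOPOLOGY_GROUPS = {
    "dsp": {"dsp", "mcu"},  # dsp and mcu are debugged together
    "mcu": {"dsp", "mcu"},
    "io":  {"io"},          # io is debugged in isolation
}

def broadcast_exit_status(halted_core, status, debugger_status):
    """Report the chosen exit status to every debugger sharing the topology."""
    for core in TOPOLOGY_GROUPS[halted_core]:
        debugger_status[core] = status
    return debugger_status
```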