Heterogeneous multicore silicon (CPUs and DSPs) makes it possible to run demanding media-intensive applications in a power-efficient manner, as a CPU is more flexible and a DSP more power-efficient. The benefits of heterogeneous multicore, however, can be realized only if the software is adapted to leverage the characteristics of the multicore platform. Multicore programming goes beyond programming a couple of connected single-core systems. Some major points to consider:
- Single-threaded applications don't automatically run faster on multicore machines. C and C++, common in embedded systems, are essentially sequential languages. Therefore, when porting an application written in one of these languages to a multicore machine, the application may not benefit from the parallel capabilities of the platform. Since C and C++ don't offer any help in partitioning applications at the language level, system-level partitioning must be applied. Application/algorithm partitioning can be done at the task (system) level using an appropriate run-time platform (OS and inter-core communication) that supports remapping of tasks between cores. Finer-grained partitioning (dividing tasks/algorithms) for optimal efficiency can be done with code partitioning tools that are designed for this purpose.
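As a rough illustration of task-level partitioning, the sketch below splits one sequential loop into independent slices and hands each slice to its own worker. It assumes a host where std::thread stands in for a core; partitioned_sum is a hypothetical name, and a real heterogeneous target would rely on its own OS and inter-core runtime to place tasks on a CPU or a DSP.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper: sums `data` by splitting it into `tasks` contiguous
// slices, one per worker thread (threads stand in for cores in this sketch).
long long partitioned_sum(const std::vector<int>& data, std::size_t tasks) {
    assert(tasks > 0 && "at least one task is required");

    // Each task writes only its own slot, so no locking is needed.
    std::vector<long long> partial(tasks, 0);
    std::vector<std::thread> workers;
    const std::size_t chunk = (data.size() + tasks - 1) / tasks;  // ceiling

    for (std::size_t t = 0; t < tasks; ++t) {
        const std::size_t begin = std::min(t * chunk, data.size());
        const std::size_t end = std::min(begin + chunk, data.size());
        workers.emplace_back([&data, &partial, t, begin, end] {
            const auto first = data.begin() + static_cast<std::ptrdiff_t>(begin);
            const auto last = data.begin() + static_cast<std::ptrdiff_t>(end);
            partial[t] = std::accumulate(first, last, 0LL);
        });
    }
    for (auto& w : workers) w.join();  // wait for every task to finish

    // Combine the per-task results sequentially.
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

The partitioned version returns the same total as a plain loop; whether it runs faster depends on the slice size and on the cost of starting and joining the workers.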
- Communication costs time and power. In an application partitioned across multiple cores, communications efficiency between the software modules or tasks is critical in terms of communications setup time, the amount of data that needs to be moved, the speed of the transfer and its delivery-time predictability.
The amount of data is influenced by the type of application and by how well the application is partitioned. The transfer's efficiency and predictability are influenced by the communication (software) infrastructure and the type of hardware interconnects available in the system. The hardware interconnects may provide limited flexibility, but the partitioning and the choice of the communication software are generally in the designer's hands.
Shared memory on a bus, a common design, is convenient, but the available bandwidth of the memory and/or bus has to be shared among the cores. If the memory is single-ported, only one core can access it at a time, and the other cores stall while they wait. Dedicated interconnects (on-core serial or parallel ports) might alleviate contention.
Synchronization of data transfers can be handled by a polling task, which is simple to implement, but usually, interrupt-based synchronization is more efficient.
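The difference between the two synchronization styles can be sketched on a host machine. This is an analogy only: two threads stand in for two cores, and a condition variable stands in for an interrupt line. Mailbox, post, receive_polling and receive_blocking are hypothetical names chosen for the illustration.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical mailbox shared by a sending and a receiving "core".
struct Mailbox {
    std::mutex m;
    std::condition_variable ready_cv;     // stands in for an interrupt
    bool ready = false;                   // guarded by m
    int payload = 0;
    std::atomic<bool> ready_flag{false};  // flag read by the polling receiver
};

// Sender: store the data, then signal both the flag and the "interrupt".
void post(Mailbox& box, int value) {
    {
        std::lock_guard<std::mutex> lock(box.m);
        box.payload = value;
        box.ready = true;
    }
    box.ready_flag.store(true, std::memory_order_release);
    box.ready_cv.notify_one();
}

// Polling receiver: simple, but it keeps executing while it waits.
// `spins` reports how many times the flag was checked before data arrived.
int receive_polling(Mailbox& box, unsigned long& spins) {
    spins = 0;
    while (!box.ready_flag.load(std::memory_order_acquire)) {
        ++spins;
        std::this_thread::yield();
    }
    return box.payload;
}

// Notification-based receiver: sleeps until the sender signals it.
int receive_blocking(Mailbox& box) {
    std::unique_lock<std::mutex> lock(box.m);
    box.ready_cv.wait(lock, [&box] { return box.ready; });
    return box.payload;
}
```

The polling receiver burns cycles for as long as the sender is late, while the blocking receiver leaves its core free for other work or for a low-power state.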
An efficient and flexible software infrastructure for interprocessor communication can quickly and predictably move data, manage multiple logical connections and different types of hardware connections, synchronize transfers and provide an abstraction layer to the application.
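One way to picture such an abstraction layer is an interface that hides the interconnect from the application. The sketch below is an assumption-laden illustration, not any vendor's API: Channel and SharedMemoryChannel are hypothetical names, and the backend shown is a single-producer/single-consumer ring buffer of the kind often placed in shared memory.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical interface: the application sees only try_send/try_receive,
// whatever interconnect sits underneath.
class Channel {
public:
    virtual ~Channel() = default;
    virtual bool try_send(std::uint32_t word) = 0;
    virtual bool try_receive(std::uint32_t& word) = 0;
};

// One possible backend: a ring buffer with exactly one producer and one
// consumer. One slot is kept empty, so the usable capacity is N - 1 words.
template <std::size_t N>
class SharedMemoryChannel final : public Channel {
    static_assert(N >= 2, "ring needs at least two slots");

public:
    bool try_send(std::uint32_t word) override {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % N;
        if (next == tail_.load(std::memory_order_acquire)) {
            return false;  // full: caller decides whether to retry or drop
        }
        slots_[head] = word;
        head_.store(next, std::memory_order_release);  // publish the slot
        return true;
    }

    bool try_receive(std::uint32_t& word) override {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) {
            return false;  // empty
        }
        word = slots_[tail];
        tail_.store((tail + 1) % N, std::memory_order_release);  // free the slot
        return true;
    }

private:
    std::array<std::uint32_t, N> slots_{};
    std::atomic<std::size_t> head_{0};  // written only by the producer
    std::atomic<std::size_t> tail_{0};  // written only by the consumer
};
```

Because the application holds only a Channel reference, a backend for a dedicated serial or parallel port could be substituted without touching application code.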
- Debugging multicore systems is more complex than debugging single-processor systems and may influence the application's behavior.
If the entire system is stopped, the system state can easily be inspected, but if one core or a subset of cores is stopped, debugging is more complex, since other cores may be sending data to or receiving data from the stopped core(s). Some cores stop the peripherals together with the core, allowing debug of the intercore communication as well. A multicore debugger that permits you to control which parts of the system are stopped for debugging and what happens to data in transit is a must.
Most embedded systems are influenced by external events and have time dependencies. They will not work correctly if a core halts for debug. If one core is stopped, long timeouts can keep the parts of the application that are still running from generating errors.
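The long-timeout idea can be sketched as a receive call whose deadline is widened in debug builds. Everything named here is an assumption made for illustration: the DEBUG_LONG_TIMEOUTS switch, the TimedInbox class and the two timeout values are hypothetical, and a mutex-protected queue stands in for the inter-core link.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <queue>

// Hypothetical build switch: widen the deadline while a debugger may be
// holding the sending core. Both values are illustrative only.
#ifdef DEBUG_LONG_TIMEOUTS
constexpr std::chrono::milliseconds kPeerTimeout{60000};
#else
constexpr std::chrono::milliseconds kPeerTimeout{50};
#endif

// Hypothetical inbox fed by another core.
class TimedInbox {
public:
    void post(int msg) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(msg);
        }
        cv_.notify_one();
    }

    // Returns true and fills `out` if a message arrives before the timeout;
    // returns false otherwise, so the caller can report a silent peer.
    bool receive(int& out, std::chrono::milliseconds timeout = kPeerTimeout) {
        std::unique_lock<std::mutex> lock(m_);
        if (!cv_.wait_for(lock, timeout, [this] { return !q_.empty(); })) {
            return false;  // timed out: nothing arrived
        }
        out = q_.front();
        q_.pop();
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<int> q_;
};
```

With the short production timeout, a sender halted at a breakpoint looks like a fault to the receiver; with the long debug timeout, the receiver simply keeps waiting until the halted core resumes.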
Peter Leyssens (firstname.lastname@example.org) is a product engineer at PolyCore Software (Gent, Belgium).