San Jose, Calif. -- ARM Ltd. (Cambridge, UK) is broadening its product offerings beyond the well-established 16/32-bit embedded RISC processors with two new types of licensable core-based solutions: one is a multiprocessor core that specifically handles control tasks; the other is responsible for heavy data processing jobs. The two new core-based solutions were pitched to system designers at the recent Embedded Processor Forum, here.
Targeting applications that execute multiple tasks at the same time, such as consumer entertainment and convergence devices in the home and car, ARM has begun licensing its MPCore synthesizable multiprocessor, which can be configured to contain between one and four processors delivering up to 2600 Dhrystone MIPS of performance. For applications that demand more performance than today's general-purpose DSPs, ARM is offering its OptimoDE data engine technology.
"The MPCore multiprocessor solution delivers greater performance at lower frequencies than comparable single processor solutions, bringing significant cost savings to system designers. Multiprocessing is ideal for demanding applications executing multiple tasks at the same time such as consumer entertainment and convergence devices in the home and car," said Dave Steer, ARM's North America director of segment marketing. He cited a set-top-box recording several TV channels while sharing home movies across the Internet, and an in-car navigation system delivering simultaneous back-seat video gaming. The MPCore multiprocessor was developed as part of ARM's ongoing partnership with NEC Electronics. NEC will use the configurable ARM processor in high-performance, low-power products across the consumer electronics, automotive and mobile markets.
MPCore was described as a configurable block containing from one to four ARM-1136-derived CPU cores in a multiprocessing configuration. The cores are surrounded by the busses, critical control paths, memories and logic necessary to support either symmetric or asymmetric multiprocessing operation. The CPUs execute the ARMv6 instruction set.
The MPCore gives SoC designers, for the first time, a way to implement a whole multiprocessor block as practically a black box, according to Steer. The cores and other blocks are supplied in synthesizable form, but by providing the whole thing in a package, with configuration switches to control user-exposed parameters like L1 cache size, ARM believes it has encapsulated most of the things that can go wrong in a multiprocessor design so that licensees don't have to deal with them. A clear idea of design requirements, sensible floorplanning, fast memory and use of ARM's synthesis directives should carry the day, Steer suggested.
MPCore supports up to four-way cache coherent symmetric multiprocessing (SMP), up to four-way asymmetric multiprocessing (AMP), or any combination of both. This flexibility provides increased throughput and system responsiveness, with full portability of existing applications and scalable performance for multithreaded applications. The ability to support multiple workloads addresses the needs of networking devices to process more packet streams and higher data throughput. MPCore supports both SMP and AMP software models, and supports a broad range of OS (operating system) and application software.
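How far "scalable performance for multithreaded applications" goes depends on how much of the workload can actually run in parallel. A minimal sketch using Amdahl's law, a standard textbook estimate rather than anything ARM presented, shows the ceiling for one to four cores. The function name and the 90 percent parallel fraction are illustrative assumptions.

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
    """Ideal speedup predicted by Amdahl's law.

    parallel_fraction: share of the work that can be spread across cores (0..1).
    n_cores: number of identical cores sharing the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


if __name__ == "__main__":
    # Hypothetical workload: 90% of the run time is parallelizable.
    for cores in range(1, 5):
        print(cores, "core(s):", round(amdahl_speedup(0.9, cores), 2))
```

Under that assumption, a four-core configuration tops out at a little over three times the single-core throughput, which is why the serial portion of an application matters as much as the core count.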
Steer said that there are at least two ways to use the multiprocessor block. One would be as an asymmetric multiprocessing system, to gather up tasks that had been running on other processors in previous designs, saving area overall. The other would be to move all or most of the tasks on the SoC into a symmetric multiprocessing (SMP) system. To this end, ARM has made available an SMP Linux, a GNU toolset and a POSIX thread library. The MPCore multiprocessor supports the ARMv6 architecture, with SIMD media extensions for next-generation rich multimedia and convergent devices and ARM Jazelle Java acceleration. The MPCore multiprocessor implements between one and four processors with cache coherency using a modified MESI protocol. It also features configurable level 1 caches, 64-bit AMBA AXI interfaces, vector floating-point coprocessors and programmable interrupt distribution. The processor supports Adaptive Shutdown of unused processors to give dynamic power consumption as low as 0.57mW/MHz from a generic 130nm process excluding cache.
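For readers unfamiliar with MESI, the sketch below models the textbook protocol for a single cache line shared by four caches. It illustrates plain MESI only; ARM did not detail its modifications, so nothing here should be read as the MPCore implementation. The class and method names are hypothetical.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"   # only copy, dirty
    EXCLUSIVE = "E"  # only copy, clean
    SHARED = "S"     # clean copy that other caches may also hold
    INVALID = "I"    # no valid copy


class CacheLine:
    """Tracks the MESI state of one cache line in each CPU's cache."""

    def __init__(self, n_cpus: int) -> None:
        self.states = [State.INVALID] * n_cpus

    def read(self, cpu: int) -> None:
        if self.states[cpu] is not State.INVALID:
            return  # read hit: state is unchanged
        others_hold_copy = False
        for other, state in enumerate(self.states):
            if other != cpu and state is not State.INVALID:
                others_hold_copy = True
                # A MODIFIED holder writes back, an EXCLUSIVE holder
                # downgrades; either way it ends up SHARED.
                self.states[other] = State.SHARED
        self.states[cpu] = State.SHARED if others_hold_copy else State.EXCLUSIVE

    def write(self, cpu: int) -> None:
        # Every other copy is invalidated; the writer owns the dirty line.
        for other in range(len(self.states)):
            if other != cpu:
                self.states[other] = State.INVALID
        self.states[cpu] = State.MODIFIED

    def snapshot(self) -> str:
        return "".join(state.value for state in self.states)


if __name__ == "__main__":
    line = CacheLine(4)
    line.read(0)
    print(line.snapshot())   # EIII
    line.read(1)
    print(line.snapshot())   # SSII
    line.write(2)
    print(line.snapshot())   # IIMI
    line.read(3)
    print(line.snapshot())   # IISS
```

The point of the protocol is visible in the last two steps: a write by one CPU invalidates every other copy, and the next reader forces the dirty line to be shared again, so no processor ever sees stale data.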
The ARM Intelligent Energy Manager technology can further reduce consumption by dynamically predicting the required performance and lowering the voltage and frequency. The MPCore multiprocessor enables system designers to view the core as a single "uniprocessor", simplifying development and reducing time-to-market.
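A rough sense of why lowering voltage and frequency together pays off comes from the classical CMOS approximation that dynamic power scales with capacitance times voltage squared times frequency. The sketch below applies that approximation along with the 0.57mW/MHz figure ARM quoted. It is not ARM's Intelligent Energy Manager algorithm; the function names, the 400MHz operating point and the scaling factors are illustrative assumptions, and the article does not say whether the quoted coefficient applies per processor or to the whole block.

```python
MW_PER_MHZ = 0.57  # figure quoted for a generic 130nm process, excluding cache


def dynamic_power_mw(freq_mhz: float, mw_per_mhz: float = MW_PER_MHZ) -> float:
    """Dynamic power at a given clock, using a linear mW/MHz coefficient."""
    return freq_mhz * mw_per_mhz


def scaled_power_ratio(freq_scale: float, voltage_scale: float) -> float:
    """Relative dynamic power after scaling, from P ~ C * V^2 * f.

    freq_scale and voltage_scale are fractions of the original values,
    e.g. 0.5 means half the original frequency.
    """
    return voltage_scale ** 2 * freq_scale


if __name__ == "__main__":
    base = dynamic_power_mw(400)            # hypothetical 400MHz operating point
    ratio = scaled_power_ratio(0.5, 0.8)    # half the clock, 80% of the voltage
    print(f"baseline: {base:.1f} mW")
    print(f"after scaling: {base * ratio:.1f} mW ({ratio:.0%} of baseline)")
```

Because voltage enters as a square, halving the clock while trimming the supply to 80 percent cuts dynamic power to roughly a third under this model, more than the frequency reduction alone would deliver.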
The MPCore multiprocessor is available for licensing from ARM now. First silicon is expected in Q2 2005. An evaluation system for the MPCore multiprocessor with the Linux 2.6 OS and development tools is available today to enable early software development for MPCore multiprocessor designs.
Explaining the reasoning behind ARM's OptimoDE data engine technology, ARM program manager Matthew Byatt said, "There is a performance and computational gap between the signal processing throughput that current DSP chip designs can deliver, and what the newer applications actually need." According to Byatt, systems manufacturers are under pressure to accommodate the requirements of multiple data formats and communication standards within a single product, as well as anticipating the needs of future standards in the design platform. "The performance and computational gap cannot be closed by following current design practice. The universal requirements for low power and minimum cost suggest that traditional signal processing solutions will not scale to meet the processing demands of the new applications," he said.
OptimoDE is ARM's approach to embedded signal processing based on configurable intellectual property. OptimoDE data engine technology was acquired by ARM last year when it acquired part of Adelante Technologies. The OptimoDE technology is licensable intellectual property with an associated tool environment to be deployed alongside an ARM microprocessor core.
The OptimoDE architecture is fundamentally suited to exploiting the parallelism within data plane algorithms. The real value of a configurable approach is apparent in addressing divergent algorithmic requirements which, unlike the control plane, are more difficult to satisfy with a fixed architecture.
The data engine itself is composed of functional datapath units that can support efficient execution of the target algorithm. The datapath units include generic arithmetic and logical units, storage, and interconnect, as well as more specialized functions such as butterfly and DCT engines. Datapath units can be extended and customized by the users to meet special application-specific processing needs.
The VLIW architecture is available with a datapath functional resource library and several preconfigured microarchitectures with varying parallelism and performance. OptimoDE data engines are AMBA (Advanced Microcontroller Bus Architecture)-compliant and are compatible with ARM's DSP interface specifications, which describe the interfaces between the cores for mailbox-based command and control messaging and bulk data processing; debugging and trace interfaces and protocols for multicore debugging; and software APIs for interprocessor communications. OptimoDE data engines can be used as stand-alone processors or in designs with microprocessor cores.
Using configurable IP, designers will be able to create signal processing architectures that are matched to the needs of a specific application or algorithm. This approach, according to Byatt, offers the best of both worlds: excellent processing performance in a solution that is both flexible and meets the demanding power and cost constraints set by the new generation of algorithms.
For highly optimized systems, customized datapath units can be developed by semiconductor partners as reusable library elements for use within the data engine. OptimoDE data engines are suitable for a wide range of high-end algorithmic requirements across products in wireless, networking, printing and imaging, consumer and storage applications.
Data Engines enable the system designer to create an optimal solution based on the requirements of the algorithm (or class of algorithms), rather than the fixed capability of a pre-defined architecture. This approach provides far more design freedom, better hardware implementations and more efficient compiler code. An ARM data engine may operate as a stand-alone unit, or may be closely coupled with an ARM microprocessor core. Because OptimoDE data engines are highly configurable, they can be designed to exploit the full parallelism available within the target algorithm.
By supporting re-programmability at the software level, the OptimoDE design process enables designers to freeze the data engine architecture while continuing to tune the algorithm through C-code changes. This is an important feature enabling multiple algorithms, which have similar requirements, to be run on the same data engine hardware. For example, different variations of the MPEG algorithm could be accommodated on the same data engine hardware using different code. This approach enables the development of highly flexible, domain-specific data engines, which although programmable can still achieve high performance for the intended class of algorithms, often at lower clock rates.
An MPEG2 algorithm can be performed on a standard ARM1136JF-S microprocessor core running at around 650MHz. Adding an OptimoDE data engine, at an area overhead of just 1 mm² on top of the ARM11 core, allows the microprocessor clock rate to be reduced to 200MHz while matching the MPEG2 performance of the core alone.
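The size of that reduction is easy to check from the two clock rates quoted above. The short sketch below does the arithmetic; the function name is an illustrative assumption, and the calculation says nothing about power or silicon cost beyond what ARM stated.

```python
def clock_reduction(baseline_mhz: float, reduced_mhz: float) -> tuple[float, float]:
    """Return (ratio, percent_saved) for a drop from baseline to reduced clock."""
    if baseline_mhz <= 0 or reduced_mhz <= 0:
        raise ValueError("clock rates must be positive")
    ratio = baseline_mhz / reduced_mhz
    percent_saved = (1.0 - reduced_mhz / baseline_mhz) * 100.0
    return ratio, percent_saved


if __name__ == "__main__":
    ratio, saved = clock_reduction(650, 200)
    print(f"clock ratio: {ratio:.2f}x, reduction: {saved:.1f}%")
```

With the quoted figures, the core runs at less than a third of its original clock, a reduction of roughly 69 percent.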
ARM provides a toolset that enables system designers to configure an architecture to suit the exact needs of many low-power, high-performance applications. OptimoDE delivers an optimal combination of low power, high performance and area efficiency. In addition, the solution retains the fundamental benefits of a software approach. The design process enables modifications and updates to be made through firmware changes, and the proven methodology lowers development time, design resources and risk.
The OptimoDE tools also generate a C-compiler which is optimized for the architecture. Providing the compiler is essential to ensure that the efficiency of the development process extends to the software, and that both hardware and software are optimized together.
The design flow starts with a C/C++ specification for the algorithm. This drives the OptimoDE hardware configuration and software compilation tools. Using an appropriate data engine template, the system designer configures the architecture to meet the specified performance requirements.
The software code is then generated to match the architecture. Code can be re-generated to accommodate incremental design changes, or alternative algorithms, without altering the underlying hardware architecture. In addition, the tools provided with OptimoDE also automatically generate simulation models that can be used to assist in verifying the integration process, once the data engine is incorporated within a system-on-chip design.
ARM is now signing up licensees for the technology. However, ARM declined to give license fees for either technology.