United Business Media EE Times



The Embedded Microprocessor Report

By Tets Maniwa


In this special section, Integrated System Design Magazine looks at the issues facing the user of computing platforms embedded within an IC. As more designs become deeply embedded systems, with processor, memory, and peripherals integrated into a single chip, selecting the correct processor becomes very important to the success of the design. Design styles and philosophies for system-level ICs are very different from those of large ASICs, emphasizing interactions and interfaces rather than the design of a compute platform. Some EDA tools recognize this change in design philosophies and, instead of dealing with gates or implementation issues, look at performance and functionality at the architectural level.

Most high-level design styles don't even acknowledge CPU characteristics, except for the overall throughput of the processor. Among present design styles, the processor is selected after the functionality is already determined and verified, instead of being the starting point for the design. As just described, fundamental changes in design philosophy can result in some significant design penalties if the available processor(s) don't completely fit the requirements. Selecting a processor after functionality is established makes the design both easier and harder: easier because the absence of processor-related internal structures like buses gives greater design freedom, and harder because nothing is bus or processor specific.

Not a Cuisinart

The first problem when evaluating the appropriate system-level functions for your microprocessor is to define the "best" processor architecture. For purposes of definition and discussion, the primary categories for the processors are based on instruction sets and architectures. The basic categories are fixed, variable, reconfigurable, and, as a superset of some of the other categories, platforms. Each type of processor has characteristics that make it well suited for some functions and less suited for others.

The software development environment may be as critical as the bus architecture or the overall throughput in the processor.

Processor selection includes important decisions for electrical and physical specifications like speed, pin-out, and bus structures. Also important are the software development environment and the ability to fit the drivers and application programs into an acceptable memory footprint. When the embedded processor is going into a system on a chip (SOC), the available total die area limits the available memory. Up to half of the die can be memories, but the maximum memory may still be fairly small. If the memory is external, then the programs must fit into the available memory granularity.

One important consideration is the capability to interface with other processors. Many applications need more processing power than one microprocessor can provide. One alternative to a higher performance processor (which may consume excessive power) is to run more than one processor. The ability to operate in a multiprocessor mode can offer benefits in performance, and the redundancy may be necessary for some applications. Many of the new processors are small enough that an additional processor will not greatly affect the die area when considering more than one processor in an SOC. Because packaging and drivers don't limit on-chip memory, a data word in excess of 128 bits is feasible. The concept of interleaving processors in a very wide memory word environment offers interesting possibilities for simultaneous improvements in throughput and reduction in power consumption.
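To make the wide-word idea concrete, here is a minimal sketch in Python of one way interleaving could work: a single wide memory access is sliced into lanes, one per processor. The 128-bit word, the 32-bit lane width, and the function names are illustrative assumptions, not details of any particular product.

```python
WORD_BITS = 128  # assumed on-chip memory word width
LANE_BITS = 32   # assumed native data width of each processor


def split_lanes(word, word_bits=WORD_BITS, lane_bits=LANE_BITS):
    """Slice one wide memory word into per-processor lanes.

    Lane 0 is the least significant lane_bits of the word.
    """
    if word < 0 or word >> word_bits:
        raise ValueError("word does not fit in word_bits")
    if word_bits % lane_bits:
        raise ValueError("lane width must divide the word width")
    mask = (1 << lane_bits) - 1
    return [(word >> (i * lane_bits)) & mask
            for i in range(word_bits // lane_bits)]


def accesses_needed(n_operands, lanes):
    """Memory accesses needed when each access feeds `lanes` processors."""
    return -(-n_operands // lanes)  # ceiling division
```

With four lanes, ten operands take three wide accesses instead of ten narrow ones. The sketch only counts accesses; whether that translates into lower power depends on the memory and bus implementation.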

As a tangential topic, the inclusion of DSP functions is another facet of the debate. As DSP-type functions become integrated into systems, the need for additional processing capabilities becomes apparent. If the DSP workload isn't too high, the main control processor can be modified to perform some of the functions, eliminating the need for a separate DSP. In some sense, performance can scale in multiples of the uniprocessor setup by just adding additional processors. The silicon penalty for more than one processor may be less than the combination of a control processor, DSP, and interface logic. The value of the combination is that the processor presents a single programming environment and eliminates the need to interface multiple clock domains.

Even having multiples of the same processor is easier to design and program than having two or more types of devices in the system. The problem is that a general-purpose processor isn't optimized for DSP functions, and even two or more processors may not have the necessary throughput for the application. At some point, the silicon overhead for multiple processors may be much more than the area required for a dedicated DSP. The distinction may be a function of the amount of control code versus the amount of DSP code in the system. Very high DSP performance requirements or a large amount of DSP code can preclude the multiple-control processor approach.
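The crossover described above can be sketched as a back-of-the-envelope comparison. The following Python fragment is only an illustration: the function name, the units, and every number fed to it are hypothetical, and it assumes one processor handles control code while identical extra processors absorb the DSP load, and that a single dedicated DSP can carry the whole load.

```python
import math


def compare_area(dsp_load, cpu_dsp_capacity, cpu_area, dsp_area, glue_area):
    """Return (multi_cpu_area, cpu_plus_dsp_area) in the caller's area units.

    dsp_load         -- DSP work required (any consistent unit)
    cpu_dsp_capacity -- DSP work one general-purpose processor can absorb
    cpu_area         -- area of one control processor
    dsp_area         -- area of a dedicated DSP (assumed to carry the full load)
    glue_area        -- interface logic between the control processor and DSP
    """
    extra_cpus = math.ceil(dsp_load / cpu_dsp_capacity)
    multi_cpu_area = (1 + extra_cpus) * cpu_area
    cpu_plus_dsp_area = cpu_area + dsp_area + glue_area
    return multi_cpu_area, cpu_plus_dsp_area
```

With a light DSP load the all-processor option comes out smaller; raise the load tenfold and the dedicated DSP wins easily. Real decisions would also weigh power, clock domains, and programming effort, which this sketch ignores.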

Cast in silicon

Fixed processors have set architectures and instructions and include such well-known processor families as the ARM, MIPS, and some Motorola processors. These processor types address the range of performance requirements by having many family members: fast, slow, low power, etc. The favorable attributes for this category include well-known hardware and software interfaces, robust and widely available software development environments, and bus structures and peripheral blocks that allow straightforward incorporation of other functions. In addition, because the fixed processors have fairly large followings in the design community, many people are experts in designing the processors into systems and a large body of optimized and debugged software is available from the vendors and from other sources.

The fixed processors may come as hard intellectual property (IP) only. In some cases, the design is only available as encrypted models and a layout. Since the design isn't in a mode to be changed, this representation may not be a problem. The potential problem with this format, however, is the possibility that the abutment rules and inability to route through the block may make the final layout very difficult. The challenge of incorporating a hard core into an SOC is one of the more difficult tasks in the final implementation, further exacerbated by the high probability that the design rules for the core don't match the other library design rules. Also, the hard core may require extensive modification of the layout itself to meet the design rules of the intended process.

On the other hand, if the fixed processor is a netlist and a set of layout rules or floorplan, the inclusion into the SOC is much easier, but then the question of timing closure comes up. If the microprocessor has critical timing paths, the constraints on those paths make other parts of the system more difficult to complete. The soft core may be written for a different set of design standards in such areas as port names and bit order, making the instantiation challenging.

Other disadvantages of fixed processors include the inflexibility of their architecture and instruction sets.

If the basic processor doesn't have the correct functions and peripherals, the silicon for the unnecessary functions and the die area necessary to make additional functions possible in the processor may be excessive, especially for cheaper applications. The fixed instruction sets create other problems. In general, the embedded systems need to have the software fit into a fairly limited memory space. A hand-optimized assembly routine may still be too big or too slow for the system requirements when the code is restricted to a fixed instruction set that requires many subroutines to perform the primary tasks.

Shake and bake

Variable processors are HDL entities and therefore can be changed at the design stages. Representative examples of variable processor vendors are ARC, Infineon, Motorola, and Tensilica. Because the processors are HDL entities, the entire design can be modified to meet special requirements, such as removing unused functions or adding special-purpose features. The ability to add or subtract functions, structures, and instructions enables the processor to be adjusted to the application. The new derivative processor can be optimized for key constraints, such that special instructions can minimize code size and improve throughput by a significant amount. Since the core is usually an RTL source, the processor just becomes another block in the design, and doesn't create new layout problems.

Even though the cores are synthesizable, however, the ability to change the core to meet new requirements is limited by the amount of design time for the modifications. The problem with giving engineers the ability to change something is that they will change it. Any changes to the main functionality of the core may cause some special conditions to interfere with normal operations. In addition, if the design style is different from the rest of the hierarchy, the matching of ports and signals may be cumbersome.

Another problem with the variable architectures is that the design must be completely verified if any of the internal structures is changed, to ensure that the processor is still operational. When you change instructions or architecture, the original testbench and verification suite will try to exercise the altered components and will ignore the new functions, producing spurious failures on the former and leaving the latter untested. This verification of overall functionality and performance could take as long as or longer than a new design.
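One way to keep a stock regression suite honest after a core has been modified is to tag each test with the features it exercises and compare those tags against the configured core. The Python sketch below illustrates the bookkeeping only; the function name, test names, and feature names are all hypothetical.

```python
def plan_regression(tests, core_features):
    """Sort a regression suite against a modified core.

    tests         -- dict mapping test name -> set of features it exercises
    core_features -- set of features present in the configured core

    Returns (runnable, stale, uncovered):
      runnable  -- tests whose features all exist in this core
      stale     -- tests that would exercise removed or altered features
      uncovered -- core features that no runnable test touches
    """
    runnable = sorted(n for n, needs in tests.items() if needs <= core_features)
    stale = sorted(n for n, needs in tests.items() if not needs <= core_features)
    covered = set().union(*(tests[n] for n in runnable))
    uncovered = sorted(core_features - covered)
    return runnable, stale, uncovered
```

For a core that drops a divide instruction and adds a multiply-accumulate, the divide test lands in the stale list rather than failing, and the new instruction shows up as uncovered, which is the cue to write new tests.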

Even though the vendors of variable processors note that they generate a complete tool set for software development, the tools aren't usually the normal ones that the programmers have at their disposal. In addition, some of the software productivity utilities like linters and code coverage tools can't check for non-existent codes or missing hardware. The changes in the opcodes may create problems with the software tools and middleware (the drivers for the internal building blocks), so the hardware-software integration is much more difficult. The stubbed-out code may not have any physical counterpart for the program to operate.

On the fly

Reconfigurable processors are also variable, but can be changed during operation. Imagine a processor that boots in a self-test mode, then changes internal functions to be a math-intensive computer for tasks like simulation, then changes again into a graphics engine to do the animated waveform viewing. No processor can do this yet, but the capability will come in time. Although the reconfigurable processors have been research topics for many years, the technologies to implement them are just becoming available now. The greatest difficulty to date hasn't been the hardware, but the combination of software and hardware. The problem of different register sets and instructions means that data and state information may not have a place to go when the processor reconfigures.

Reconfigurable processors provide the benefits of minimal area and maximum speed for a set number of functions. These processors can be optimized to perform functions in a manner similar to an intelligent FPGA. The programmed functions can be optimized for the specific attributes needed for a function, and then an overlay can overwrite the hardware and implement other hardware as needed. The difficulty is in verifying the multiple operating functions. Just like the variable processors, design verification is key, but with the added complication that the functions may not exist in some configurations and the changes to other operating modes may preclude complete testing. The state of the hardware may depend not only on a set of instructions, but also on the state of another state machine to implement a different operation. The software will need to have extra layers and the memory will need to reserve additional addresses to hold functional changes and intermediate states for the transitions.
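The need for reserved memory during a transition can be illustrated with a small model. In the Python sketch below, registers that have no counterpart in the next configuration are spilled to a reserved area and restored if a later configuration brings them back. The register names, the dictionary standing in for reserved memory, and the zero reset value are assumptions made purely for illustration.

```python
def reconfigure(live_regs, next_reg_names, spill_area):
    """Model a switch to a configuration with a different register set.

    live_regs      -- dict of register name -> value in the current configuration
    next_reg_names -- iterable of register names in the next configuration
    spill_area     -- dict standing in for reserved memory; updated in place

    Returns the register contents of the next configuration.
    """
    next_names = set(next_reg_names)

    # Registers with no home in the next configuration go to reserved memory.
    for name, value in live_regs.items():
        if name not in next_names:
            spill_area[name] = value

    new_regs = {}
    for name in next_names:
        if name in live_regs:
            new_regs[name] = live_regs[name]         # register survives as-is
        elif name in spill_area:
            new_regs[name] = spill_area.pop(name)    # restore earlier state
        else:
            new_regs[name] = 0                       # assumed reset value
    return new_regs
```

Even this toy version shows why the memory map grows: every register that can disappear needs a reserved slot, and the software needs a layer that knows which slots are live.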

More than scaffolding

In addition to the wide selection of processors, a number of vendors are starting to propose another model, that of platform-based design. Here a base processor and a standard set of peripherals and memory are the starting point for the design. For example, a vendor might configure a base configuration of a microprocessor, a DSP, memory, and the nominal peripherals (PCI core, memory management, DMA, etc.) to use as a starting point in the design. This platform then gets additional peripherals, function enhancements, and special purpose hardware to customize the design to a particular company's requirements. The platform-based design overcomes the major hurdles of interfacing the major function blocks.

Platforms can come in any number of flavors. The main processor can be fixed or variable, and adding extensions for special functions may eliminate the coprocessor. The mix of basic peripherals may change and many other variables will become available. The platform-based designs solve the initial problems of designing and verifying the basic structural functionality and provide the basis for differentiation through special purpose peripherals and through software. Because the primary elements are already integrated and sometimes available as functional silicon, the platforms allow the software teams to start the software integration process much earlier in the design cycle. This more parallel route to the final product should reduce the total time to market and give the software teams a hard platform for early code testing.

The greatest challenge for the platform-based processor is in the overall characteristics of the platform itself. Here, all of the issues of system-level integration come up. The technical details of the processor architecture(s), bus structures, and available peripherals must fit into the available budget of speed, performance, throughput, area, and power from the silicon implementation perspective. A platform may only exist at one vendor, which may restrict the availability of other function-specific peripherals. If the design style is vastly different from other function blocks (for example, in port naming conventions or significant bit designation and positioning), the bridging functions may take more area and power than allowed in the system budget. If the basic mix of functions isn't fully compatible with the design intent, the extraneous functions may create an area, power, and cost penalty for the system. The extra functions will also tie up additional I/Os, which will force a larger pad ring and possible empty space on the die.

In addition, the software development team may not like or use the development tools available with a given platform. Because the platform selection crosses development team boundaries, the final decisions on platforms have to go up at least one level in the corporate structure, making decisions slower and harder to get. Other departments may offer their not fully technical comments and objections to a particular platform, such as marketing having to answer the question "to be, or not to be" (fill in the blank for characteristics). Another marketing objection is that a platform-based design has the same starting point for everyone, so the company loses its time to market advantage and can't make a unique product, just another "me-too" evolutionary box. Manufacturing may not want to use a particular platform because the vendor isn't on the approved vendor list, or the vendor's process is incompatible with other needs in the final product, such as not supporting a DRAM process module or bumped flip-chip processing.

Finally

As you can see, the important attributes for embedded microprocessors are highly dependent on many factors. No single architecture is optimal for all applications, and in fact, the best selection in one category might be the least suited, overall, for a specific, system-level functionality. One important characteristic, not mentioned above because it isn't very quantifiable, is the aspect of vendor compatibility, not only from the vendor to you, but also with the other vendors of IP blocks. If you aren't comfortable with the vendor, the technical issues may not matter, because the truly critical questions will never get asked. Any vendor who displays arrogance and faultfinding will be inimical to a productive relationship. However, vendors who just exhibit friendliness don't help to get the IC taped out.

The balance of this section consists of the technical and economic justifications for choosing from the range of available implementations. The vendors of the various processor types will try to address the multiple issues and will make a case for the situations where their offering(s) are most applicable. Be aware of the usual caveats: free advice is worth exactly what you pay for it, and vendors' statements need to be read critically, if not skeptically.

Sidebar 1: Embedded Processors for PLDs

by Martin Won

With the recent availability of embedded RISC processors for programmable logic devices (PLDs), embedded designers now have the opportunity to easily explore the trade-offs between implementing functions in software versus implementing them in hardware via co-processors or custom peripherals. Building reconfigurable systems is also much easier when using embedded processors in programmable logic: you receive the flexibility of reconfigurable processors without the associated risks.

With the integration of embedded processors within the hardware, a designer can do several iterations of a system within a fraction of the time required to implement either a custom ASIC or a board-level design that relies on an ASSP. This flexibility not only reduces time-to-market for a product, but also allows the designer to explore different partitioning options to deliver the best possible combination for that product.

Embedded processor PLDs give system designers unprecedented freedom in determining which functions should be executed in software and which would benefit most from dedicated hardware implementation in the form of custom peripherals or co-processor elements.

For example, math functions like digital signal processor (DSP) operations might be performed in a processor, a co-processor or a dedicated peripheral. If the design prototyping platform contains all of these elements, the software team can modify the code to pass the operation to the co-processor or call the peripheral instead of relying on the main processor. Because the designer has the ability to customize when using programmable logic, each version of the hardware design can be realized on the order of hours, and each hardware/software combination can thereby be evaluated quickly.

Building reconfigurable systems

In an attempt to meet the demands for performance across a wide array of operations, many developers are pursuing reconfigurable processors as an option. These reconfigurable processors strive for RISC-like performance for a set of instructions that is broader than is offered by RISC processors. The developers of reconfigurable processors hope to achieve this performance by utilizing a RISC processor that can be altered to run with multiple instruction sets or operations. This approach faces several challenges, including the transfer of processor data between instruction set "shifts" and the contents of the processor boot ROM, which expects to see a specific processor architecture. Changing the processor and/or the boot ROM in the field invites the kind of instability from which the system may not be able to recover.

The functionality of a reconfigurable processor can be achieved using a reconfigurable system that includes both the processor and its peripherals.

Reconfigurable systems can be built with programmable logic, enabled by embedded processors (both soft and hard cores). By changing the PLD configuration to include a peripheral set that is optimized to perform the operations of interest, the system can provide hardware-accelerated speeds for those operations without the associated risks of changing the processor itself.

The presence of an embedded processor in a PLD enables the creation of a "user-browsable," configurable system. This system can be examined and reconfigured by a remote operator via the Internet (or other communications medium) using something as simple as a Web browser. The advantages of this approach include easy in-field modifications, real-time hardware bug analyses and fixes, product differentiation through software-like upgrades, and extension of product lifetime via feature enhancement and refinement.

Using an embedded processor and programmable logic, a system can be built that allows remote user examination and reconfiguration via common communications media such as the Internet.

The reconfiguration portion of the system consists of a processor-capable PLD, a configuration controller (another PLD or processor) and external memory. The embedded processor PLD runs a real-time operating system (RTOS) with a TCP/IP stack, and the PLD is connected to a network via a media access controller (for example, an Ethernet MAC). The external memory houses the RTOS, software for the processor, and multiple programming files for the embedded processor PLD corresponding to different PLD configurations (and different peripheral configurations). The software generates a small HTML file that describes the current state of the peripherals and the external memory. The user may browse this file at any time using a normal web browser and choose a different configuration. If the programming file corresponding to the new configuration isn't resident in the system's memory, it can be sent to the system via the communication medium and stored in the external memory.

After a different configuration is chosen, the software indicates to the configuration controller where to find the correct programming file in the external memory.

The configuration controller then initiates reconfiguration of the embedded processor PLD using the new configuration file. If the reconfiguration doesn't succeed, the configuration controller reconfigures the processor-enabled PLD into a default state and (if desired) sets a reconfiguration error flag. Additional benefits of this system include the capability for the processor to reconfigure the configuration controller (if it is a PLD), or reprogram the content of the external memory with input from the remote operator.
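The controller's decision logic can be sketched in a few lines of Python. This is an illustration of the flow described above, not vendor code: the function names, the dictionary standing in for external memory, and the `load_config` hardware hook are all hypothetical. One simplifying assumption is that a programming file missing from memory is treated the same as a failed load, whereas the system described here could first fetch the file over the network.

```python
def reconfigure_pld(requested, memory, load_config, default="default"):
    """Attempt a reconfiguration, falling back to a default state on failure.

    requested   -- name of the configuration chosen by the remote operator
    memory      -- dict of configuration name -> programming file (bytes),
                   standing in for the external memory
    load_config -- callable taking the file and returning True on success
                   (a hypothetical hook for the real configuration hardware)
    default     -- name of the known-good fallback configuration

    Returns (active_configuration, error_flag).
    """
    image = memory.get(requested)
    if image is not None and load_config(image):
        return requested, False

    # Requested configuration was missing or failed: restore the default
    # state and raise the reconfiguration error flag.
    if not load_config(memory[default]):
        raise RuntimeError("default configuration failed to load")
    return default, True
```

Keeping the default programming file permanently resident is what makes remote experimentation safe: a bad upload costs one failed attempt, not a field visit.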

Nearly every element of the platform is user-configurable, providing a new level of flexibility in hardware-accelerated systems.

Martin Won is a senior member of the technical staff at Altera Corp. (San Jose, CA).

Sidebar 2: The Historical Perspective

by Jim Turley

There are an awful lot of microprocessors available, and most of them have been around for a long time. Motorola's venerable 68K family, the Paleolithic x86 architecture, and many 8- and 16-bit CPUs have a large and devoted following. Yet what all these processors have in common is a shared ancestry: they were all developed for personal computers or workstations that no longer exist. Almost without exception, today's processors and processor cores were developed 10-15 years ago for very different conditions. They are used in embedded systems by default, not by design.

Today's microprocessors are surprisingly difficult to integrate into an ASIC or SOC, for the obvious reason that they were never designed for it. Most processor cores are little more than packaged chips without the package. Fixed bus interfaces designed for PCs, fixed instruction sets designed for running Unix, and fixed silicon layouts designed for mass production in a single fab all conspire to trip up even the best design teams. The legacy of 1970s-era thinking and architecture is still with us, and we're only just now starting to see processors that were actually designed recently, with ASIC and SOC integration in mind.

The first wave of processor "cores" was sold (licensed, technically) as fixed, hard silicon layouts, and most MIPS, ARM, SPARC, and PowerPC cores are still used this way. Far from buying time to market, such hard cores are brutally difficult to use. For example, SOC designers must design a rectangular hole into their chip that exactly matches the footprint of the core. Pin-outs, bus protocols, electrical characteristics, timing: all these parameters must be adhered to, because the hard core can't and will not change to suit the designer. Rather, the designer, and the design, must change to suit the core. If a processor core is (on average) only 10 percent of the overall area of the chip, why should it cause 90 percent of the problems?

The second wave of processor cores was synthesizable: the so-called "soft" cores. Soft cores solve the rectangular-hole problem, because the physical outline of the core can change, but they do nothing to alleviate the other headaches. Bus timing, protocols, instructions, and interfaces are still fixed. Once again, the SOC designer is called upon to alter his vision of SOC nirvana to suit the demands of a 10-year-old CPU architecture someone designed for a totally unrelated system.

The third wave, just now washing ashore, is that of configurable processor cores. "Configurable" in this sense is completely different from merely synthesizable. Configurable cores hand the SOC designer far more freedom to make intelligent choices about the characteristics of the core. Synthesizable cores merely change shape; configurable cores change feature sets, interfaces, instructions, and protocols.

Weighing the advantages

Configurable processor cores have two major advantages, and several minor ones. First, they solve the old problems of fixed bus interfaces and timings.

Rather than stick to a static data book specification, configurable cores are adjustable to suit the timing and needs of the rest of the chip, not the other way around. Second, configurable processors allow SOC designers to add new features, differentiating their product at a fundamental level. Such enhancements can take the form of specialized instructions, unique encryption features, hardware accelerations, broader data paths, or just about anything else the SOC designers could wish for. In a very real sense, a configurable core is a collaboration between the original processor designer (the IP company) and the processor user (the SOC designer). The final core embodies the skills and experience of both teams.

Some of the minor advantages: freedom to fabricate the chip at any foundry in the world (virtually all hard and soft cores, in contrast, must be built by a short list of licensed foundries); control over the evolution of the processor (as opposed to traditional CPU companies, which dictate product revisions); the ability to remove unwanted or unnecessary features as well as to add new ones; and the ability to maintain the secrecy of, and even patent, custom enhancements. Configurable cores are the first to give the customer credit for knowing his own application.

Paradoxically, core configurability shortens design time, not lengthens it. The point of configurable processors isn't to tinker wantonly with the core for the sake of it, but to alter, adjust, and enhance the core in useful and productive ways. Changing a bus interface to suit the rest of the chip, or changing instructions to suit the software, is vastly more productive than struggling with fixed interfaces and programmers' models. Glue logic adds no value. At last, engineers can have the core they want, not the one they were given.

Also, to promote productivity rather than sink it, adherence to existing standards is key. Standard HDLs (Verilog and/or VHDL), synthesis tools (Synopsys, et al.), and verification tools (ModelTech, LeapFrog) mean that SOC design teams use their familiar tools rather than learning proprietary new ones.

Anything else squanders the time-to-market advantage on a steep learning curve. The entire processor, not just the customer's enhancements, should be available in source code, ensuring total visibility and testability of all facets of the design. Late nights of debugging will be less harrowing knowing that bugs aren't lurking inside some black box that can't be seen, tested, or modified. Like anything powerful, source code carries a potential for misuse if not handled properly. Dumping the RTL source code on a neophyte SOC design team is a recipe for disaster. It's better to provide a guided, front-end tool that offers point-and-click options for adding (or removing) features, registers, debug options, cache modifications, and peripherals. This front-end tool then generates the HDL (in Verilog or VHDL), guaranteeing a known-good core. It goes without saying that the software tools (C compiler, assembler, debugger, profiler, linker) should be generated at the same time, or the processor wouldn't be programmable. Here again, professional software development tools, as opposed to in-house shareware hacks, aid productivity rather than compromise it.
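The essence of such a front-end tool is that it only emits configurations drawn from a validated menu. The Python sketch below shows the idea in miniature by generating Verilog `localparam` lines from a set of options; the option names, the allowed values, and the function name are invented for illustration and do not describe any vendor's actual tool.

```python
# Hypothetical menu of options the core is known to support.
ALLOWED = {
    "NUM_REGS": {16, 32},
    "CACHE_KB": {0, 4, 8, 16},
    "HAS_MAC": {0, 1},
}


def generate_params(options):
    """Return Verilog localparam lines for a validated set of core options."""
    unknown = set(options) - set(ALLOWED)
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")

    lines = ["// generated core configuration (illustrative)"]
    for name in sorted(ALLOWED):
        if name not in options:
            raise ValueError(f"missing option: {name}")
        value = options[name]
        if value not in ALLOWED[name]:
            raise ValueError(f"unsupported value for {name}: {value}")
        lines.append(f"localparam {name} = {value};")
    return "\n".join(lines)
```

Because every value is checked against the menu before any HDL is written, the designer cannot request a combination that was never verified. A real tool would drive the compiler and assembler generation from the same option set.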

In the end, configurable cores give the SOC design team credit for knowing its own business. Rather than the arrogant, hands-off approach of virtually all processor and DSP vendors today, configurable core vendors encourage SOC design teams to make their own decisions and to do what they do best: design products.

Jim Turley is currently vice president of marketing for ARC Cores (Elstree, England).

Sidebar 3: Making the Right Choice

by Steve Evans

Selecting a microprocessor by application makes sense, but the trend for designing today's applications is broader in scope; SOC and design reuse mean that a processor is no longer just for a single function or a single platform. Increasingly, the better choice is to select a more general-purpose microprocessor that provides a standard feature set within a stable, backward- and forward-compatible architecture that has been proven in millions of shipped products. And the decision process shouldn't focus on just the processor; the availability of tools, reusable software that provides additional functionality, development platforms for prototyping, ease of SOC integration, and the ability to insert and connect IP from a variety of sources should all be factored into the decision.

ARM microprocessor cores, based on an evolving Instruction Set Architecture (ISA), have a good ratio of power to cost to performance to meet the needs of the entire embedded developer community. In addition to engineering reusable microprocessor cores, ARM has created views that enable the use of industry-standard EDA tools, a standard bus infrastructure for on-chip interconnect, and reusable peripherals for hardware development. For software development, compilers, debuggers, and real-time trace capability are available, as are reusable application modules and optimized real-time operating systems from most commercial RTOS vendors.

The evolution of the ARM ISA enables core variants that meet specific application needs; for example, the ARM Thumb 16-bit op-code ISA was added to meet the need for tighter code density without breaking compatibility with the original ARM ISA. When the requirement for improvements to DSP capability for digital servo and audio codecs was identified, the ARM9E core was developed, which included some specific extensions for improved signal processing capability, again while maintaining compatibility. This compatibility enables developers to reuse their software and third-party software vendors to swiftly support evolutions of the ARM architecture.

To extend the capability of the microprocessor, ARM also provides a co-processor interface that can be used to specify instructions to the system. As the co-processor interface works in parallel with the ARM core itself, the ISA compatibility isn't broken, ensuring that software can still be reused effectively. This also allows a standard processor core validation suite to be used without modification (an important point given how difficult it is to validate a microprocessor core that may be used with any combination of user software in the end system).

Semantics and systems

By definition, embedded and system-on-a-chip designs must involve systems engineers from the specification through to debugging the final device. This long-term commitment brings with it a requirement for higher levels of abstraction in simulation, such that sufficient lines of code can be executed during the specification phase to give confidence in the decisions taken at the architectural level before the cycle of HDL coding and simulation begins. Through its long history of working with EDA design flows, ARM has had a role in providing models at different levels of abstraction that successfully support the requirements of different development phases. This means providing high performance instruction set simulators with interfaces to co-design and co-verification environments, models for use with hardware accelerators, and cycle accurate back-annotable models for gate-level sign-off.

SOC development is also more complex due to the need to connect modules that may have different origins. For example, a SOC design for a digital audio player may have a microprocessor core and peripherals from ARM, peripherals from a third party (for example, a USB block), and modules developed by the customer. Without a standard on-chip bus, the interconnect task would be time-consuming and error-prone, as well as difficult to perform in a way that promotes reuse. ARM solves this problem for its partners and customers by providing, on a royalty-free basis, AMBA. The AMBA bus was designed specifically to support debug of embedded processors, and can support multi-processor designs in several configurations. To meet the needs of high bandwidth systems, the AMBA bus can be implemented with data bus widths of up to 128 bits.

Integrated microprocessor cores are platforms for the execution of software. Developing software for an embedded system is already a complex and resource consuming task, and software development consumes a rapidly increasing share of embedded system development budgets. So the correct choice of core can make a software team more productive and is going to have a significant effect on the overall project schedule and budget.

The microprocessor core must be supported by robust development tools, as well as software modules that can be used to quickly implement well-defined functions in a SOC design. When considering code development tools, it's important to think about the complete development flow. Code must be specified, written, turned into machine code (compiled/assembled, linked, located), and then debugged. There is also often a need at the beginning of the development to benchmark critical code to get the best idea of the potential system performance. Meeting these requirements in a deeply embedded microprocessor is non-trivial, and if (as is so often the case) the design is used in a system with strong real-time elements the level of difficulty increases significantly.

ARM has had success in this area, including code generation tools. Developers who have struggled with proprietary architectures, and as a result with limited compiler support, know that the quality of the compiler can make or break a project schedule.

The final form factor

At the debug level, ARM has invested in the development of the Realview family of real-time debug tools that provides code execution and data access visibility for deeply embedded SOC, supporting execution speeds in excess of 200 MHz. This enables the real time developer to debug a system, even in the final form factor. Realview consists of modular hardware and software so that it can be integrated as part of a complex SOC and into third-party tool chains via open interfaces. The developer can take advantage of the multiple vendors supporting the ARM architecture, while still having access to leading edge real-time debug capability.

The choice of a microprocessor core for integration into an embedded or SOC design is a complex issue.

It involves not only the strengths of the core itself, but also the total environment and support structure that will enable the development team to integrate it into the final design within the required timescale. Through internal development and a global technology network, ARM has invested in this total environment to provide a solution that meets the emerging trends in system design, no matter the fab, design flow, operating system, choice of tools, use of IP, platform, process geometry, or end application.

Steve Evans is currently with ARM (Cambridge, England).

Sidebar 4: The Debate Continues

by Robert Ober

There is little doubt about the complexity and trade-offs involved in selecting CPU architecture for deeply embedded system designs. My contribution will look at just six of the many issues involved in selecting an embedded processor. These areas are the requirement for a systems level view of design; the role of multiprocessing; balancing signal processing and controller functionality; fixed vs. configurable processors; the platform approach to system design; and the role of business relationships.

A common element for all successful architecture and design teams is the prevalence of a system-level perspective. But system-level concerns and a focus on the practical requirements of high integration are usually not a priority in a "pure" CPU design exercise. Many designs simply end at the address interface, presenting system developers with challenges they should not have to face. To overcome this, the core must have at least an abstract definition of how it interacts with the outside world, including bus architectures, peripherals, interrupts, and memory, and the relationships between each of these elements.

Looking at only the electrical and physical specifications of a processor, such as clock speeds, pin-out and bus structures, isn't sufficient. Performance, rather than frequency, is the more important measure, since different 32-bit architectures can show 2:1 ratios or greater in measured performance. A rich, well-understood software development environment is critical, as is the ability to support the desired applications in a small memory footprint (again greater than 2:1 ratios are common). Keeping in mind the system-level concerns that are essential to successful design, it's interesting to look at these development environment and memory issues in the context of multiprocessing implementations.
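The point that performance, not frequency, is the measure that matters can be illustrated with the standard relation: execution time = instruction count x cycles per instruction / clock frequency. The sketch below is only an illustration; the two "architectures" and all of their figures are hypothetical and are not measurements of any real core.

```python
def exec_time_s(instructions: float, cpi: float, clock_hz: float) -> float:
    """Execution time = instruction count * cycles per instruction / clock rate."""
    return instructions * cpi / clock_hz


# Two hypothetical 32-bit cores running the same task at the same 100 MHz clock.
# Core B needs more instructions (lower code density) and more cycles per
# instruction, so it takes twice as long despite the identical frequency.
CLOCK_HZ = 100e6
time_a = exec_time_s(1_000_000, 1.0, CLOCK_HZ)
time_b = exec_time_s(1_600_000, 1.25, CLOCK_HZ)
ratio = time_b / time_a
```

With these assumed numbers the ratio is 2:1 at equal clock speed, which is the kind of gap a frequency-only comparison would hide.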

Emerging solution

Since a "faster is better" approach isn't always suited to embedded designs, single-chip multiprocessing systems are emerging as a preferred route to achieving high performance. A solution with two or more identical processors on a single die can operate at relatively low frequencies, which helps to mitigate EMI and power consumption concerns. Lower operating frequencies also simplify matching of memory and processor clock speeds for better intrinsic performance. However, few core processors currently available can be "tiled" into a multiprocessing system. The architecture must be defined from the start with memory mapping and standard bus interfaces, as well as strategies to support system resets and power management across multiple cores. Development tools that recognize the memory map and interrupt structure, and can support migration of system code to a multiprocessor configuration, are also essential.
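One way to reason about the power side of this argument is the common first-order model for CMOS switching power, P = activity x C x V^2 x f. The sketch below uses made-up capacitance, voltage, and frequency values (none come from the article) and assumes, as a simplification, that two cores at half the clock deliver the same throughput as one core at the full clock.

```python
def dynamic_power_w(c_eff_farads: float, vdd_volts: float,
                    freq_hz: float, activity: float = 1.0) -> float:
    """First-order CMOS switching power: P = activity * C * V^2 * f."""
    return activity * c_eff_farads * vdd_volts ** 2 * freq_hz


# Hypothetical figures, chosen only to keep the arithmetic easy to follow.
C_EFF = 1e-9  # effective switched capacitance per core, in farads

one_fast = dynamic_power_w(C_EFF, 2.5, 200e6)             # one core, 200 MHz
two_slow_same_v = 2 * dynamic_power_w(C_EFF, 2.5, 100e6)  # two cores, 100 MHz
two_slow_low_v = 2 * dynamic_power_w(C_EFF, 1.8, 100e6)   # same, lower supply
```

Under this model, splitting the work across two slower cores at the same supply voltage leaves switching power unchanged; the saving appears when the lower clock also allows a lower supply voltage, because power scales with the square of the voltage.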

The TriCore Unified Processor architecture developed by Infineon was conceived as the type of single, scalable CPU needed to support execution of tasks across one or more cores. Additionally, the inclusion of DSP functions in the instruction set architecture is a central, not tangential, element in the core architecture. The instruction set incorporates multiply-accumulate (MAC), loop, addressing-mode, and index instructions, as well as the Q-format and saturated data types once found only in a dedicated DSP processor.
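To make the terms concrete, the following is a minimal software model of a multiply-accumulate on Q15 fixed-point data with saturation. It is a generic illustration of the data types mentioned above, not a description of the TriCore instruction set; the function names and the choice of a 32-bit accumulator are assumptions made for the example.

```python
Q15_MIN, Q15_MAX = -32768, 32767          # range of a signed 16-bit Q15 value
ACC_MIN, ACC_MAX = -2**31, 2**31 - 1      # assumed 32-bit accumulator


def to_q15(x: float) -> int:
    """Convert a value in [-1, 1) to Q15, saturating at the range limits."""
    return max(Q15_MIN, min(Q15_MAX, int(round(x * 32768))))


def mac_q15(acc: int, a: int, b: int) -> int:
    """Multiply two Q15 values and add the Q30 product to the accumulator.

    The result saturates (clamps) instead of wrapping around on overflow.
    """
    return max(ACC_MIN, min(ACC_MAX, acc + a * b))


def dot_q15(xs, ys) -> int:
    """Dot product of two Q15 vectors, returned as a saturated Q15 value."""
    acc = 0
    for a, b in zip(xs, ys):
        acc = mac_q15(acc, a, b)
    # Convert Q30 back to Q15 with rounding, then saturate to 16 bits.
    return max(Q15_MIN, min(Q15_MAX, (acc + (1 << 14)) >> 15))
```

For example, `dot_q15([to_q15(0.5)] * 2, [to_q15(0.5)] * 2)` yields the Q15 encoding of 0.5, while a sum that would exceed the representable range clamps to the maximum value rather than wrapping to a negative number. A DSP-oriented instruction set performs each loop iteration of this kind in hardware.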

Benchmarks published by the leading independent DSP test lab verify that performance on common DSP functions is comparable to currently available "pure" DSP cores, but with all the facilities of a RISC core.

As a result, applications that may once have required separate control and DSP cores are now being implemented using a single tool set and common debug environment. In complex applications, such as a third generation wireless phone, multiprocessing configurations present the ability to scale the performance of the system. Additionally, high DSP performance can be achieved without paying the system memory penalty (three physically large memories, usually dual port) typically associated with a dedicated DSP core.

Yet, system designers will always be able to make a case for including specialized instructions to implement key features or improvements to achieve differentiation. This leads to the issue of fixed versus configurable cores. A major issue when evaluating configurable CPUs is that they typically place the system designer in the role of processor architect. Working with a soft instantiation of the core architecture provides the design team with a good deal of flexibility, but also makes system designers responsible for the area, speed, routing, and a host of other CPU design issues. A basic question here is whether this is really a job that the system team should take on. Processor architecture is, in my view, inherently fun, but optimizing the CPU may not be the task where a company should invest finite resources.

Change for the better

The improvement sought by the design team is often achieved with a relatively small addition to the instruction set. A recent design of a single-chip cellular phone controller illustrates a balanced approach to the issue of fixed versus configurable cores. The implementation of an efficient Viterbi algorithm required a small set of specialized instructions. The fixed core used in the design included a standardized co-processor interface, and it was a relatively small task to add and verify the instructions, and then optimize them within the system design.
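The article does not say which instructions were added. As background, the inner loop of a Viterbi decoder is the add-compare-select (ACS) step, which is a frequent target for this kind of acceleration. The sketch below models one trellis step in software; the data layout (`predecessors`, `branch_metrics`) is a hypothetical one chosen for readability.

```python
def acs_step(path_metrics, branch_metrics, predecessors):
    """Run one add-compare-select step over every trellis state.

    path_metrics[s]   accumulated cost of the best path into state s so far
    predecessors[s]   pair of states that can transition into state s
    branch_metrics[s] pair of costs for those two transitions
    Returns the updated metrics and, per state, which predecessor survived.
    """
    new_metrics, decisions = [], []
    for s, (p0, p1) in enumerate(predecessors):
        cost0 = path_metrics[p0] + branch_metrics[s][0]   # add
        cost1 = path_metrics[p1] + branch_metrics[s][1]   # add
        if cost0 <= cost1:                                # compare
            new_metrics.append(cost0)                     # select
            decisions.append(0)
        else:
            new_metrics.append(cost1)
            decisions.append(1)
    return new_metrics, decisions
```

Because this step runs for every state on every received symbol, collapsing the two additions, the comparison, and the selection into one or two instructions can pay off quickly, which is consistent with the small set of specialized instructions described above.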

The ability to quickly configure a hard core by adding specialized instructions allows the provider of processor cores to begin building an efficient platform strategy. The idea of providing a scaffold on which to build systems is attractive. However, the cautions raised in the introduction to this section are important. A standard platform will almost certainly include a mix of functions that isn't precisely matched to customer requirements. The unused functions represent power, area, and associated cost penalties in the system design.

Conversely, the ability to produce a relatively standard development chip that is seeded to software development teams can accelerate system development.

Developers should look to suppliers of embedded processor cores to provide full-function controllers, and the associated development tools, to enable hardware/software co-design. The evaluation platform enables the cost benefit trade-off of design decisions, to determine which functions are best integrated into the final system chip. To speed final integration, the availability of peripherals as "standard" IP blocks should be explored (for example, USB or Ethernet MAC). Combined with the ability to add specialized instructions to the core, this may be the ideal way to balance the trade-off between a fixed platform and a custom design.

Finally, there are questions of any particular vendor's ability to meet the development and production requirements for a given embedded design program. Just as no single architecture is optimal for all applications, the relationship between supplier and customer is a variable that must be considered for each design.

Recognizing that there is no single right answer to CPU selection questions, Infineon Technologies supports a portfolio of embedded cores, and maintains a business organization tasked with managing and supporting system designs based on this portfolio.

The technical and business development teams that specialize in each of the different CPU architectures at times both compete and cooperate in the development of optimal solutions for the company's customers.

The common characteristic of the teams within this Cores and Modules organization is the system-level perspective on design, which is a legacy of the company's evolution as a supplier of system-level solutions.

Robert Ober is director of architecture for Infineon Technologies Corp. (Munich, Germany).

Sidebar 5: Moving Past Embedded Processor Cores

by Charlie Cheng

For many ASIC designs and microprocessor components, advancement of semiconductor process technology according to Moore's law (which states that capacity doubles every 18 months) and advances in packaging are enough to allow a significant performance increase every 18 months as well.

Embedded RISC processors, on the other hand, have undergone several changes in CPU architecture to both keep up with performance requirements and to match the capabilities of the currently available silicon technology. Perhaps it's time to once again revisit the embedded processor architecture, and in particular the processor pipeline.

Incorporating a reasonably powerful 32-bit RISC processor into an ASIC design became feasible with 0.8-micron silicon processes, which could fit 200K gates onto a 100 mm² die. For high volume applications, placing the processor on-chip provides a significant cost reduction. In the early days of sub-micron process technology, however, the state-of-the-art embedded RISC processor chips were simply too large and too complicated to fit on chip and still leave room for application-specific logic. Advanced RISC Machines (ARM) created the ARM7 to address this issue. The ARM7 had a proprietary instruction set and a simplified three-stage pipeline, which translated to lower power consumption, smaller die size and a perfect match for 0.8-micron to 0.6-micron process technologies. Other processor companies followed with similar three-stage pipeline RISC processors.

Shrinking geometries

In the mid-1990s, process technology advanced to 0.5-micron and 0.35-micron, and die size became less of an issue for embedded RISC cores. To increase performance, RISC processor core designers reverted to the well known, proven five-stage pipeline architecture. ARM, Lexra, and IBM provide some of the better-known examples of RISC cores with five-stage pipeline designs.

This brings us to 1999, and 0.25-micron process technology. RISC cores with five-stage pipelines typically run at 150 MHz to 250 MHz, whereas older three-stage cores run at about 100 MHz maximum. Measuring about 3 mm² to 6 mm², these five-stage RISC cores are often the smallest part of the chip, overshadowed by their companion memory blocks.

What will the new millennium bring?

To begin the exploration of this question, one must examine the five-stage pipeline developed by the founders of RISC processor architecture. Figure 1 shows the pipeline stages and their order: First (I-M) is instruction fetch from memory, second (SF) is instruction decode and source operand read from register file, third (EX) is execute, fourth (D-M) is data fetch from memory, and fifth (WB) is result write back to register file.

The five-stage pipeline provides two cycles for instruction preparation, one cycle to execute the instruction, and two cycles to process the data. The performance of the SF, EX, and WB stages depends on the speed of the logic within the CPU, while the performance of the I-M and D-M stages depends on the performance of the memory. While most of the time in the EX stage is used to process logic, only a fraction of the time in the memory stages (I-M and D-M) is used to access memory, with the rest of the time spent formatting data which may have been compressed or may require sign extension (see Figure 2).
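The stage order described above can be visualized with a small occupancy table. This is a sketch of an idealized pipeline with no stalls, hazards, or branches (an assumption made for simplicity); the stage names are the ones used in the text, and the function name is hypothetical.

```python
STAGES = ["I-M", "SF", "EX", "D-M", "WB"]


def pipeline_schedule(n_instructions: int):
    """Return rows[cycle][stage] for an ideal, stall-free pipeline.

    Each entry is the index of the instruction occupying that stage in that
    cycle, or None if the stage is empty (pipeline filling or draining).
    """
    n_cycles = n_instructions + len(STAGES) - 1
    rows = []
    for cycle in range(n_cycles):
        row = []
        for stage_index in range(len(STAGES)):
            instr = cycle - stage_index   # instruction that entered I-M this many cycles ago
            row.append(instr if 0 <= instr < n_instructions else None)
        rows.append(row)
    return rows


def format_schedule(rows) -> str:
    """Render the schedule as plain text, one line per clock cycle."""
    header = "cycle  " + "  ".join(f"{name:>4}" for name in STAGES)
    lines = [header]
    for cycle, row in enumerate(rows):
        cells = "  ".join(f"{'-' if i is None else 'i' + str(i):>4}" for i in row)
        lines.append(f"{cycle:>5}  {cells}")
    return "\n".join(lines)
```

Calling `print(format_schedule(pipeline_schedule(3)))` shows three instructions taking seven cycles in total: once the pipeline is full, one instruction completes per cycle, and every cycle is as long as the slowest of the five stages.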

The five-stage pipeline design will not survive the next century for two reasons. First, memory design is only partially digital, which means that memory does not scale to the same performance improvement as the logic-dominated EX stage. Second, as application software becomes more sophisticated to take advantage of increasing processor performance, programs become larger and more memory is required. The need for larger memory blocks, which do not scale in performance the same as pure transistors, will create a disparity between theoretical and achievable performance.

The following chart is a performance estimate for a traditional five-stage RISC core (see Figure 3). The blue line shows how performance of the EX stage would increase with advances in process technology, and the red line shows how performance of the I-M and D-M stages would increase as process technology advances and memory size increases.

A five-stage RISC processor core at 0.35 micron is well matched to a configuration with 16 KB of instruction memory and 16 KB of data memory. However, the difference in the rates of performance increase will eventually create a 50 MHz disparity between theoretical and achievable performance.
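The disparity can be expressed as simple arithmetic: in a single-clock pipeline the cycle time is set by the slowest stage, so the clock the logic stages could sustain (theoretical) differs from the clock the memory stages allow (achievable). The stage delays below are illustrative assumptions, not values read from Figure 3.

```python
def clock_mhz(stage_delay_ns: float) -> float:
    """Maximum clock rate, in MHz, for a stage with the given delay."""
    return 1000.0 / stage_delay_ns


def achievable_mhz(logic_ns: float, memory_ns: float) -> float:
    """Clock rate when the cycle time is set by the slower kind of stage."""
    return clock_mhz(max(logic_ns, memory_ns))


def disparity_mhz(logic_ns: float, memory_ns: float) -> float:
    """Gap between the logic-limited clock and the clock actually achievable."""
    return clock_mhz(logic_ns) - achievable_mhz(logic_ns, memory_ns)


# Hypothetical delays: logic stages at 4 ns, memory stages at 5 ns.
gap = disparity_mhz(4.0, 5.0)       # logic could run at 250 MHz, memory allows 200 MHz
balanced = disparity_mhz(5.0, 5.0)  # no gap when the stages are matched
```

With these assumed delays the gap is 50 MHz. If logic delay shrinks faster than memory delay from one process generation to the next, the gap widens even though both improve.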

Addressing the issues

Addressing memory bandwidth requirements while effectively using available transistors requires a new approach in RISC processor core design. A five-stage, 32-bit RISC core will be smaller than 1 mm² in the 0.18-micron generation of process technologies. The memory bandwidth issue could be solved by devoting more pipeline stages to memory access. But is this really a good use of the process technology advance? Should the next generation RISC processor core be super-scalar, should it have 64-bit buses, should it have more application specific instructions, or all of the above? What else do the emerging blockbuster applications really need? As one embedded processor analyst recently noted, "it may be time ... to get out the heavy pipe-fitting tools and get ready for some serious plumbing."

Charlie Cheng is founder of Lexra Corp. (San Jose, CA).

Sidebar 6: True configurability ends processor compromises

by Chris Rowen

The era of SOC design is inspiring a revolution in microprocessor design, and leading the insurrection are configurable processors. Configurable processors offer the promise of dramatic enhancements in application performance, power and die size relative to legacy fixed-function processors, and significant gains in flexibility compared to hardwired, datapath plus state machine, designs. At the same time, the ease of integration and intrinsic silicon portability of these cores helps reduce silicon costs and time to market compared to traditional design methods.

System designers have long sought this kind of application-specific tuning. But, until recently, they have faced a tough trade-off between flexible adaptation of the processor hardware and availability of world-class compilers, system simulation tools, and real-time operating systems. This unhappy compromise has so far confined configurable processors to small, deeply embedded micro-sequencer applications, where the software complexity was low and the absence of modern tools was tolerable. Tensilica's Xtensa processor generator has coupled hardware/software technology to open up large, complex applications to configurable processors.

This means a no-compromises solution for full configurability of every aspect of the processor hardware (instruction set, memories, peripherals, and interfaces), combined with seamless and automated configuration of the leading compilers, debuggers, RTOSs, and system verification environments. Working with embedded tools and software companies such as Wind River, Mentor Graphics, Accelerated Technology, and Synopsys, Tensilica has created the Tensilica Instruction Extension (TIE) Compiler, which translates a single abstract instruction set description into highly optimized, portable hardware descriptions for ASIC or FPGA implementation, plus complete software and tools: test benches, diagnostics, cycle-accurate simulators, debug environments, libraries, and compiler extensions. The processor generator even adapts the standard RTOS and board support packages to precisely fit the processor's application extensions.

Fundamentals

This fundamentally new approach to SOC processors has been in production use for more than 18 months, and has been widely adopted for use in key high-volume and high-performance applications such as network processors, digital television, wireless handsets, and digital cameras built by systems and semiconductor companies, such as Cisco, NEC, Fujitsu, Transwitch, Galileo, Zilog, and others. The successful proliferation of this technology by this breadth of design teams lets us reflect on key trends in the transition from fixed function to configurable processors:

- As silicon density and VLSI design automation improve, a growing fraction of embedded processor designs and volume is moving from hardwired to soft-core implementations. Correctly used, leading synthesis and layout tools can match the density and power characteristics of even the most extreme hand-tuned core designs, but with much lower development costs and time requirements. Moreover, their rapid migration to leading edge processes largely neutralizes the clock frequency advantage of custom design.

This increasingly relegates full-custom processors to high-end applications where power and cost can be sacrificed to achieve the highest possible performance on legacy applications.

- While more and more processor providers are moving to hardware description language (HDL) implementations, designing in HDL does not make a configurable processor. Processors require complete and matching software tools. Simply parameterizing the HDL does nothing to enable the key compilers, RTOSs and simulation environments. The current enthusiasm for configurability is inspiring a wave of superficially configurable processors, but it pays to look closely at the true availability of configurable software and tools support.

- Some suppliers have characterized the key benefit of configurable processors as the opportunity to cut cost by stripping out unused features normally found in fixed-function, "one-size-fits-all" processors. This is certainly true, but in the face of Moore's Law, the real opportunity for true configurable processors is the addition of powerful application-specific features that not only couldn't be justified, but perhaps couldn't even be imagined, by the generic processor designer. This new capability puts the application designer in the position to select and describe differentiating features required for dramatic acceleration of the tasks at hand. Sometimes this means turning on support for very wide memory, or specialized DSP and image processing, or special coprocessors for encryption, packet parsing or floating point. Sometimes it means building a single processor core that combines support for several distinct applications (for example, DSP plus protocol processing plus control). Sometimes it means building a set of closely coupled processors each configured optimally for a narrower task, yet sharing common compilers, MP debug, bus interfaces, simulators, and RTOS support.

- Automating hardware and software configuration is a key step forward, but the real breakthrough comes in making that configuration fast and easy to use.

The Xtensa processor generator can translate a sophisticated configuration description into complete hardware and software design in minutes, allowing many iterations through application extension, compilation and profiling per day. Our customers report that the development of new instructions is significantly more efficient than the development of new hardwired functions outside the processor.

- The emergence of new FPGA and synthesis technology is spurring rapid innovation in reconfigurable processors, but it's important to delineate the various meanings of "reconfigurable". To some it means combining a standard processor core with programmable logic for implementation of non-processor peripheral and interface functions. To some it means runtime selection among a limited vocabulary of computational elements. To some it means true instruction set configurability mapped onto field programmable logic. All these serve roles, but the key to success for real processor reconfigurability is once again the software tools: the ability to analyze applications, describe optimal processor configurations, and develop complete software systems to exploit the processor.

Through our partnership with Altera, Xtensa technology is starting to gain ground in field-configurable deployments as well. Configurable processors are changing the landscape of embedded systems design. They are creating a new design platform in which hardware and software teams can quickly collaborate to create low-cost, low-power, process-portable chip implementations that run complex software with exceptional efficiency. These small, optimized processors are emerging as the new fundamental building block of system design, just as the ASIC logic gate emerged fifteen years ago as the key simplifying abstraction for hardware design. And in a few years, we shall see more and more SOC systems built with a "sea of processors": a sea of configurable processors.

Chris Rowen is CEO of Tensilica, Inc. (Santa Clara, CA).



Copyright © 2000 Integrated System Design Magazine

All materials on this site Copyright © 2009 TechInsights, a Division of United Business Media LLC All rights reserved.