Next-generation wireless applications combining voice, video and data require a processor that can efficiently implement advanced third-generation (3G) algorithms. The Micro Signal Architecture (MSA), jointly developed by Analog Devices Inc. and Intel Corp., is based on a dual-MAC modified Harvard architecture core that has excellent performance on both voice and video algorithms. In addition, some of the best features of microcontrollers have been incorporated into the MSA core to allow it to replace both a DSP and a microcontroller in low-cost wireless handheld applications.
The MSA core combines the processing power of a dual-MAC (multiply-accumulate) DSP with the control capabilities of a RISC microcontroller. A single combined instruction set eases programmability. High performance and low power consumption make this architecture well suited to processing modem, audio, video, image and voice signals in power-constrained applications. In next-generation wireless applications, the DSP instructions process algorithms such as MPEG-2 and MPEG-4, while the control instructions efficiently handle user-interface functions, operating systems and system protocols such as TCP/IP.
The MSA core uses a modified Harvard architecture. Instructions and data reside in separate L1 memories but share a common L2 memory. All addresses are 32 bits wide, allowing the MSA core to address a unified 4-Gbyte address space. The accumulators are kept separate from the data registers, allowing an efficient load/store architecture to coexist with the accumulator-based design of a traditional DSP.
The Level 1 memory is divided into several blocks: instruction memory, dual-ported data memory and scratchpad SRAM. The instruction and data memories may each be configured either as cache or as SRAM. Using cache mode relieves the programmer of managing data movement in and out of the L1 memories. This allows code to be ported quickly, with no performance optimization required for memory organization.
The memory-management unit supports both protection and selective caching of memory, providing memory protection to individual tasks and protecting the system memory-mapped registers from unintended access. The memory is divided into regions in which the memory-management rules apply. Four different page sizes are supported: 1 kbyte, 4 kbytes, 1 Mbyte and 4 Mbytes. Each page has its own set of cacheability and protection properties. Memory protection in next-generation wireless devices allows new programs to be downloaded without corrupting applications that may be running or information that has been stored. One example is using a wireless connection to the Internet to download an upgrade to an e-mail application.
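The page-based protection described above can be illustrated with a minimal C sketch. It is not the MSA register layout: the `page_desc` structure, its field names and the lookup functions are hypothetical, and only the four page sizes come from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The four page sizes named in the text, in bytes. */
enum page_size {
    PAGE_1KB = 1 << 10,
    PAGE_4KB = 1 << 12,
    PAGE_1MB = 1 << 20,
    PAGE_4MB = 1 << 22
};

/* Hypothetical per-page properties; field names are illustrative only. */
struct page_desc {
    uint32_t base;       /* page start address, aligned to its size */
    uint32_t size;       /* one of enum page_size */
    bool     cacheable;  /* selective caching */
    bool     user_read;  /* user-mode read permitted */
    bool     user_write; /* user-mode write permitted */
};

/* Return the first descriptor whose page covers addr, or NULL if none does. */
const struct page_desc *page_lookup(const struct page_desc *table, size_t n,
                                    uint32_t addr)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t mask = table[i].size - 1u;
        assert((table[i].base & mask) == 0u); /* base must be size-aligned */
        if ((addr & ~mask) == table[i].base)
            return &table[i];
    }
    return NULL;
}

/* True if a user-mode access to addr is permitted; unmapped addresses fail. */
bool user_access_ok(const struct page_desc *table, size_t n,
                    uint32_t addr, bool is_write)
{
    const struct page_desc *p = page_lookup(table, n, addr);
    if (p == NULL)
        return false;
    return is_write ? p->user_write : p->user_read;
}
```

In this model a downloaded application would be given pages it may read and write, while pages covering memory-mapped system registers would have both user permissions cleared.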
The architecture has dynamic power-management capabilities that allow continuous adjustment of the processor's voltage and operating frequency, optimizing power consumption and performance for real-time applications. Software-selectable, low-power operating modes allow any combination of the core and its peripherals to be put into a sleep mode while the remaining resources stay in operation. The multilevel memory hierarchy provides additional power savings, with most memory references confined to the smallest on-chip memory subsystems.
See related chart
An event controller supports five basic types of events: emulation, reset, nonmaskable interrupts, exceptions and general-purpose interrupts. The interrupt system is nested and prioritized, and saves the processor state on the kernel stack following an interrupt. Each event has a register that holds the return address used by its return-from-event instruction, and each is assigned a priority so that all events can be organized and managed. All interrupts and exceptions are processed in supervisor mode.
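A small C model may help make the nested, prioritized scheme concrete. It is a sketch under stated assumptions: the text does not give the priority order, so treating the listed order as the priority order (emulation highest) is an assumption, and `core_model`, `next_event`, `enter_event` and `return_from_event` are hypothetical names. Saving the processor state on the kernel stack is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The five event types from the text. Using list order as priority order
   (0 = highest) is an assumption made for this sketch. */
enum event {
    EVT_EMULATION, EVT_RESET, EVT_NMI, EVT_EXCEPTION, EVT_INTERRUPT, EVT_COUNT
};
#define EVT_NONE (-1)

/* Hypothetical core state: one return-address register per event type. */
struct core_model {
    uint32_t pc;
    uint32_t ret_addr[EVT_COUNT];
    int      current; /* event being serviced, or EVT_NONE */
};

/* Return the highest-priority pending event that may preempt the one in
   service, or EVT_NONE. Bit e of pending is set when event e is pending. */
int next_event(uint32_t pending, int current)
{
    int limit = (current == EVT_NONE) ? EVT_COUNT : current;
    for (int e = 0; e < limit; e++) {
        if (pending & (1u << e))
            return e;
    }
    return EVT_NONE;
}

/* Enter a handler: save pc in the event's return register, then jump. */
void enter_event(struct core_model *c, int e, uint32_t handler)
{
    assert(c != NULL && e >= 0 && e < EVT_COUNT);
    c->ret_addr[e] = c->pc;
    c->pc = handler;
    c->current = e;
}

/* Return from the event in service and resume at its saved address.
   resumed names the event that was preempted, or EVT_NONE. */
void return_from_event(struct core_model *c, int resumed)
{
    assert(c != NULL && c->current != EVT_NONE);
    c->pc = c->ret_addr[c->current];
    c->current = resumed;
}
```

Because each event type has its own return register, a higher-priority event can preempt a lower-priority handler without overwriting that handler's return address.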
The debug interface is an IEEE-1149.1 JTAG access port. Debug features support software exceptions, hardware breakpoints, performance monitoring and execution trace. The watchpoint unit consists of six instruction and two data address watchpoints. The instruction address watchpoints may be combined in pairs to create instruction address range watchpoints. The data address watchpoints may be combined to create a data address range watchpoint. Each of the watchpoints is associated with a 16-bit event counter. The execution trace buffer consists of a 16-element FIFO that records discontinuities in program flow. It contains a compression feature that can be used to compress one or two levels of software loops.
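The behavior of the execution trace buffer can be approximated in software. The sketch below models a 16-entry FIFO of program-flow discontinuities with one level of loop compression, assumed here to mean that an exact repeat of the newest entry is not stored again. The structure, the function name and that compression rule are assumptions for illustration, not the hardware definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TRACE_DEPTH 16u /* FIFO depth given in the text */

/* One change of program flow: where it left and where it landed. */
struct trace_entry {
    uint32_t src;
    uint32_t dst;
};

struct trace_buf {
    struct trace_entry e[TRACE_DEPTH];
    unsigned head;  /* index of the next slot to write */
    unsigned count; /* number of valid entries, at most TRACE_DEPTH */
};

/* Record a discontinuity. With compress set, an exact repeat of the newest
   entry (such as the back-edge of a tight loop) is dropped. Returns true
   if the entry was stored. Once full, the oldest entry is overwritten. */
bool trace_record(struct trace_buf *t, uint32_t src, uint32_t dst,
                  bool compress)
{
    assert(t != NULL);
    if (compress && t->count > 0u) {
        unsigned newest = (t->head + TRACE_DEPTH - 1u) % TRACE_DEPTH;
        if (t->e[newest].src == src && t->e[newest].dst == dst)
            return false;
    }
    t->e[t->head].src = src;
    t->e[t->head].dst = dst;
    t->head = (t->head + 1u) % TRACE_DEPTH;
    if (t->count < TRACE_DEPTH)
        t->count++;
    return true;
}
```

Without compression, a loop that iterates more than 16 times would fill the buffer with copies of the same branch and push out the earlier history; with it, the loop occupies a single entry.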
The performance monitor unit consists of two 32-bit counters that count the number of cycles or occurrences of an event. A 64-bit free-running cycle counter can be used for code-profiling purposes.
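As a sketch of how such counters are typically used for profiling, the helpers below compute elapsed counts from two samples and convert cycles to time. How the counters are actually read on the MSA core is not covered here, so the functions take already-sampled values; all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Difference between two samples of a 32-bit counter. Unsigned arithmetic
   gives the correct result even if the counter wrapped once in between. */
uint32_t counter32_delta(uint32_t start, uint32_t end)
{
    return (uint32_t)(end - start);
}

/* Elapsed cycles between two samples of the 64-bit cycle counter. */
uint64_t cycles_elapsed(uint64_t start, uint64_t end)
{
    return end - start;
}

/* Convert a cycle count to nanoseconds for a core clock given in MHz.
   Suitable for profiling-sized counts; cycles * 1000 must fit in 64 bits. */
uint64_t cycles_to_ns(uint64_t cycles, uint32_t clock_mhz)
{
    assert(clock_mhz > 0u);
    return cycles * 1000u / clock_mhz;
}
```

For example, sampling the cycle counter before and after a routine and passing the difference to `cycles_to_ns` with a 333-MHz clock gives the routine's execution time.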
The first implementation of the MSA core uses an eight-stage pipeline. The pipeline is fully interlocked, and the interlock logic inserts only the minimum number of stalls needed to maintain program correctness. The arrangement of pipeline stages allows the results of load instructions to be forwarded to the execution units without stalls. Data accumulation using the data ALUs and pointer updates using the address ALUs are both single-cycle operations, which keeps cycle counts low on critical inner loops. Alignment-independent byte operation is provided by control signals forwarded from the address ALUs to muxes in the data register file. The deep pipeline allows the MSA core to run at 333 MHz in a 0.18-micron CMOS process.
The MSA core uses two basic types of instructions: those used for DSP-type number-crunching operations, and those used for microcontroller-type control functions and general tasks. Instructions are tuned for their specific tasks but can be intermixed with no restrictions.
Microcontroller instructions perform basic control and arithmetic operations, including load/store, arithmetic, logical, bit-manipulation, branching and decision-making operations. Conditional register-move instructions allow efficient implementation of short if-then-else statements. Load/store instructions support the following addressing modes: autoincrement, autodecrement, indirect, circular, bit-reversed, indexed with immediate offset, post-modify with nonunity stride, and pre-decrement store on the stack pointer.
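Two of these addressing modes, circular and bit-reversed, are less familiar outside DSP work. The C sketch below shows the index arithmetic that each performs in hardware at no cycle cost. The function names are hypothetical, and the modulo form is more general than a hardware unit that wraps at most once per update.

```c
#include <assert.h>
#include <stdint.h>

/* Circular post-modify: advance idx by stride (which may be negative or
   nonunity) and wrap it back into the range [0, len). */
uint32_t circ_next(uint32_t idx, int32_t stride, uint32_t len)
{
    assert(len > 0u && idx < len);
    int64_t n = ((int64_t)idx + stride) % (int64_t)len;
    if (n < 0)
        n += (int64_t)len;
    return (uint32_t)n;
}

/* Bit-reversed index: reverse the low `bits` bits of i, as used to reorder
   the inputs or outputs of a radix-2 FFT of size 2^bits. */
uint32_t bit_reverse(uint32_t i, unsigned bits)
{
    assert(bits <= 32u);
    uint32_t r = 0u;
    for (unsigned b = 0u; b < bits; b++) {
        r = (r << 1) | (i & 1u);
        i >>= 1;
    }
    return r;
}
```

Circular addressing suits delay lines and FIR filter history buffers, where the newest sample overwrites the oldest. Bit-reversed addressing removes the separate reordering pass from an FFT.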
DSP instructions typically read two 32-bit operands from the data register file, compute a result, and either store it to the data register file or accumulate it in one of the two internal accumulators. Each MAC unit is capable of computing a 16 x 16-bit signed integer, unsigned integer or fractional multiplication. Each ALU is capable of a 32-bit operation on two 32-bit inputs, a 16-bit operation on two 16-bit inputs, or two 16-bit operations on a pair of packed 16-bit inputs. A DSP instruction may be executed alone, or simultaneously with two load instructions or one load and one store instruction.
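To show what the dual-MAC datapath computes, here is a C sketch of a dot product over packed 16-bit data, with each loop iteration standing in for one dual-MAC instruction. The helper names are hypothetical. The 64-bit accumulators are a convenience of the model, since the text does not state the hardware accumulator width, and the Q15 saturation rule is the common convention rather than something taken from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack two signed 16-bit values into one 32-bit word. */
uint32_t pack16(int16_t hi, int16_t lo)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
}

/* Sign-extend a 16-bit field without implementation-defined casts. */
static int16_t sext16(uint32_t v)
{
    int32_t s = (int32_t)(v & 0xFFFFu);
    return (int16_t)(s >= 0x8000 ? s - 0x10000 : s);
}

static int16_t lo16(uint32_t w) { return sext16(w); }
static int16_t hi16(uint32_t w) { return sext16(w >> 16); }

/* Signed-integer dot product of packed 16-bit vectors. Each iteration reads
   two 32-bit operands and performs two 16 x 16 multiply-accumulates, one
   per model accumulator. */
int64_t dual_mac_dot(const uint32_t *x, const uint32_t *y, size_t nwords)
{
    assert(x != NULL && y != NULL);
    int64_t acc0 = 0, acc1 = 0; /* stand-ins for the two accumulators */
    for (size_t i = 0; i < nwords; i++) {
        acc0 += (int32_t)lo16(x[i]) * lo16(y[i]);
        acc1 += (int32_t)hi16(x[i]) * hi16(y[i]);
    }
    return acc0 + acc1;
}

/* Fractional multiply: Q15 x Q15 -> Q31. The product is doubled to realign
   the binary point; -1.0 x -1.0 saturates because +1.0 is not representable. */
int32_t q15_mul(int16_t a, int16_t b)
{
    if (a == INT16_MIN && b == INT16_MIN)
        return INT32_MAX;
    return ((int32_t)a * b) * 2;
}
```

On the hardware the two loads that feed the next iteration can be issued in the same instruction as the MACs, which is why the text stresses the balance between execution and memory bandwidth.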
Special multimedia instructions enable the acceleration of fundamental operations associated with video- and imaging-based applications such as in 3G wireless algorithms. In a DSP-based videophone, for example, the DSP might implement system control functions, caller ID, a full-duplex speakerphone, wireless modem code and video encode/decode algorithms.
The balanced execution and data memory bandwidth of the MSA core allows for efficient implementation of most DSP inner-loop kernels. Software-pipelining and loop-unrolling techniques are used to adapt algorithms to the computational pipeline. The core has no latencies between load and compute instructions. That reduces the need for loop unrolling, minimizes code size and allows simple assembly language programming.
The MSA core has five modes: user, supervisor, emulation, idle and reset. The user, supervisor and emulation modes provide basic protection for system and emulation resources.
Application-level code has restricted access to system resources. The system acts on behalf of the user through system calls whenever application-level code requires access to system resources.
The core is in supervisor mode whenever it is handling an interrupt at any priority level or a software exception. It enters emulation mode as a result of an emulator event such as a watchpoint match or an external emulation request. In emulation mode, instructions are fetched from a JTAG-scannable emulation instruction register. These instructions bypass the memory system and are fed directly to the instruction decoder.