Aura Communications is a fabless semiconductor company that has developed an enabling technology for low-cost, low-power, wireless personal-area networks. We developed a formulaic approach to choosing the right microprocessor core for our system-on-chip (SoC) application, based on estimates of MIPS (millions of instructions per second) and power consumption.
Aura's programmable SoC solution creates the backbone infrastructure for wireless voice, audio and data communications. Our LibertyLink allows full-duplex, secure communications between multiple wireless devices at speeds up to 204 kbits/second.
Unlike most technologies designed for this market (Bluetooth, HomeRF and WiFi, for example), LibertyLink is based on magnetic induction. It was in development for two years and offers a low-power, cost-effective wireless solution with a small die size for the mass consumer market. LibertyLink-powered products include wireless headsets for mobile and cordless phones.
After determining that it made more sense to buy intellectual property (IP) than to develop it internally, the design team evaluated processor IP against an internally developed set of benchmarks: cost, core size, power consumption, peripheral support and average MIPS. Weighted against the actual application, these criteria offer an objective way to compare IP. For example, Processor A may execute an ADD in one cycle but use a 24-bit-wide opcode, while Processor B may execute a similar ADD in two cycles but require only an 8-bit-wide opcode. In our scenario, Processor B was the better choice for the target design, since RAM accesses (including opcode fetches) dominated our subsystem's power consumption.
Choosing the right microprocessor
The first step was to determine the criteria for the evaluation. For this, we looked at several common measurable traits of microprocessors, such as average MIPS, core size, power consumption, peripheral support and cost. These criteria, weighted against the actual application, offered an objective way to compare IP. We weighted them by prioritizing each criterion for the specific application. In our case, the application demanded extremely low power, had reasonable area limitations and required a typical set of peripheral support functions.
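The weighting step can be sketched as a simple scorecard. The criterion weights, the 0-10 scores and the candidate names below are hypothetical, chosen only to reflect a power-first priority order; they are not the values Aura used.

```python
# Hypothetical weights reflecting a very low-power application with
# reasonable area limits (illustrative numbers, not Aura's actual values).
WEIGHTS = {
    "power": 0.40,
    "mips": 0.20,
    "core_size": 0.20,
    "peripherals": 0.10,
    "cost": 0.10,
}


def weighted_score(scores: dict, weights: dict = WEIGHTS) -> float:
    """Combine per-criterion scores (0-10, higher is better) into one figure."""
    missing = weights.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(weights[c] * scores[c] for c in weights) / total_weight


# Hypothetical candidate cores and their scores on each criterion.
candidates = {
    "core_x": {"power": 9, "mips": 6, "core_size": 8, "peripherals": 7, "cost": 6},
    "core_y": {"power": 6, "mips": 9, "core_size": 6, "peripherals": 8, "cost": 7},
}

ranked = sorted(candidates, key=lambda name: weighted_score(candidates[name]), reverse=True)
print(ranked)
```

Because power carries the largest weight, a core that scores well on power can outrank a faster core. Changing the priorities for a different application only means editing `WEIGHTS`.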
Because the SoC is programmable, the program code would reside in an external EEPROM and be loaded into internal program RAM after power-up. To evaluate the processor's power dissipation realistically, we therefore needed to evaluate the processor and the memory IP together as one subsystem.
The next step was to estimate the average MIPS and power consumption of each subsystem being evaluated. We did this by identifying all the main software functions and estimating the percentage of runtime each would require. The functions were then pseudo-coded in C/assembly, at which point we could calculate a "percent usage" weighting factor for each opcode group. The MIPS figure per MHz of clock (effectively the average number of instructions per cycle) is the reciprocal of the sum, over all opcode groups, of each group's weight times the cycles required to execute it: MIPS/MHz = 1/(average cycles per instruction).
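The calculation fits in a few lines. The opcode groups, usage fractions, cycle counts and clock rate below are hypothetical placeholders for the figures the pseudo-coded functions would yield. The weights are taken as fractions of executed instructions.

```python
# Hypothetical opcode-group profile derived from pseudo-coded functions:
# group -> (fraction of executed instructions, cycles per instruction)
PROFILE = {
    "reg_to_reg": (0.60, 1),
    "immediate": (0.20, 2),
    "branch": (0.15, 3),
    "call_return": (0.05, 5),
}


def mips_per_mhz(profile: dict) -> float:
    """Reciprocal of the usage-weighted average cycles per instruction."""
    total = sum(w for w, _ in profile.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"usage weights sum to {total}, expected 1.0")
    avg_cpi = sum(w * cycles for w, cycles in profile.values())
    return 1.0 / avg_cpi


ipc = mips_per_mhz(PROFILE)
clock_mhz = 10.0  # hypothetical core clock
print(f"average CPI = {1 / ipc:.2f}, about {ipc * clock_mhz:.1f} MIPS at {clock_mhz:.0f} MHz")
```

Multiplying by the clock frequency converts the per-MHz figure into an absolute MIPS estimate for a given operating point.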
In our design, the RAM dominated the subsystem's power consumption. The vendor generally provided a typical value for the processor core in milliwatts per MHz. The consumption attributed to RAM access can be estimated by first determining the number of RAM accesses required by each opcode. Since code is fetched from RAM, the number of accesses was typically the length of the opcode in program-bus words plus any accesses required to execute the opcode.
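This is a minimal sketch of the per-opcode count. It assumes the opcode is fetched one program-bus word at a time. The function name and the bus-width parameter are illustrative and do not come from any vendor tool. The sketch reproduces the three accesses of an RTS and shows why a 24-bit opcode costs three fetches on an 8-bit bus.

```python
def ram_accesses(opcode_bits: int, extra_accesses: int = 0, bus_bits: int = 8) -> int:
    """RAM accesses for one instruction: opcode fetches plus execution accesses.

    Assumes the opcode is fetched one program-bus word at a time.
    """
    if opcode_bits <= 0 or bus_bits <= 0 or extra_accesses < 0:
        raise ValueError("sizes must be positive and extra accesses non-negative")
    fetches = -(-opcode_bits // bus_bits)  # integer ceiling division
    return fetches + extra_accesses


rts = ram_accesses(opcode_bits=8, extra_accesses=2)  # 1 fetch + 2 stack pops
add_24 = ram_accesses(opcode_bits=24)                # 24-bit opcode on an 8-bit bus
add_8 = ram_accesses(opcode_bits=8)                  # 8-bit opcode on an 8-bit bus
print(rts, add_24, add_8)
```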
As an example, an RTS (return from subroutine) may take five cycles to execute and require three RAM accesses (one opcode fetch and two stack pops for the return address), or 3/5 = 0.6 RAM accesses per cycle. Weighting each opcode group's RAM accesses by the usage factors calculated earlier and summing the results gives an average number of RAM accesses per cycle. By combining the average instruction-rate and power calculations, we generated a normalized metric in units of MIPS/milliwatt that was used to compare all the subsystems.
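The sketch below shows one way to assemble the MIPS/mW metric. It assumes the weights are fractions of executed instructions. Under that assumption, accesses per cycle is the average RAM accesses per instruction divided by the average cycles per instruction, and for a single opcode this reduces to the RTS figure of 3/5 = 0.6. The profile, the core's mW/MHz figure and the RAM energy per access (in nanojoules) are all hypothetical inputs.

```python
# Hypothetical opcode-group profile:
# group -> (fraction of executed instructions, cycles, RAM accesses per instruction)
PROFILE = {
    "reg_to_reg": (0.60, 1, 1),
    "immediate": (0.20, 2, 2),
    "branch": (0.15, 3, 2),
    "rts": (0.05, 5, 3),  # 1 opcode fetch + 2 stack pops
}


def average_cpi(profile: dict) -> float:
    return sum(w * cycles for w, cycles, _ in profile.values())


def accesses_per_cycle(profile: dict) -> float:
    """Average RAM accesses per instruction divided by average cycles per instruction."""
    avg_accesses = sum(w * acc for w, _, acc in profile.values())
    return avg_accesses / average_cpi(profile)


def mips_per_mw(profile: dict, core_mw_per_mhz: float, ram_nj_per_access: float) -> float:
    """Normalized efficiency metric; the clock frequency cancels out."""
    ipc = 1.0 / average_cpi(profile)  # MIPS per MHz
    # At f MHz, RAM power (mW) = accesses/cycle * f * energy per access (nJ),
    # so the RAM contributes accesses_per_cycle * nJ in mW per MHz.
    mw_per_mhz = core_mw_per_mhz + accesses_per_cycle(profile) * ram_nj_per_access
    return ipc / mw_per_mhz


print(accesses_per_cycle({"rts": (1.0, 5, 3)}))  # 0.6
print(f"{mips_per_mw(PROFILE, core_mw_per_mhz=0.1, ram_nj_per_access=0.2):.3f} MIPS/mW")
```

Both MIPS and power scale with the clock, so expressing each per MHz makes the metric independent of operating frequency. This allows cores quoted at different clock rates to be compared directly.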
The final selection process required evaluating all the criteria, weighted appropriately for the particular application. After reviewing the processor loading in our application, it became clear that an 8-bit RISC microcontroller of the 8051 class would suffice. Within this class, however, cores were available with either 8-bit or 16-bit program buses.
The cost of such IP ranged from $70,000 to $100,000, and the $30,000 difference was not enough to affect vendor selection. Upon critical review of the opcode usage in our coded functions, it became clear that most of the processing we required could be accomplished with one-cycle, 1-byte opcodes executing register-to-register operations.
For forward error correction (FEC) on packet headers, we considered using a Hamming (31,26) code. This algorithm requires an enormous number of register shift operations, which the 8051 RISC core would execute efficiently in terms of both cycles and power. Selecting a processor with an 8-bit program memory bus had the added advantage of reducing the core size, allowing the layout team to be more selective in digital signal placement. This is a welcome advantage in a mixed-signal environment.
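The following sketch implements a Hamming (31,26) encoder and single-error-correcting decoder. It is written in Python for readability rather than 8051 assembly. The sketch uses the textbook layout, with parity bits at codeword positions 1, 2, 4, 8 and 16. The article does not describe Aura's actual bit ordering, so this layout is an assumption. The per-bit shift-and-mask loops are the kind of work the core would perform with register shifts.

```python
PARITY_POSITIONS = (1, 2, 4, 8, 16)


def syndrome(codeword: int) -> int:
    """XOR of the 1-based positions of all set bits; 0 for a valid codeword."""
    s = 0
    for pos in range(1, 32):
        if (codeword >> (pos - 1)) & 1:
            s ^= pos
    return s


def hamming31_26_encode(data: int) -> int:
    if not 0 <= data < (1 << 26):
        raise ValueError("data must fit in 26 bits")
    word, bit = 0, 0
    for pos in range(1, 32):
        if pos & (pos - 1):  # not a power of two, so this is a data position
            word |= ((data >> bit) & 1) << (pos - 1)
            bit += 1
    s = syndrome(word)
    for p in PARITY_POSITIONS:  # set parity bits so the full syndrome becomes 0
        if s & p:
            word |= 1 << (p - 1)
    return word


def hamming31_26_decode(codeword: int) -> int:
    """Correct up to one flipped bit, then extract the 26 data bits."""
    s = syndrome(codeword)
    if s:
        codeword ^= 1 << (s - 1)  # the syndrome is the position of the bad bit
    data, bit = 0, 0
    for pos in range(1, 32):
        if pos & (pos - 1):
            data |= ((codeword >> (pos - 1)) & 1) << bit
            bit += 1
    return data


data = 0x2ABCDEF
code = hamming31_26_encode(data)
corrupted = code ^ (1 << 12)  # flip one bit in transit
print(hamming31_26_decode(corrupted) == data)
```

Like any Hamming (31,26) code, this corrects a single bit error per codeword. It silently miscorrects double errors, which an added overall parity bit (an extended Hamming code) would detect.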
Since the MIPS and power calculations illustrated above depended heavily on the code actually produced by the build tools, it was essential to closely evaluate the quality of the candidate C compilers and of all development tools provided by the IP vendor. A compiler that produces inefficient or poorly optimized code may force more of the code to be written in assembler, increasing the software development task.
Implementation and verification
To implement the core, we had to modify a Tcl build script written for a different synthesis tool. This involved converting its commands to the equivalent commands for the tool our team uses, Ambit BuildGates from Cadence Design Systems (San Jose, Calif.).
Next, the required peripherals were selected. Each had two Verilog files associated with it: a full behavioral model and an "empty" file, which only contained the I/O pin list. By copying the desired file to the build directory, the team could include only the required peripherals. Timing constraints were applied and the core was synthesized.
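The file-selection step might be scripted as follows. The `<name>_full.v` / `<name>_empty.v` naming, the destination name `<name>.v` and the peripheral names are all hypothetical. The article says only that each peripheral had a full behavioral model and an I/O-only stub.

```python
import shutil
import tempfile
from pathlib import Path


def stage_peripherals(ip_dir, build_dir, peripherals, enabled):
    """Copy the full model for enabled peripherals and the I/O-only stub for the rest.

    Hypothetical layout: ip_dir holds "<name>_full.v" and "<name>_empty.v";
    the chosen file is copied to build_dir as "<name>.v".
    """
    ip_dir, build_dir = Path(ip_dir), Path(build_dir)
    build_dir.mkdir(parents=True, exist_ok=True)
    staged = {}
    for name in peripherals:
        variant = "full" if name in enabled else "empty"
        shutil.copyfile(ip_dir / f"{name}_{variant}.v", build_dir / f"{name}.v")
        staged[name] = variant
    return staged


# Usage with throwaway files standing in for the vendor deliverables.
with tempfile.TemporaryDirectory() as tmp:
    ip = Path(tmp) / "ip"
    ip.mkdir()
    for name in ("uart", "timer", "i2c"):
        (ip / f"{name}_full.v").write_text(f"// {name} behavioral model\n")
        (ip / f"{name}_empty.v").write_text(f"// {name} ports only\n")
    staged = stage_peripherals(ip, Path(tmp) / "build", ["uart", "timer", "i2c"], {"uart", "timer"})
    uart_text = (Path(tmp) / "build" / "uart.v").read_text()
    i2c_text = (Path(tmp) / "build" / "i2c.v").read_text()
print(staged)
```

Copying each variant to a fixed file name means the synthesis script never has to change when the peripheral mix does.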
The last step in the process was to simulate the design using the Native Compiled (NC) Launch verification tool. The design team simulated the gate-level models before layout, then repeated the process with the SDF (Standard Delay Format) timing files created after layout.
The synthesis portion took less than a day. The length of simulation is mostly a personal choice, but is heavily influenced by the degree of core customization. The test version of the core was not modified except for the inclusion of the newly designed peripherals. Within a week, the test code was completed and loaded into a Verilog ROM, giving the team the ability to exercise the ALU, serial port, and timer/counters.
There were basically two design cycles. The first was to get the V8 Verilog files working with the Ambit synthesis tool without any processor customization. The goal was to tape out a test chip with selected peripherals in order to evaluate the core. Simulation covered booting the processor, verifying the internal registers and ALU, and exercising the peripherals. Total time was approximately 1.0 to 1.5 man-weeks.
After evaluation, the V8 core was chosen for the target SoC, a non-radio-frequency wireless voice/data device for personal-area network applications at ranges under 3 meters.
The next design cycle involved modifying the core Verilog code: adding a parallel interface for an internal boot ROM and an I2C interface for the external program/data EEPROM, changing the programmable state register bit logic, and modifying the "power save" logic. The synthesis time was negligible, but the team spent a large amount of time in simulation. In parallel with the V8 simulations, the team designed custom peripheral blocks and complex analog circuitry.
With five designers, the team spent approximately one man-month of design and simulation time on the V8 over a six-month period. This represented a considerable cost and time saving compared with developing our own processor for this application.