Industrial Control DesignLine Blog
Opportunities still exist for 16-bit microcontrollers
Ken Wallace
6/1/2009 7:28 AM EDT
In the push to be faster and more powerful, a vital point about these architectures has been missed: the instruction sets were designed for assembly coding, with the designers concentrating on the instructions that were necessary to implement efficient embedded applications.
This strength was also the foundation of an inherent weakness. The 8-bit architectures were developed before high-level languages came into common use for embedded work, and therefore the instruction sets did not make for efficient "C" compilers, with their strict demands on data sizes and support for abstract structures.
Whilst some had limited arithmetic capability, such as multiply and divide, they often required several clock cycles per instruction, resulting in poor MIPS figures compared with wider architectures once these became available. The increasing complexity of applications also demanded addressing support for larger memories, which was simply not available from these simple machines.
For engineers faced with such issues, the natural tendency was to select the current 'best' performance for a new product or application, and in many cases this meant a 32-bit device. It is easier to design with 'surplus' power available, knowing it can always be used to solve an issue that appears during development. Additionally, off-the-shelf high-level language compilers would work without too much thought as to the physical architecture present.
However, the cost of such decisions is high. The 32-bit solution is an expensive piece of silicon, and the increase in MIPS comes with an attendant increase in power. It will be built on a leading-edge process technology, so the system will have to endure higher leakage and standby currents. It also does not integrate well with RF, because of high EMI from the fast clock trees, which induce large impulse currents.
The key to an optimized design is matching the system specification and performance to the components selected to carry out the tasks. An engineer would not choose a 20 MS/s ADC to digitize an audio signal at 44 kS/s, so why use a 32-bit MCU for a task that could be accomplished by a 16-bit MCU with the necessary MIPS capability? To do so would mean higher cost, higher power and higher standby current.
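The matching exercise described above can be sketched as a back-of-envelope MIPS budget. The sketch below is only an illustration: the task names, event rates, instruction counts and the two candidate MIPS ratings are all hypothetical placeholders, not figures for any real product.

```python
# Back-of-envelope MIPS budget.
# Every workload figure here is a hypothetical placeholder, not a measurement.

def required_mips(tasks):
    """Sum each task's load: events per second * instructions per event."""
    total_instr_per_s = sum(rate_hz * instr for rate_hz, instr in tasks.values())
    return total_instr_per_s / 1e6

def headroom(available_mips, needed_mips):
    """Fraction of the CPU left unused (negative means the part is too slow)."""
    return 1.0 - needed_mips / available_mips

# (events per second, instructions per event) for an assumed application
tasks = {
    "adc_filter": (8_000, 400),
    "rf_stack": (250, 12_000),
    "housekeeping": (100, 2_000),
}

needed = required_mips(tasks)

# Compare two assumed candidates: a 20 MIPS part and a 100 MIPS part.
for name, mips in (("candidate A", 20.0), ("candidate B", 100.0)):
    print(f"{name}: needs {needed:.1f} of {mips:.0f} MIPS, "
          f"headroom {headroom(mips, needed):.0%}")
```

With these assumed numbers, both candidates do the job. The larger part simply leaves far more of its capacity idle, which is the kind of surplus the analysis is meant to expose before a device is chosen.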
When it comes to selecting MCUs, it should be remembered that MIPS figures alone do not give the complete picture; what matters is how efficiently those MIPS are used.
For example, the Cyan eCOG16E01 has a MIPS figure equal to that of some low-end 32-bit MCUs, but retains all the low-power features of lower-powered 8/16-bit MCUs. In battery-operated applications, such as Automated Meter Reading and ZigBee, this is vital in meeting the requirement for long operational life.
With the new high-performance 16-bit architectures, such as CyCore, the designer does not have to give up MIPS to achieve his/her power figures. This is amply illustrated by applications using CyCore, which can run a complete self-forming, self-healing RF network stack (Cy-Net3) and still have MIPS left over to employ Forward Error Correction (FEC) at the physical layer. Add to this the ultra-low power requirements of battery operation, and a system-level performance is achieved that cannot be matched by a 32-bit system. This illustrates what may be achieved when the components are matched to the problem.
Although many may say that it's always possible to squeeze such performance from 8/16-bit CPUs by labored, hand-crafted assembly language coding, many of today's 16-bit MCUs have quite efficient "C" language compilers. However, some compilers for 16-bit MCUs suffer from being abstracted from the physical features of the CPU, and have earned a reputation for being very inefficient. The most efficient compilers are those which have been designed with strong awareness of the hardware itself. The design of an efficient 16-bit CPU also needs several characteristics if efficient, compact code is to be obtained from the compiler. Firstly, orthogonal, register-rich RISC designs with load/store architectures help enormously in optimizing the compiler for high-level languages. Next, adding sufficient address registers and addressing modes enables efficient implementation of pointers and structures, which are necessary in contemporary high-level language software methods.
With the codependence of the CPU architecture and the compiler being of such importance, the only viable approach is to develop the compiler and the CPU architecture concurrently. This approach has rewarded us with a design that achieved 0.7 Dhrystone MIPS and, in standard tests, code 20% more compact than ARMv6 32-bit core designs.
This shows that 32-bit is not always the obvious choice for high-performance applications, and there are other questions that should be addressed to help achieve the optimum balance between requirements and implementation. For example, how wide is the biggest data item that your application handles? If it is 16 bits or less, a 32-bit solution will spend most of its CPU cycles moving zeros around, and you will be paying dearly for those unused register bits. Also, most internal peripherals are currently optimized at 16 bits. ADCs, DACs, timer/counters, UARTs, SPI, I2C, CAN, WDT and so on do not benefit from having 32-bit wide registers, and a 32-bit machine must waste register space when transferring data to and from these types of peripherals.
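The data-width question above can be made concrete with a rough calculation of how many register bits actually carry information. This is a minimal sketch under simplifying assumptions: it considers unsigned values only, ignores packing tricks, and uses a 12-bit ADC reading purely as a hypothetical example of peripheral data.

```python
# Rough register-utilization estimate for unsigned data.
# The 12-bit ADC reading is a hypothetical example, not tied to any device.

def bits_needed(max_value):
    """Smallest number of bits that can hold max_value as an unsigned integer."""
    return max(1, int(max_value).bit_length())

def register_utilization(max_value, register_bits):
    """Fraction of a register's bits that carry information for this data."""
    need = bits_needed(max_value)
    if need > register_bits:
        raise ValueError("value does not fit in a single register")
    return need / register_bits

adc_full_scale = 4095  # full-scale reading of an assumed 12-bit converter

for width in (16, 32):
    used = register_utilization(adc_full_scale, width)
    print(f"{width}-bit register: {used:.1%} of bits used, "
          f"{1 - used:.1%} always zero")
```

For data of this size, a quarter of a 16-bit register is padding, while well over half of a 32-bit register is. The same check can be run against the widest value an application really handles before committing to a bus width.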
The performance analysis phase of any project is critical in optimizing a design for cost, performance and power consumption, but it is often skimped. As a result, over-specifying the requirements is sometimes seen as the safest option. This leads to products that miss their cost targets and leave the company vulnerable to smart competitors who capitalize on the error and provide a more cost-effective solution. It is very rare for a company to redesign a product to reduce its complexity and cost, because once a competitor succeeds in stealing market share it's usually too late. In short, this is the hidden cost of neglecting the analysis phase.
Whilst there will always be areas where sheer performance is the only solution, the fact that 8-bit devices still dominate the sales figures shows that the majority of MCU applications are performed quite adequately with these components. Indeed, several new companies are forming in Asia and the Pacific Rim to introduce new 8-bit devices, because they see the need for low-cost, highly optimized solutions in this market space.
This extended life for 8-bit also shows that 16-bit is a logical and effective step for the next generation of embedded MCU products. Areas such as home automation, intelligent lighting control and particularly RF-enabled network nodes are all ideal candidates for 16-bit MCUs, as 8-bit cannot meet the demands for computational performance in these applications. With the advent of the latest generation of 16-bit RISC-based, high-density CPUs, like Cyan's eCOG16E01, performance can be achieved in all these areas without sacrificing cost and power consumption. With devices like Cyan's producing sub-1 mW/MHz figures and CPU frequencies up to 50 MHz, they are effectively competing in the low-end 32-bit arena.
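The power figures quoted above translate into a simple ceiling on active power, and from there into an average for a duty-cycled node. The sketch below takes the 1 mW/MHz and 50 MHz figures from the text as upper bounds; the sleep power and the 1% duty cycle are hypothetical assumptions chosen only to show the arithmetic.

```python
# Power estimate for a duty-cycled node.
# 1 mW/MHz and 50 MHz come from the text (as upper bounds);
# the sleep power and duty cycle are hypothetical assumptions.

def active_power_mw(mw_per_mhz, clock_mhz):
    """Active power when the CPU runs continuously at clock_mhz."""
    return mw_per_mhz * clock_mhz

def average_power_mw(active_mw, sleep_mw, duty_cycle):
    """Time-weighted average of active and sleep power."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return duty_cycle * active_mw + (1.0 - duty_cycle) * sleep_mw

ceiling_mw = active_power_mw(1.0, 50.0)  # upper bound at full speed: 50 mW

# Assumed node that is awake 1% of the time and draws 5 uW while asleep.
avg_mw = average_power_mw(ceiling_mw, sleep_mw=0.005, duty_cycle=0.01)

print(f"active ceiling: {ceiling_mw:.1f} mW, average: {avg_mw:.3f} mW")
```

Under these assumptions the average is dominated by the short active bursts, which is why the per-MHz figure matters even for a node that sleeps most of the time.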
A lesson from history exemplifies the case for design optimization. The Apollo moon landings in 1969 were the greatest technological achievement of the 20th century, and this was accomplished with the Apollo Guidance Computer (AGC), which had only four 16-bit central registers. This was not because they couldn't have more registers; the design was produced for NASA by MIT. The requirements for performance, power and functionality were carefully analyzed, and this showed that four was enough to do the job. The results speak for themselves.
Ken Wallace is CTO of Cyan Technology Ltd. - www.cyantechnology.com