High-performance controllers with frequencies far beyond 100MHz will soon be used in automotive applications, where frequencies of 10-40MHz have prevailed so far. However, the influence of this frequency boost on overall system performance is far from linear.
Often, modern controllers are slowed down by external program memory whose access times cannot keep up with the controller's speed. Frequency is an important parameter for system performance, but controller applications in particular frequently run into memory-imposed limits. From an EMC perspective, unnecessarily high system frequencies must also be avoided.
The MCU's power dissipation is another major factor: it increases disproportionately with higher frequencies. That is prohibitive for ECUs that must sometimes work in scalding hot oil, a typical automotive environment.
An ambient temperature of 125°C, a junction temperature of 150°C and a thermal resistance of about 18°C/W translate into a maximum allowed power dissipation of about 1.4W. The controller's maximum frequency is then usually derated to fit this budget.
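The power budget follows directly from the junction-temperature limit. A minimal sketch of the calculation, using the values from the text (the function name is ours, for illustration only):

```python
# Maximum power dissipation allowed by the thermal budget:
#   P_max = (T_junction_max - T_ambient) / R_theta
# where R_theta is the junction-to-ambient thermal resistance in °C/W.
def max_power_dissipation(t_junction_max, t_ambient, r_theta):
    """Return the allowed power dissipation in watts."""
    return (t_junction_max - t_ambient) / r_theta

# Values from the text: 150°C junction, 125°C ambient, 18°C/W package.
p = max_power_dissipation(t_junction_max=150.0, t_ambient=125.0, r_theta=18.0)
print(round(p, 2))  # ~1.39 W, quoted as roughly 1.4 W in the text
```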
In many cases, microcomputers running at speeds of several gigahertz, such as those used in desktop PCs, consume more than 100W—they blow the power budget of an electronic control unit by two orders of magnitude.
Computing performance does not increase linearly with system frequency. First of all, this is caused by the access time of external flash devices.
|Figure 1. Computing performance does not increase linearly with system frequency.|
Today, an external flash memory has a typical access time of about 70ns. The controller's address and data setup times must be added on top, resulting in a total access time of about 80ns. In burst mode, the next seven memory words can be read within 15ns each, which means that eight memory accesses can be completed within 185ns.
The use of 32bit memory words yields a theoretical maximum of 43Mwords/s or 172MBps. In practice, only about five of the eight possible accesses within a burst are really used, as thereafter a jump to another address occurs.
The time for the first access remains constant, so five real accesses take 140ns, which translates into 36Mwords/s or 144MBps. Meanwhile, DDR memories are used to overcome this bottleneck. Here, too, the time for the first access remains constant, but subsequent burst accesses take only half as long.
A full eight-word burst then takes 132.5ns, which translates into 60Mwords/s or 240MBps. For the "normal" case of five accesses per burst, 110ns are needed, yielding 45.5Mwords/s or 182MBps. For typical applications using DDR memories, the throughput thus increases from 36Mwords/s to 45.5Mwords/s.
This is an increase of about 26 percent; "doubling" the throughput, as the name implies, is absolutely unrealistic. These calculations show that the average data throughput of an external flash device is limited to roughly 40Mwords/s, enough to feed a core running at only about 40MHz.
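The burst-timing arithmetic above can be sketched in a few lines (a small model using the 80ns first-access and 15ns/7.5ns subsequent-access figures from the text; the function name is illustrative):

```python
# Burst throughput of an external flash: the first (random) access costs
# first_ns, each subsequent word in the burst costs subsequent_ns.
def burst_throughput(words, first_ns, subsequent_ns):
    """Return (total time in ns, Mwords/s, MBps) for a burst of 32-bit words."""
    total_ns = first_ns + (words - 1) * subsequent_ns
    mwords = words / total_ns * 1000.0   # words per ns -> Mwords/s
    return total_ns, mwords, mwords * 4  # 32-bit words are 4 bytes each

print(burst_throughput(8, 80, 15))    # full SDR burst: 185 ns, ~43 Mwords/s
print(burst_throughput(5, 80, 15))    # typical SDR case: 140 ns, ~36 Mwords/s
print(burst_throughput(8, 80, 7.5))   # full DDR burst: 132.5 ns, ~60 Mwords/s
print(burst_throughput(5, 80, 7.5))   # typical DDR case: 110 ns, ~45.5 Mwords/s
```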
An advanced RISC processor can execute one instruction per cycle, but each instruction must first be fetched into the processor. Processors running faster than 40MHz therefore cannot fetch more than about 40 million instructions per second from external flash and are slowed down, limiting total performance to roughly that of a 40MHz part. A frequency increase beyond 40MHz thus yields no performance gain, unless an internal cache achieves a high hit rate. Integrated flash modules are quite a different story.
Their access times are significantly shorter, buses are wider (64bit), frequencies are higher and advanced flash modules provide look-ahead techniques. This means that while data is transferred from flash to the CPU, the flash already addresses the next data block. Thus, an internal flash, such as the one integrated in the MPC5554, provides a data throughput of 1GBps.
In typical applications, throughput figures of more than 900MBps can be reached. This exceeds the processor requirements of 530MBps for code accesses (without cache); data requirements come on top, of course. At frequencies above 100MHz, however, the internal flash also starts to show slight differences between the CPU core's theoretical performance and the actual system performance.
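As a rough plausibility check, the 1GBps figure matches a 64bit flash bus delivering one transfer per clock. The 132MHz clock used below is an assumption based on the MPC5554 mentioned above, not a value stated in the text:

```python
# Peak-throughput estimate for an internal flash on a 64-bit bus, assuming
# one bus transfer per core clock (look-ahead hides the next block's
# address phase). The 132 MHz figure is an assumed MPC5554-class clock.
BUS_WIDTH_BYTES = 8     # 64-bit flash bus
CLOCK_MHZ = 132         # assumption, not from the text

peak_mbps = BUS_WIDTH_BYTES * CLOCK_MHZ   # ~1056 MBps, i.e. roughly 1 GBps
code_fetch_mbps = 4 * CLOCK_MHZ           # one 32-bit instruction per cycle
print(peak_mbps, code_fetch_mbps)         # ~1056 MBps peak vs ~528 MBps code fetch
```

The ~528MBps code-fetch demand lines up with the 530MBps requirement quoted above, which is why the internal flash can keep the core fed without a cache at these clock rates.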
This is because for typical control applications, both program and data (i.e. parameter arrays and application-specific constants) are stored in flash. However, simultaneous code and data accesses to flash are not possible. In addition, "look-ahead" addressing is not feasible for all jumps. This is the reason caches are usually built into devices running above 100MHz.
Engineers are often concerned with their software's real-time capabilities, because it makes a big difference in runtime whether or not the code has already been loaded into the cache. However, the cache is not the only decisive factor; traditional parameters, such as the number and kind of interrupts, also significantly influence the software's runtime behavior.
The simplest remedy here is manual control of the cache. Runtime-critical routines and data can be "locked" into cache, so that these values do not get replaced and are thus available in cache during the entire runtime.
There is no "worst case" anymore, as the runtime for such routines is always cache-based. Conversely, for some code and data it makes no sense to be transferred into cache at all; accesses to these locations will not modify any of the cache's contents. Such memory regions are defined and managed using the integrated memory management unit.
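Cache locking itself is done through controller-specific cache-control registers or MMU attributes. To illustrate the policy only, here is a toy model of a fully associative LRU cache in which locked lines are never chosen as eviction victims (all names and the structure are hypothetical, not any controller's actual API):

```python
from collections import OrderedDict

class LockableCache:
    """Toy fully associative cache: LRU replacement, locked lines pinned."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> locked flag, in LRU order

    def lock(self, addr):
        """Load a line and pin it: it is never selected as a victim."""
        self.access(addr)
        self.lines[addr] = True

    def access(self, addr):
        """Return True on a hit; on a miss, evict the oldest unlocked line."""
        if addr in self.lines:
            self.lines.move_to_end(addr)  # mark as most recently used
            return True
        if len(self.lines) >= self.capacity:
            for victim, locked in self.lines.items():
                if not locked:
                    del self.lines[victim]
                    break
        self.lines[addr] = False
        return False

cache = LockableCache(capacity=2)
cache.lock(0x100)           # runtime-critical routine, pinned
cache.access(0x200)         # fills the remaining line
cache.access(0x300)         # evicts 0x200, never the locked 0x100
print(cache.access(0x100))  # True: the locked line is still resident
```

This is exactly the property the text describes: accesses to the locked routine always hit, so its runtime is deterministic regardless of what else runs in between.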
|Figure 2. A crossbar switch enables the intelligent interoperation of various on-chip modules.|