The increasing use of automatic code-generation tools that produce C code from graphical, model-based designs can save many hours of software engineering effort, but it can also change the culture of software engineering.
It becomes more efficient for a software engineer to put together existing modules, build them, and test them on a system than to write custom modules to match the system requirements. Reuse of this generic, modular software increases development efficiency and software quality. However, it can also lead to poor structure, inefficient memory use, and long latencies, and it abstracts the design process away from the hardware.
The performance of these systems can be tuned to different versions of mechanical hardware without changing the basic software. Typically, this is achieved by including many calibration variables, so that functionality can be enabled or disabled at run time, gains can be adjusted, and lookup tables can be changed on the fly.
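As an illustration of the three kinds of calibration variable just described, the following is a minimal C sketch, not code from any particular ECU. The `Calibration` structure, its field names, the table contents, and `control_output()` are all hypothetical. The structure is declared `volatile` on the assumption that an external calibration tool may change it while the program runs; the tool interface and the memory placement a real system would need are not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CAL_TABLE_SIZE 5

/* Hypothetical calibration block: a feature switch, a gain, and a 1-D lookup table. */
typedef struct {
    bool  feature_enabled;          /* functionality enabled/disabled at run time */
    float gain;                     /* adjustable gain */
    float axis[CAL_TABLE_SIZE];     /* table breakpoints, strictly increasing */
    float values[CAL_TABLE_SIZE];   /* table outputs at each breakpoint */
} Calibration;

/* volatile: values may be changed by an external tool, so always re-read them. */
static volatile Calibration cal = {
    .feature_enabled = true,
    .gain   = 2.0f,
    .axis   = {0.0f, 1000.0f, 2000.0f, 3000.0f, 4000.0f},
    .values = {0.0f, 10.0f, 25.0f, 30.0f, 30.0f},
};

/* Linear interpolation in a 1-D table, clamped at both ends. */
static float table_lookup(const volatile float *axis, const volatile float *values,
                          size_t n, float x)
{
    assert(n >= 2);
    if (x <= axis[0])
        return values[0];
    if (x >= axis[n - 1])
        return values[n - 1];
    size_t i = 1;
    while (x > axis[i])             /* find segment axis[i-1] .. axis[i] */
        i++;
    float t = (x - axis[i - 1]) / (axis[i] - axis[i - 1]);
    return values[i - 1] + t * (values[i] - values[i - 1]);
}

/* Hypothetical control law: scaled demand plus an optional table term. */
float control_output(float demand, float speed)
{
    float out = demand * cal.gain;
    if (cal.feature_enabled)
        out += table_lookup(cal.axis, cal.values, CAL_TABLE_SIZE, speed);
    return out;
}
```

With the initial values, `control_output(1.0f, 1500.0f)` returns 19.5 (2.0 from the gain plus 17.5 interpolated from the table); clearing `cal.feature_enabled` changes the result to 2.0 without rebuilding the software, which is the point of calibration variables.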
Software engineers rely on emulation techniques to trace the program flow in the real system, watch data as it is updated, measure latencies, and debug logical problems. However, system-on-a-chip integration of multiple cores at high clock speeds creates some challenges when debugging these systems.
The increase in the size of embedded non-volatile memory and the shrinking geometry of silicon have enabled superscalar, system-on-a-chip microcontrollers to be built with wide, high-speed internal buses feeding multiple cached, pipelined processor cores and co-processors. This architecture has also allowed deeper integration of the microcontroller subsystem into application environments. Connecting analysis equipment to the external buses of these deeply embedded devices can be difficult because of physical connection problems (there may not even be an external bus), high clock speeds, cable length, and ambient temperature.
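In many cases, the fetches visible on the external bus are not representative of the complete program flow because of internal caches and pipeline fetch prediction: cache hits never reach the external bus, and predicted fetches are not necessarily executed. Burst-mode flash memories also complicate decoding of the fetches, because only the start address of each burst is presented and the addresses of the following words are assumed to increment sequentially.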
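The gap between executed fetches and visible bus traffic can be shown with a toy model. This C sketch assumes a hypothetical direct-mapped instruction cache with 16-byte lines, where each line fill is one burst and a bus analyzer records only the burst start address; the geometry and the names `ICache`, `BusLog`, `fetch()`, and `expand_burst()` are illustrative assumptions, not TC1796 details, and pipeline fetch prediction is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache geometry chosen for illustration only. */
#define LINE_BYTES 16u   /* one line fill = one burst on the external bus */
#define NUM_LINES  8u    /* direct-mapped */
#define LOG_MAX    64u

typedef struct {
    bool     valid[NUM_LINES];
    uint32_t tag[NUM_LINES];
} ICache;

/* What an external bus analyzer would see: one start address per burst. */
typedef struct {
    uint32_t burst_start[LOG_MAX];
    size_t   count;
} BusLog;

/* Fetch one instruction word; only a cache miss produces external bus activity. */
static void fetch(ICache *c, BusLog *log, uint32_t addr)
{
    uint32_t line  = addr / LINE_BYTES;
    uint32_t index = line % NUM_LINES;
    uint32_t tag   = line / NUM_LINES;

    if (c->valid[index] && c->tag[index] == tag)
        return;                              /* hit: invisible externally */

    c->valid[index] = true;
    c->tag[index]   = tag;
    assert(log->count < LOG_MAX);
    /* Burst mode: only the first address is presented; the remaining words
       are assumed to follow at sequentially incremented addresses. */
    log->burst_start[log->count++] = line * LINE_BYTES;
}

/* Reconstruct the word addresses of one burst, assuming 4-byte words and a
   simple linear increment (no wrap-around ordering). */
static void expand_burst(uint32_t start, uint32_t out[LINE_BYTES / 4u])
{
    for (uint32_t i = 0; i < LINE_BYTES / 4u; i++)
        out[i] = start + 4u * i;
}
```

Running a loop over the eight words at 0x100 to 0x11C ten times performs 80 fetches, but the bus log records only two bursts (0x100 and 0x110). A decoder that relies on `expand_burst()` is correct only if the flash really increments linearly, which is exactly the assumption the text warns about.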
The principal issue when connecting an emulation system is that the connection length is limited by the high system clock speed. For example, with a 150 MHz microcontroller, the propagation delay of a 50 cm connection would be around 2.0 ns. The clock period is only 6.67 ns, so a 2.0 ns one-way delay (4.0 ns for a round trip) is very significant. At these frequencies the connection acts as a transmission line whose termination cannot be guaranteed, which virtually precludes locating any control functions remotely from the target device. In this example, the maximum length at which transmission-line effects can be ignored would be only 16 cm, so the ICE (in-circuit emulator) would have the same environmental requirements as, say, the ECU (engine control unit) under test.
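The figures in this example can be reproduced with a few lines of arithmetic. The sketch below assumes a propagation velocity of 25 cm/ns (about 0.83 c), which is the value implied by "2.0 ns for 50 cm"; real cables and PCB traces differ. It also assumes the common rule of thumb that a connection shorter than one tenth of the wavelength at the clock frequency can be treated as a lumped circuit. The text does not state which criterion it used, so that choice is an assumption.

```c
#include <assert.h>

/* Assumed propagation velocity implied by the example: 50 cm in 2.0 ns. */
#define VELOCITY_CM_PER_NS 25.0

/* Clock period in ns for a clock frequency in MHz. */
static double period_ns(double f_mhz)
{
    assert(f_mhz > 0.0);
    return 1000.0 / f_mhz;
}

/* One-way propagation delay in ns for a connection length in cm. */
static double delay_ns(double length_cm)
{
    return length_cm / VELOCITY_CM_PER_NS;
}

/* Rule-of-thumb (assumed) limit for ignoring transmission-line effects:
   one tenth of the wavelength at the clock frequency. */
static double max_lumped_length_cm(double f_mhz)
{
    double wavelength_cm = VELOCITY_CM_PER_NS * period_ns(f_mhz);
    return wavelength_cm / 10.0;
}
```

For 150 MHz this gives a period of 6.67 ns, a 2.0 ns one-way delay for 50 cm (30% of the period), and a wavelength of about 167 cm, so the limit comes out at about 16.7 cm, consistent with the figure of roughly 16 cm quoted above.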
An example of such a system-on-chip implementation is the Infineon TC1796. Its 32-bit TriCore CPU has separate private buses for code and data, and is connected to the system buses through the LFI bridge to allow data access to the peripheral subsystem. A further bridge, provided by the DMA (direct memory access) controller, gives access to the remote peripheral bus.
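To make the point that a single CPU data access can cross several internal buses, the following C sketch models only the connections named in the paragraph above: CPU data bus to system bus via the LFI, and system bus to remote peripheral bus via the DMA. It is a deliberately simplified reading of the text, not the full TC1796 bus structure; the enum names are hypothetical, and the code bus is left unconnected because the text mentions the bridge only for data access. A breadth-first search lists the bridges a transaction passes through.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { CPU_CODE_BUS, CPU_DATA_BUS, SYSTEM_BUS, REMOTE_PERIPH_BUS, NUM_BUSES } Bus;

typedef struct {
    Bus a, b;             /* the two buses joined by this bridge */
    const char *bridge;   /* bridge name as used in the text */
} Bridge;

/* Simplified topology taken from the text (not the complete device). */
static const Bridge bridges[] = {
    { CPU_DATA_BUS, SYSTEM_BUS,        "LFI" },
    { SYSTEM_BUS,   REMOTE_PERIPH_BUS, "DMA" },
};
#define NUM_BRIDGES (sizeof bridges / sizeof bridges[0])

/* Breadth-first search from src to dst. Writes the bridges crossed, in order,
   into path[] and returns their count, or -1 if dst is unreachable. */
static int bridge_path(Bus src, Bus dst, const char *path[], int max_len)
{
    int visited[NUM_BUSES] = {0};
    Bus prev_bus[NUM_BUSES];
    int prev_bridge[NUM_BUSES];
    Bus queue[NUM_BUSES];
    int head = 0, tail = 0;

    visited[src] = 1;
    queue[tail++] = src;
    while (head < tail) {
        Bus cur = queue[head++];
        if (cur == dst)
            break;
        for (size_t i = 0; i < NUM_BRIDGES; i++) {
            Bus next;
            if (bridges[i].a == cur)
                next = bridges[i].b;
            else if (bridges[i].b == cur)
                next = bridges[i].a;
            else
                continue;
            if (visited[next])
                continue;
            visited[next] = 1;
            prev_bus[next] = cur;
            prev_bridge[next] = (int)i;
            queue[tail++] = next;
        }
    }
    if (!visited[dst])
        return -1;

    int len = 0;
    for (Bus b = dst; b != src; b = prev_bus[b])
        len++;
    assert(len <= max_len);
    int k = len;
    for (Bus b = dst; b != src; b = prev_bus[b])
        path[--k] = bridges[prev_bridge[b]].bridge;
    return len;
}
```

In this model a CPU data access to the remote peripheral bus crosses two bridges (LFI, then DMA) and three buses, none of which is brought out to pins, which is why an external probe cannot observe it.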
The peripheral control processor (PCP2) is another 32-bit CPU that, again, has private data and program buses that are not normally visible. The microcontroller runs at a maximum of 150 MHz in the processor subsystem and 75 MHz in the peripheral subsystems, so there are two clock domains. The device is packaged in a 416-pin ball grid array (BGA) and provides a standard JTAG debug interface for debugger support. However, full emulation of such a microcontroller requires the ability to inspect transactions on many different internal buses that have no connection to the external pins. The large embedded memories (2-Mbyte flash) have wide (128-bit) internal fetch paths and local caches, so execution from internal memory is significantly faster than from external memories accessed over a 32-bit bus. (See below for the TC1796 block diagram.)
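The effect of fetch width alone can be quantified with peak-rate arithmetic. This sketch assumes one access per clock and, for comparison only, runs the 32-bit external path at the same 150 MHz with zero wait states; the text gives no external bus timing, and the caches and wait states that also separate the two paths are ignored. The function names are hypothetical.

```c
#include <assert.h>

/* Peak transfer rate in Mbyte/s for a bus of width_bits that completes one
   access per clock at f_mhz. Wait states and caches are not modeled. */
static double peak_mbyte_per_s(unsigned width_bits, double f_mhz)
{
    assert(width_bits % 8u == 0);
    return (width_bits / 8u) * f_mhz;
}

/* Number of accesses of access_bits needed to move fetch_bits of code. */
static unsigned accesses_needed(unsigned fetch_bits, unsigned access_bits)
{
    assert(access_bits > 0);
    return (fetch_bits + access_bits - 1u) / access_bits;   /* round up */
}
```

Under these assumptions the 128-bit internal path peaks at 2400 Mbyte/s against 600 Mbyte/s for a 32-bit path at the same clock, and one 128-bit fetch corresponds to four 32-bit external accesses, so width alone accounts for a factor of four.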