Design Article

Putting Multicore Processing in Context: Part 3

Todd Brian, Lyle Pittroff, Aaron Spear and Jeff Womble, Mentor Graphics

6/7/2006 12:15 PM EDT

Having recently attended many interesting presentations at the First Annual Multicore Expo in Santa Clara, Calif., we can say that the trend toward multicore deployment in the embedded space is going strong, and there is a good case that it will continue to grow.

The presentation topics ranged from software development for multicore, to introductions of new multicore processors, to tools and where the industry is heading. Heady stuff. One of the most obvious themes, however, was concern about the current state of tools for multicore development.

It’s clear that hardware designers have it good relative to multicore software designers, something that may not have been apparent 20 years ago. With the advent of multicore, if hardware designers are tasked with doubling the raw performance of a processor, there is a very real chance that they will take existing IP, tweak it, replicate it, add some interconnect logic and call it a day. If they need four times the performance, they replicate the core three times instead of once.

This is not meant to trivialize hardware design; the point is that one of the reasons for going multicore is the ability to use simpler, lower-frequency, lower-power cores. The increase in processing power comes from the number of cores, not from higher clock frequencies or superscalar pipelines that make each core faster. To borrow from Adam Smith, author of “Wealth of Nations,” division and specialization of labor are the keys to multicore processing.

A recent study suggested that engineers are capable of acquiring only two to three skills at a time. If that is true, then embedded multicore designers need to make two of those skills the ability to understand concurrency (potentially massive concurrency) and the ability to debug their software in a multicore environment. Returning to the earlier point, chip designers as well as software developers need better tools to design and debug real-world, real-time embedded applications.

One manufacturer presenting at the expo has successfully brought its 300+ core processor to the embedded market. A software developer who wants to use this processor has exactly one option: the manufacturer’s own compiler and debugger.

Unfortunately, this seems to be the current state of the industry: everyone has their own way of doing things. There is no standard approach to capturing concurrency in a design. There is no standard for debugging a multicore target, not even a standard for connecting to one.

Mentor Graphics is working with hardware, software, and firmware vendors, including participants of the Multicore Association, to establish industry-wide standards that provide an easy way to mix and match operating systems, have them share resources, and let them communicate with each other. The company is working to establish debugging and connection standards as well as inter-core communication mechanisms.

Creating multicore-aware debuggers
Debugging embedded targets has come a long way in the last 20 years. Embedded developers no longer have to rely on “printf debugging,” which really doesn’t belong in the embedded world anyway (first, you have to decide where printf output is going to be directed, and it can only work after the hardware and drivers have been debugged).

Today, developers enjoy robust debugging suites that use hardware-assisted connections (e.g., JTAG) to download applications to the target and control it. Good commercial packages not only let the developer start and stop the processor, but also provide intuitive ways to monitor registers, memory, and stacks. One of the hardest parts of embedded development is understanding the behavior of the system, and debuggers are the tools that provide visibility into the inner workings of an application.

However, debugging a multicore target throws a whole new wrench into the works. Should one debugger connect to and control all the cores, or should each core have its own debugger? How is the data for multiple cores best organized to make sense to a developer, and does that change between a two-core system and a 300-core system? How many cores can an engineer observe at one time and still understand what’s going on in each?

These are the questions the industry needs to answer before multicore designs can really start to meet their potential.

Is JTAG the answer to multicore debug?
One of the main differences between desktop and embedded debugging is that embedded targets are external to the desktop system and have to be connected to the debug console or integrated development environment (IDE) through some kind of device. Strangely enough, this device is called a “connection device,” or “connection” for short.

Connections range from simple two-wire connections to complex, and definitely more expensive, Joint Test Action Group (JTAG) devices, some of which contain large amounts of random access memory (RAM) or even hard drives that they use to queue up data. The host uses the connection to communicate with the target device. With simple two-wire connections, the interaction between the host-based IDE and the target is limited.

JTAG-based connection devices allow “on-chip debugging.” They let the IDE interact with the target and provide services such as remotely starting, stopping, or suspending program execution (for example, at a breakpoint), and viewing memory and register contents as well as I/O and peripheral devices. The IDE combines sequences of these functions to set breakpoints or step through code.
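To illustrate how a debugger can build a breakpoint out of primitive services like these, here is a minimal Python sketch of one common technique, the software breakpoint: save the original instruction, patch in a trap opcode, and restore the original when the breakpoint is removed. The Target class, its read_word/write_word methods, and the BKPT_OPCODE value are hypothetical stand-ins for what a real connection device exposes; many debuggers also (or instead) use on-chip hardware breakpoint registers, and real tools must also deal with caches and flash memory, which this sketch ignores.

```python
# Hypothetical trap instruction; real breakpoint opcodes are architecture-specific.
BKPT_OPCODE = 0xDEADBEEF


class Target:
    """Hypothetical stand-in for memory access through a JTAG connection."""

    def __init__(self, memory):
        self.memory = dict(memory)  # address -> 32-bit instruction word

    def read_word(self, addr):
        return self.memory[addr]

    def write_word(self, addr, value):
        self.memory[addr] = value


class BreakpointManager:
    """Software breakpoints built only from memory read/write primitives."""

    def __init__(self, target):
        self.target = target
        self.saved = {}  # address -> original instruction word

    def set(self, addr):
        if addr in self.saved:
            return  # already set; keep the original instruction we saved
        self.saved[addr] = self.target.read_word(addr)
        self.target.write_word(addr, BKPT_OPCODE)

    def clear(self, addr):
        original = self.saved.pop(addr)
        self.target.write_word(addr, original)


target = Target({0x1000: 0x11111111, 0x1004: 0x22222222})
bps = BreakpointManager(target)
bps.set(0x1004)
print(hex(target.read_word(0x1004)))  # 0xdeadbeef
bps.clear(0x1004)
print(hex(target.read_word(0x1004)))  # 0x22222222
```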

So what makes JTAG so special? Back in the old days, printed circuit boards were tested on what is called a “bed of nails.” When a board was designed, test points (solder pads) were placed at strategic locations on its bottom side. After the board was populated with chips, one of the final manufacturing steps was to put it on the bed of nails for testing. The bed of nails has spikes that stick up to make contact with the test points.

However, as technology evolved and more and more of the board functionality moved into microprocessors and ASICs, access to test points became a problem. In the mid to late 1980s, several companies banded together to form the JTAG. The group’s results were accepted by the IEEE in 1990, and the IEEE 1149.1 standard, known as the Standard Test Access Port and Boundary-Scan Architecture, was born. The name JTAG (pronounced “J-Tag”) stuck since it is easier to say than “STAPBSA.” The boundary-scan method enables in-circuit testing and eliminates the need for bed-of-nails testing.

Making the right JTAG connections
The industry-standard JTAG scan interface, initially developed for boundary-scan testing of complex devices and boards over a low-pin-count interface, has also become a standard method for accessing and debugging processor cores, because it requires only a few pins and was already widely adopted for its original purpose.

Using JTAG for processor debugging required adding a debug service unit, or “debug logic,” to the CPU core design, along with an additional JTAG scan path to access that logic. A brief overview of a common JTAG TAP (Test Access Port) with its multiple JTAG scan paths is shown in Figure 1 below.

Separate scan register paths are provided for boundary scan, reading the device ID code, initiating built-in self-test functions and obtaining their results, and accessing the debug support unit. The TAP Instruction Register (TAPIR) selects the desired path; during normal operation, the TAP is left in the Bypass state so that the other functions are disabled.

Figure 1. A single JTAG Test Access Port
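The path selection described above can be modeled as a lookup from the current TAPIR contents to the data register that sits between TDI and TDO. In this Python sketch the instruction width, the opcodes other than BYPASS, and the data register lengths are all hypothetical; IEEE 1149.1 fixes BYPASS as the all-ones code with a one-bit register, and a real device documents its own encodings in its BSDL file.

```python
IR_LENGTH = 4  # hypothetical instruction register width

# Hypothetical opcodes; only BYPASS = all ones is fixed by IEEE 1149.1.
INSTRUCTIONS = {
    0b0000: ("EXTEST", "boundary_scan"),
    0b0001: ("IDCODE", "device_id"),
    0b0010: ("RUNBIST", "bist_result"),
    0b1000: ("DEBUG", "debug_unit"),
    (1 << IR_LENGTH) - 1: ("BYPASS", "bypass"),
}

# Hypothetical data register lengths; the bypass register is a single bit.
DR_LENGTHS = {
    "boundary_scan": 120,
    "device_id": 32,
    "bist_result": 8,
    "debug_unit": 64,
    "bypass": 1,
}


def selected_path(ir_value):
    """Return (instruction, data register, DR length) that the TAPIR selects."""
    if not 0 <= ir_value < (1 << IR_LENGTH):
        raise ValueError("opcode wider than the instruction register")
    # In this sketch, any unassigned opcode falls back to BYPASS.
    name, register = INSTRUCTIONS.get(ir_value, ("BYPASS", "bypass"))
    return name, register, DR_LENGTHS[register]


print(selected_path(0b1000))  # ('DEBUG', 'debug_unit', 64)
print(selected_path(0b1111))  # ('BYPASS', 'bypass', 1)
```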

Multicore (multi-TAP) Configuration
The most cost-effective configuration (lowest pin count) for multicore devices is to string the JTAG TAPs within each core along a single daisy chain as shown in Figure 2, below.

In this way, the instruction registers of all the TAPs are concatenated into one long instruction register. A specific core at a known position in the scan chain can therefore be set to select its debug support unit registers while all other cores are set to bypass mode, allowing a single debugger control packet to address that one core individually.

Figure 2. Multiple cores on a single JTAG scan chain
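The concatenated instruction register can be sketched in Python as follows. Assumptions, all hypothetical: every core has a 4-bit IR, DEBUG is an illustrative opcode, core 0 sits nearest TDI, and bits are shifted LSB first. Because the first bits shifted end up in the core nearest TDO, the per-core fields are emitted from the last core back to the first. Once the IR is loaded, each bypassed core contributes one bit to the data path, so the debugger must pad its data scan with one bit per bypassed core on either side of the target.

```python
BYPASS = 0b1111  # all ones, as IEEE 1149.1 requires
DEBUG = 0b1000   # hypothetical opcode selecting the debug support unit
IR_LENGTH = 4    # hypothetical; assumed identical for every core


def to_bits_lsb_first(value, width):
    return [(value >> i) & 1 for i in range(width)]


def chain_ir_scan(num_cores, target_core):
    """Bits to shift into the concatenated IR, in shift order.

    Core 0 is nearest TDI. The first bits shifted end up in the core
    nearest TDO, so cores are emitted from num_cores - 1 down to 0.
    """
    bits = []
    for core in reversed(range(num_cores)):
        opcode = DEBUG if core == target_core else BYPASS
        bits += to_bits_lsb_first(opcode, IR_LENGTH)
    return bits


def dr_padding(num_cores, target_core):
    """One-bit bypass pads shifted before and after the target's data."""
    leading = num_cores - 1 - target_core  # bypassed cores between target and TDO
    trailing = target_core                 # bypassed cores between TDI and target
    return leading, trailing


scan = chain_ir_scan(num_cores=4, target_core=2)
print(len(scan))         # 16
print(dr_padding(4, 2))  # (1, 2)
```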

Extending this concept, multiple debuggers can each be assigned to an individual core and send debug service control packets to it without affecting the other cores or making them aware of the traffic (ignoring shared memory considerations for the moment), since debug service packets arriving at the connection device over Ethernet are queued and executed in order of arrival.
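A minimal Python sketch of that arrival-order behavior follows. The DebugPacket fields, the ConnectionQueue class and the command strings are hypothetical illustrations, not a real connection-device protocol. The point is that packets from several debuggers share one FIFO, and because each packet addresses only its own core (all others in bypass), interleaving them does not disturb any other debugger's core.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class DebugPacket:
    debugger: str  # hypothetical name of the sending debugger
    core: int      # core this packet addresses
    command: str   # illustrative command name, e.g. "halt"


class ConnectionQueue:
    """Executes debug service packets strictly in order of arrival."""

    def __init__(self):
        self.pending = deque()
        self.log = []  # (core, command) in execution order

    def receive(self, packet):
        self.pending.append(packet)

    def run(self):
        while self.pending:
            packet = self.pending.popleft()
            # Each packet selects only its own core (others in BYPASS),
            # so executing it leaves every other core's state untouched.
            self.log.append((packet.core, packet.command))


q = ConnectionQueue()
q.receive(DebugPacket("debugger-A", 0, "halt"))
q.receive(DebugPacket("debugger-B", 3, "read_mem"))
q.receive(DebugPacket("debugger-A", 0, "read_regs"))
q.run()
print(q.log)  # [(0, 'halt'), (3, 'read_mem'), (0, 'read_regs')]
```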

Synchronous Stopping and JTAG Skid
Individual commands issued to a CPU core over JTAG require hundreds of JTAG operations. While these appear to execute very quickly, at least to a human observer (the JTAG scan chain typically shifts serial data at 10 MHz to 40 MHz), this is actually a very slow process compared with a CPU core running at, say, 400 MHz to 1.2 GHz.

Because JTAG debug operations and processors running at hundreds of MHz are inherently asynchronous to each other, it is not possible, without hardware support on the chip, to stop one processor at a breakpoint and have that event stop another core at the same instant using only JTAG operations. The lag between issuing a JTAG command and the processor responding thousands of CPU cycles later is commonly known as “skid.”
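The size of the skid follows from simple arithmetic: if a stop command takes N_TCK JTAG clocks at frequency f_TCK, the core keeps running for roughly N_TCK × f_CPU / f_TCK cycles before the command lands. The Python sketch below uses a hypothetical 200-clock stop command with the clock ranges quoted above; it ignores host and probe latency, which only adds to the skid.

```python
def skid_cycles(tck_clocks, f_tck_hz, f_cpu_hz):
    """CPU cycles that elapse while a JTAG command of tck_clocks is shifted."""
    return tck_clocks * f_cpu_hz // f_tck_hz


# Hypothetical stop command needing 200 TCK clocks.
print(skid_cycles(200, 10_000_000, 400_000_000))    # 8000
print(skid_cycles(200, 40_000_000, 1_200_000_000))  # 6000
print(skid_cycles(200, 10_000_000, 1_200_000_000))  # 24000
```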

From a debugging standpoint, this means you are debugging the cores completely independently; there is no real interaction between them. Connecting to multiple cores simultaneously therefore doesn't mean much, because even then you cannot act on both cores at the same time. This is a limitation of JTAG, and a consequence of the fact that there is no formalized hardware interconnect standard for multicore debugging.

To address this problem, the Mentor Graphics EDGE debugger has built-in support for "synchronization groups": designers can define a group of threads that are to be stopped when a given thread hits a breakpoint. This is backed by a capability that the back-end transport reports to the debug engine, saying, in effect, "I can stop this set of cores synchronously."

If the transport lacks this capability, the debug engine emulates it as best it can by stopping the other cores as soon as it sees that one core has hit the breakpoint. Obviously there will be thousands of instructions of skid, but without hardware standards, this is better than nothing.
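The two paths just described can be sketched in Python. This is a hypothetical model, not the EDGE debugger's actual API: FakeTransport, supports_sync_stop, arm_sync_stop and stop_core are invented names, and group members are simplified to core IDs. When the transport advertises synchronous stop, the whole group is handed to it up front so that on-chip logic can halt every member together; otherwise the engine falls back to halting the other members one at a time in software after the breakpoint is reported, with the skid that implies.

```python
class FakeTransport:
    """Hypothetical back-end transport that records what it is asked to do."""

    def __init__(self, supports_sync_stop):
        self.supports_sync_stop = supports_sync_stop
        self.requests = []

    def arm_sync_stop(self, cores):
        # On real hardware this would program on-chip cross-trigger logic.
        self.requests.append(("arm", tuple(sorted(cores))))

    def stop_core(self, core):
        # On real hardware this would be a JTAG halt, subject to skid.
        self.requests.append(("stop", core))


class SyncGroup:
    """Stop all members of a group when any member hits a breakpoint."""

    def __init__(self, transport, members):
        self.transport = transport
        self.members = set(members)

    def arm(self):
        if self.transport.supports_sync_stop:
            # Hand the whole set to the transport up front, so the hardware
            # can halt every member together when any one of them stops.
            self.transport.arm_sync_stop(self.members)

    def on_breakpoint(self, hit_core):
        if self.transport.supports_sync_stop:
            return  # the hardware has already stopped the other members
        # Emulation: halt the remaining members one at a time, in software.
        for core in sorted(self.members - {hit_core}):
            self.transport.stop_core(core)


hw = FakeTransport(supports_sync_stop=True)
group = SyncGroup(hw, [0, 1, 2])
group.arm()
group.on_breakpoint(1)
print(hw.requests)  # [('arm', (0, 1, 2))]

sw = FakeTransport(supports_sync_stop=False)
group = SyncGroup(sw, [0, 1, 2])
group.arm()
group.on_breakpoint(1)
print(sw.requests)  # [('stop', 0), ('stop', 2)]
```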

Can Nexus extend to multicore debug?
As mentioned earlier, JTAG is a communication mechanism used to control an embedded processor; it has nothing to do with debugging as such. The cores themselves must contain debug logic that controls them.

The “Nexus 5001 Forum” is an industry group that has advanced a standard under the IEEE Industry Standards and Technology Organization (IEEE-ISTO 5001) that defines just such a debug logic block to support embedded development. It contains some compelling features, such as the ability to read and write memory on the core while the core continues to run.

While this is useful, it has little directly to do with multicore debugging, except that it defines a high-speed auxiliary communications mechanism that multiple cores can share for transmitting real-time trace data, among other things. Unfortunately, adoption of Nexus has been very slow, and it does not have nearly the installed base of other technologies. It also does not appear to have much traction outside the automotive industry. Perhaps it will gain momentum with the growth of multicore.

What It All Means
From the silicon vendors’ perspective, the benefit of industry-wide standards for connecting to and debugging an embedded target is pretty clear. Silicon vendors spend large amounts of time and money trying to create an “ecosystem” that benefits their product.

As a result, they spend enormous amounts of time putting RTOS, tool and connection support together so that developers can use their product when it hits the street. They may even have to pay non-recurring engineering (NRE) fees to tool vendors that are reluctant to support their proprietary hardware. That time and money would be better plowed back into either their shareholders’ wallets or research and development.

Other than the developer, the parties that appear least likely to benefit are the tool and connection vendors. Why would they benefit from having all of their competitors considered for every target? One reason is that successful tool vendors have distinctive competencies that their customers value.

Also, tool vendors know that their profits would increase if their tools could be used on a wider range of targets. Furthermore, they spend huge amounts of time and money “porting” their products to different silicon platforms. That time and money would be better spent on the value they can bring to the end customer rather than on chasing a moving target.

What does all this talk about standards mean from the developer’s perspective? To start, it means freedom of choice: developers can choose from a plethora of tools and connections at different prices and with different features. It means that their tools can be used across targets. It means that one connection device will connect to ARM-based multicore products as well as to MIPS, MicroBlaze and Intel-based multicore targets.

It means eliminating the requirement to purchase new tools because the debugger being used does not work on the new target. It means that the developer can spend time gaining field expertise rather than learning how to use a new tool.

The role of Eclipse in multicore debug
For multicore developers, the current state of debuggers is good and the state of connections is bad. The Eclipse Foundation and its various sub-projects are making headway into the embedded space. Eclipse provides a “debug platform” that debugger vendors can implement to debug any arbitrary system. The result is a common look and feel, whether a designer is debugging Java, a Perl script, or an embedded C/C++ application.

From the ground up, Eclipse was designed to be able to debug multiple applications simultaneously, and has a number of features that help facilitate this. In Eclipse, all views in a frame typically reflect the currently selected context.

So if a designer has a thread in application “Foo” selected as the current context, the variables, expressions, and registers views all update to reflect “Foo.” If the designer then selects a thread in application “Bar,” these views update to reflect “Bar.” Combine this with the ability to open multiple frame instances, and the designer has the beginnings of a nice multicore development environment (a good reason to request a dual-monitor system).

Eclipse has other nice features for multi-context debugging as well, such as breakpoint “working sets” (e.g., set a breakpoint in file theDriver.c in Foo, but not in Bar). The Device Software Development Platform (DSDP) project, for example, is driving the creation of a flexible debug hierarchy that is a better fit for a typical embedded multicore scenario, such as connection device -> core(s) -> process(es) -> thread(s).
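As a rough illustration of that hierarchy and of a breakpoint scoped to one application, here is a hypothetical Python model. It is not the Eclipse or DSDP API; the class names, the probe0 connection name and line 42 of theDriver.c are invented for the example. The breakpoint carries the set of process names it applies to, and the debugger walks connection -> core -> process -> thread to find the contexts where it is active.

```python
from dataclasses import dataclass, field


@dataclass
class Thread:
    tid: int


@dataclass
class Process:
    name: str
    threads: list = field(default_factory=list)


@dataclass
class Core:
    core_id: int
    processes: list = field(default_factory=list)


@dataclass
class Connection:
    name: str
    cores: list = field(default_factory=list)


@dataclass
class Breakpoint:
    file: str
    line: int
    processes: set  # scope: names of the processes this breakpoint applies to


def contexts_for(connection, bp):
    """Walk the hierarchy; return (core, process, thread) contexts for bp."""
    result = []
    for core in connection.cores:
        for proc in core.processes:
            if proc.name in bp.processes:
                for thread in proc.threads:
                    result.append((core.core_id, proc.name, thread.tid))
    return result


probe = Connection("probe0", [
    Core(0, [Process("Foo", [Thread(1), Thread(2)])]),
    Core(1, [Process("Bar", [Thread(7)])]),
])
bp = Breakpoint("theDriver.c", 42, {"Foo"})
print(contexts_for(probe, bp))  # [(0, 'Foo', 1), (0, 'Foo', 2)]
```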

In addition, the DSDP project is creating a common infrastructure for connecting to remote targets, and then using services on them (e.g., debugging, profiling, exploring target file systems, opening a shell).

More and more tool vendors are migrating their offerings to Eclipse, creating a very interesting new ecosystem. As a result, tool users will increasingly be able to focus on building an efficient development process on top of the tools, instead of spending so much time and energy on the tools themselves.

The state of connection and debug hardware on the board is not as positive at the moment as the state of debugger development. The upside is that the Multicore Association is working to address exactly this lack of connection and hardware debug standards.

Conclusion
The Multicore Association is in its infancy. We recommend that all interested parties, including software vendors, hardware vendors and developers, invest some time, energy and money in it, as it is the only entity out there trying to bring together all the players on the multicore scene.

Hopefully, the Multicore Association and its debug working groups will gain traction and publish results quickly, building a following that cannot be ignored.

To read Part 1 in this series, go to Adam Smith's answer to multicore design.

To read Part 2 in this series, go to Dealing with hardware and OS issues.

Todd Brian is product marketing manager for Nucleus kernels products, Lyle Pittroff is product marketing manager for EDGE Connections products, Aaron Spear is Debug Tools architect, and Jeff Womble is product marketing manager for EDGE Tools products at Mentor Graphics.

For more information about multicore and multiprocessor architectures, tools and methodologies, go to More About Multicore and Multiprocessing.




