MANCHESTER, England - Transitive Technologies Ltd. (San Diego) has built a binary translator that will allow MIPS-based systems running Linux to execute and optimize software written for X86-based PCs.
The company's Dynamite X/M uses dynamic software translation, a process that Transmeta Corp. (Santa Clara, Calif.) calls "code morphing" when used on its X86-capable Crusoe processor. But Transitive claims its technology is generic and can be applied to many source-target processor pairs.
In addition to developing software "products" based on these processor pairings, Transitive hopes to work as a "co-architect" with processor designers to help optimize hardware for dynamic translation and to create original architectures that will cope well with legacy code.
"Transmeta is an obvious point of reference. It pointed the way to a cost-effective way of designing hardware and software. We believe this style of building CPUs is applicable in all design spaces," said Alasdair Rawsthore, chief technology officer of Transitive Technologies. "We differ in not being committed to a particular architecture. We can take advantage of hardware assists: we are not demanding an underlying architecture.
"Currently, the demos and products are software-only," Rawsthorne continued. "We believe that, long-term, there are sufficient reasons to develop compatible CPUs. The ability to reoptimize software is not present at any other level.
"We have been thinking about slow-start approaches. CPU designers can add simple things to their architectures. For example, the exact execution path profile [of a piece of code] is expensive for us to generate," he said. "We would like to know that if a branch is taken to the left whether it branches to the right consistently later on. A lot of processors generate that kind of information in their correlated branch-prediction units. If we could get access to that we could improve our region optimizations dramatically."
Transitive plans to help companies make architectural improvements to their processors without forcing developers to explicitly recompile applications.
Issues of legacy code and the need to recompile have led to less-than-startling results for some mainstream processors, such as the Pentium 4, because existing applications often cannot make use of pipeline improvements.
Transitive has coined the term "synthetic CPU" for a processor core designed to run its translation software to make full use of pipeline improvements that would otherwise be incompatible with existing code. However, one obvious short-term market is to provide a speed boost that another architecture is unable to support in its current form; this is the market that Transitive is going after with Dynamite X/M and other derivatives of the Dynamite technology.
Dynamic details
Transitive splits the translation process into three parts.
The first part takes the original instruction stream and converts it into an architecture-neutral intermediate form. Once in this form, the Dynamite kernel attempts to optimize the software, eliminating redundant code and operations that produce results that are never used.
Because the kernel has run-time information at its disposal, it can perform optimizations that would not be possible with static compilation. It is in these optimizations that the company reckons it can achieve native performance levels. Compilers also use intermediate code to perform similar optimizations.
"The technology started off with a twist on intermediate code representations. It differs from compiler representations because of the speed and functional requirements it has," Rawsthorne said.
The intermediate form effectively builds a dataflow graph of the operations performed by instructions in a block.
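As a rough illustration of the idea, and not Transitive's actual representation, the sketch below models a translated block as a list of operations that name their inputs, which is enough to recover a dataflow graph. The `Op` class, the `eliminate_dead_ops` function and the register and temporary names are all hypothetical, and the operations are assumed to have no side effects. Walking the block backwards from the values that must survive it drops any operation whose result is never used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """One node in a block's dataflow graph (hypothetical IR, for illustration)."""
    dest: str          # name of the value this operation defines
    opcode: str        # e.g. "const", "add", "mul"
    args: tuple = ()   # value names (str) or literals (int) the operation reads


def eliminate_dead_ops(block, live_out):
    """Return the block without operations whose results are never used.

    Assumes every operation is free of side effects, so an unused
    result means the operation itself can be dropped.
    """
    needed = set(live_out)
    kept = []
    # Walk backwards: uses are seen before the definitions that feed them.
    for op in reversed(block):
        if op.dest in needed:
            kept.append(op)
            needed.discard(op.dest)
            # Only named values are graph edges; literals are ignored.
            needed.update(a for a in op.args if isinstance(a, str))
    kept.reverse()
    return kept


block = [
    Op("t0", "const", (4,)),
    Op("t1", "add", ("eax", "t0")),
    Op("t2", "mul", ("t1", "t1")),    # result is never read again
    Op("ebx", "add", ("t1", "ecx")),
]
optimized = eliminate_dead_ops(block, live_out={"ebx"})
print([op.dest for op in optimized])
```

Here only `ebx` is assumed to be live when the block exits, so the multiply that defines `t2` is removed while the operations feeding `ebx` are kept in their original order.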
There is a startup cost with the first translation steps, but Rawsthorne said the optimizations can more than compensate for this if code is run repeatedly. However, Dynamite takes an intelligent approach to its optimization work and does not attempt to perform a full optimization immediately.
"It does things as lazily as it can," Rawsthorne said. "The first translation is quick and dirty, although it is surprisingly clean in the code it produces, but it gives us the opportunity to go back and reoptimize.
"On the first pass through, it takes thousands of instructions per generated instruction but you can get the benefit back pretty quickly," he said. "It builds a cache of machine instructions and stores the intermediate representations in memory. It then reoptimizes across regions and capitalizes on run-time information that it picks up."
Development work on the current products and demonstrations is focused on tuning how quickly the kernel attempts to perform detailed optimizations, as the benefit against run-time overhead reaches a limit asymptotically.
"We start on small regions of code, then larger-sized regions as we discover which are the hot areas," Rawsthorne said. "We concentrate on data values, so we always optimize for the current moment.
"If a data value is not needed within a block, we will migrate it out of the critical path," he said. "If that behavior is too optimistic, we have enough information to go back and recreate [the code]. We aim for fast code generation rather than optimal code generation. You can get 90-to-95 percent of what you would have with optimal code generation."
The first version of the translator sits on top of an operating system, such as Linux. But it can run closer to the metal.
"We have taken a liberal interpretation of what a piece of software can do," Rawsthorne said. "Dynamite can reside below the OS. It can optimize regions of code that include applications and libraries interacting with each other. That is a fruitful area of optimization as libraries are written for the general case. You can get levels of optimization not available with other software components."
To emulate the same behavior as the original system, Rawsthorne said the kernel is designed to generate "precise exceptions" if errors are encountered "in the way the original hardware would have generated them."
I/O translations
The kernel emulates hardware registers in the source architecture using what the company calls the Flexible User System Environment (Fuse).
"It's the software equivalent of a BIOS [basic input output system] for a CPU," Rawsthorne said. "Anything that has an interface can be brought out and we can build interfaces where no explicit interface exists in the original."
For example, a serial port in the original processor that is programmed using bit-wise accesses may be mapped onto a set of library calls which control a device driver on the target architecture.
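A minimal sketch of that kind of mapping follows. It is not the Fuse interface, which the article does not detail. `HostSerialDriver` stands in for a device driver on the target, `EmulatedSerialPort` intercepts register accesses made by translated code, and the register addresses and status bit are illustrative values borrowed from the PC COM1 convention.

```python
class HostSerialDriver:
    """Stand-in for a device driver on the target system (hypothetical)."""

    def __init__(self):
        self.sent = bytearray()

    def write_byte(self, value):
        self.sent.append(value & 0xFF)

    def ready(self):
        return True   # this toy driver can always accept another byte


class EmulatedSerialPort:
    """Turns register-level accesses into driver calls (illustrative only)."""

    DATA_REG = 0x3F8       # assumed data register address
    STATUS_REG = 0x3FD     # assumed status register address
    TX_READY_BIT = 0x20    # assumed "transmitter ready" status bit

    def __init__(self, driver):
        self.driver = driver

    def store(self, addr, value):
        # A write to the data register becomes a library call.
        if addr == self.DATA_REG:
            self.driver.write_byte(value)
        else:
            raise ValueError(f"unmapped register write: {addr:#x}")

    def load(self, addr):
        # A status read is synthesized from the driver's state.
        if addr == self.STATUS_REG:
            return self.TX_READY_BIT if self.driver.ready() else 0
        raise ValueError(f"unmapped register read: {addr:#x}")


driver = HostSerialDriver()
port = EmulatedSerialPort(driver)
for ch in b"ok":
    # Legacy code polls the status bit before writing each byte.
    while not (port.load(EmulatedSerialPort.STATUS_REG) & EmulatedSerialPort.TX_READY_BIT):
        pass
    port.store(EmulatedSerialPort.DATA_REG, ch)
print(bytes(driver.sent))
```

The legacy polling loop runs unchanged, but every access lands in ordinary function calls rather than on real hardware, which is the substitution the serial-port example describes.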
"In the workloads we have looked at overall, the only things that need static translation are real-time interrupt response routines," said Rawsthorne.
"We have had discussions with people who have seen discontinuities in their architectures. The problem they faced was, do you stay with an obsolete instruction set to maintain credibility with customers or do you make a break?"
John Graham, Transitive's chief executive officer, added: "We believe synthetic CPUs have a sustainable advantage over hardware CPUs. It is only the lunatic fringe of real-time network processors that we would struggle against. Legacy gates are expensive in terms of power, performance and verification effort. Design times are dragging on so long that developers are trapped by their own creations."
Chris Edwards is editor of Electronics Times, EE Times' sister publication in the United Kingdom.