Tensilica has updated the design of its configurable processor core to slot more easily into system-on-chip designs that have two or more of the cores on-chip.
Steve Roddy, director of product marketing, says the average number of Xtensa cores on a customer's chip these days is more than five. This is largely being driven by a move away from custom logic for state machine-based designs and towards extended, software-programmable processors.
In recognition of the trend towards software-based state engines, Tensilica has added support for conditional-execution instructions that can make complex decisions, based on the processor's state, about whether or not to run. One change made to help processors work more efficiently on a shared memory bus is a switch that enables write-back instead of write-through caching.
A write-back cache only updates main memory when it is 'flushed' by software or when the cache entry is needed for another piece of data. By limiting writes, a write-back cache can reduce bus traffic.
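The difference in bus traffic can be illustrated with a small C model. This is only a sketch under stated assumptions: the four-line direct-mapped cache, the one-word line size and every name in it (`cache_model`, `cache_store`, `cache_flush`, `count_bus_writes`) are hypothetical and do not describe the actual Xtensa cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_LINES 4u  /* hypothetical direct-mapped cache, one word per line */

typedef struct {
    bool     valid[NUM_LINES];
    bool     dirty[NUM_LINES];
    uint32_t tag[NUM_LINES];   /* word address currently held by the line */
    bool     write_back;       /* false selects write-through behaviour */
    unsigned bus_writes;       /* number of writes that reached main memory */
} cache_model;

void cache_init(cache_model *c, bool write_back)
{
    assert(c != NULL);
    for (unsigned i = 0; i < NUM_LINES; i++) {
        c->valid[i] = false;
        c->dirty[i] = false;
        c->tag[i] = 0;
    }
    c->write_back = write_back;
    c->bus_writes = 0;
}

void cache_store(cache_model *c, uint32_t addr)
{
    assert(c != NULL);
    unsigned idx = addr % NUM_LINES;

    if (!c->write_back) {
        /* Write-through: every store is also sent over the bus. */
        c->valid[idx] = true;
        c->tag[idx] = addr;
        c->bus_writes++;
        return;
    }
    /* Write-back: memory is only written when a dirty line is evicted. */
    if (c->valid[idx] && c->dirty[idx] && c->tag[idx] != addr) {
        c->bus_writes++;
    }
    c->valid[idx] = true;
    c->dirty[idx] = true;
    c->tag[idx] = addr;
}

/* Software flush: push every dirty line out to main memory. */
void cache_flush(cache_model *c)
{
    assert(c != NULL);
    for (unsigned i = 0; i < NUM_LINES; i++) {
        if (c->valid[i] && c->dirty[i]) {
            c->bus_writes++;
            c->dirty[i] = false;
        }
    }
}

/* Replay a sequence of store addresses, flush, and report the bus writes. */
unsigned count_bus_writes(bool write_back, const uint32_t *addrs, size_t n)
{
    cache_model c;
    cache_init(&c, write_back);
    for (size_t i = 0; i < n; i++) {
        cache_store(&c, addrs[i]);
    }
    cache_flush(&c);
    return c.bus_writes;
}
```

In this model, seven stores to the address sequence 0, 0, 0, 0, 1, 1, 4 cost seven bus writes in write-through mode but only three in write-back mode: one eviction when address 4 displaces address 0, plus two lines written out by the final flush.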
The company has not implemented a corresponding cache-coherency scheme to ensure other processors do not read stale data from main memory. Any coherency mechanism has to be supported by user software.
"Most people who implement multiple Xtensa processors on a chip don't do multiprocessing in the sense that they need hardware-managed coherency," said Roddy.
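What software-managed coherency involves can be sketched with a toy model in C. Everything here is an assumption made for illustration: the `private_cache` structure and the `core_write`, `core_read`, `cache_writeback` and `cache_invalidate` helpers are hypothetical names, not Tensilica instructions or library calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WORDS 8u  /* size of the toy shared memory, in words */

static uint32_t main_memory[WORDS];

/* Hypothetical per-core write-back cache that can hold every word. */
typedef struct {
    uint32_t data[WORDS];
    bool     valid[WORDS];
    bool     dirty[WORDS];
} private_cache;

/* A store only updates the local cache; main memory is left untouched. */
void core_write(private_cache *c, unsigned addr, uint32_t value)
{
    assert(addr < WORDS);
    c->data[addr] = value;
    c->valid[addr] = true;
    c->dirty[addr] = true;
}

/* A load hits the local copy if there is one, else fills from memory. */
uint32_t core_read(private_cache *c, unsigned addr)
{
    assert(addr < WORDS);
    if (!c->valid[addr]) {
        c->data[addr] = main_memory[addr];
        c->valid[addr] = true;
        c->dirty[addr] = false;
    }
    return c->data[addr];
}

/* Producer side: push a dirty word out so that other cores can see it. */
void cache_writeback(private_cache *c, unsigned addr)
{
    assert(addr < WORDS);
    if (c->valid[addr] && c->dirty[addr]) {
        main_memory[addr] = c->data[addr];
        c->dirty[addr] = false;
    }
}

/* Consumer side: drop the local copy so the next load refetches it. */
void cache_invalidate(private_cache *c, unsigned addr)
{
    assert(addr < WORDS);
    c->valid[addr] = false;
    c->dirty[addr] = false;
}
```

With this model, a consumer core that has already cached a word keeps reading the old value after the producer writes it, and even after the producer calls `cache_writeback`. Only when the consumer also calls `cache_invalidate` does its next read return the new value. That pairing of write-back on one side and invalidate on the other is the kind of work that falls to user software when there is no hardware coherency.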
A simple addition for multicore systems is a processor ID register to identify each processor on a chip. The main advantage of this will be seen in channelised systems, says Roddy, where processors run the same code but on different data sets.
"It makes sense to share the code memory, but with most processors, you have the situation of not easily determining whether it is processor A or B [running the software]," said Roddy.
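A minimal sketch in C shows how shared code might use such an ID to pick its own data set in a channelised system. The way the ID is obtained is not covered here: `core_id` is passed in as a plain argument, and `channel_slice`, `slice_for_core` and `process_channels` are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The contiguous range of channels owned by one core. */
typedef struct {
    size_t first;
    size_t count;
} channel_slice;

/* Split num_channels across num_cores. When the split is uneven, the
 * lowest-numbered cores each take one extra channel. */
channel_slice slice_for_core(unsigned core_id, unsigned num_cores,
                             size_t num_channels)
{
    assert(num_cores > 0 && core_id < num_cores);
    size_t base  = num_channels / num_cores;
    size_t extra = num_channels % num_cores;
    channel_slice s;
    s.first = core_id * base + (core_id < extra ? core_id : extra);
    s.count = base + (core_id < extra ? 1u : 0u);
    return s;
}

/* Identical code on every core: only the core ID decides which channels
 * are touched. The per-channel "work" here is just a gain multiply. */
void process_channels(unsigned core_id, unsigned num_cores,
                      int32_t *samples, size_t num_channels, int32_t gain)
{
    channel_slice s = slice_for_core(core_id, num_cores, num_channels);
    for (size_t i = s.first; i < s.first + s.count; i++) {
        samples[i] *= gain;
    }
}
```

In a real system the first argument would come from the processor ID register rather than from the caller, so that a single image in shared code memory could serve every core.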
To let processors receive data more easily, each core can work as a slave. This lets an external direct memory access engine pass data directly to the Xtensa's local memory. The company has assembled a cookbook of modules that are designed to go into custom execution units.
The modules - which are called as macros from the Tensilica instruction extension (TIE) language used to define each new instruction - implement functions such as carry-save multipliers and multiplexers.
Roddy said: "You don't want to slow down the pipeline just because you implemented the multiplier the wrong way. We have taken the [Synopsys] DesignWare concept to provide functions that are known to work well in our pipeline.
"We added these blocks in response to customer requests."
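For readers unfamiliar with the carry-save technique, the arithmetic can be modelled in a few lines of C. This is a software illustration of the general method only, not TIE code and not Tensilica's module; the names `carry_save`, `csa_add` and `carry_save_multiply` are invented for the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* A number held in redundant form: its value is sum + carry. */
typedef struct {
    uint32_t sum;
    uint32_t carry;
} carry_save;

/* 3:2 compressor: fold a third operand in without propagating carries.
 * Each bit position acts as an independent full adder. */
carry_save csa_add(carry_save acc, uint32_t term)
{
    carry_save out;
    out.sum   = acc.sum ^ acc.carry ^ term;
    out.carry = ((acc.sum & acc.carry) |
                 (acc.sum & term) |
                 (acc.carry & term)) << 1;
    return out;
}

/* Multiply two 16-bit values by accumulating the 16 partial products in
 * carry-save form, then doing a single carry-propagating add at the end. */
uint32_t carry_save_multiply(uint16_t x, uint16_t y)
{
    carry_save acc = {0u, 0u};
    for (unsigned bit = 0; bit < 16; bit++) {
        uint32_t partial = ((y >> bit) & 1u) ? ((uint32_t)x << bit) : 0u;
        acc = csa_add(acc, partial);
    }
    uint32_t product = acc.sum + acc.carry;  /* the only full addition */
    assert(product == (uint32_t)x * y);      /* self-check of the model */
    return product;
}
```

The appeal in hardware is that each `csa_add` step has no carry chain, so the delay of accumulating partial products does not grow with word width; only the final addition needs a full carry-propagate adder.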
Conditional State
To avoid the cost of branch instructions, the company now allows user-defined conditional load and store instructions.
Typically, conditional execution on processors such as the ARM uses flags or register values to determine whether the instruction should run or not. Tensilica has gone further by letting designers use a combination of flags, register values and the current internal state of the processor.
"You can create state variables for multicycle operations. This lets you implement some state machine-like functions. The combinations can be arbitrarily complex," said Roddy.
"For example, you can do one that loads a value based on whether the operation is in the middle or at the end of a loop. You can pick up 5 to 10% extra performance or lower power because you can handle these things conditionally."
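The loop example Roddy describes can be approximated in C. This is a behavioural sketch only: it is not TIE, and `load_if` and `scaled_sum` are hypothetical names. The condition combines a piece of state (the position within the current block) with a register value (the index of the next scale factor), and the load takes effect only when both allow it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Model of a conditional load: memory is read only when cond is true,
 * otherwise the destination keeps its current value. */
static int32_t load_if(bool cond, const int32_t *addr, int32_t current)
{
    return cond ? *addr : current;
}

/* Sum data[] with a scale factor that changes at the end of each block
 * of block_len samples. The loop body contains no explicit branch for
 * the reload; the decision is folded into the conditional load. */
int32_t scaled_sum(const int32_t *data, size_t n,
                   const int32_t *scales, size_t num_scales,
                   size_t block_len)
{
    assert(block_len > 0 && num_scales > 0);
    int32_t scale = scales[0];
    size_t  pos   = 0;   /* state: position inside the current block */
    size_t  next  = 1;   /* index of the next scale factor to load */
    int32_t total = 0;

    for (size_t i = 0; i < n; i++) {
        total += data[i] * scale;

        bool at_end = (pos + 1 == block_len);
        bool take   = at_end && (next < num_scales);

        /* &scales[next] is at most one past the end and is only
         * dereferenced when take is true. */
        scale = load_if(take, &scales[next], scale);
        next += take ? 1u : 0u;
        pos   = at_end ? 0u : pos + 1u;
    }
    return total;
}
```

A C compiler may still emit a branch for `load_if`; the point of the sketch is the shape of the condition, which a custom conditional instruction of the kind described above would evaluate in hardware as part of the load itself.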