Tensilica, Inc. today announced its V6 suite of automation tools, which significantly speed the design of major system-on-chip (SOC) blocks, making it easier and faster to design SOCs with configurable processors than with custom logic. Using the V6 development tools, a design team with an existing algorithm coded in C or C++ could develop a customized Xtensa LX processor in a day, whereas a typical RTL (register-transfer-level) design cycle usually requires six to nine months. The V6 tools build on the automation capabilities released with Tensilica's XPRES compiler and deliver complementary tools that further accelerate and automate the SOC design process.
The new V6 suite includes a fully pipeline-accurate instruction set simulator (ISS), which provides the clock-cycle accuracy of RTL hardware simulation while running two or three orders of magnitude faster. The V6 suite also includes a new version of the Xtensa C/C++ Compiler (XCC), which optimizes programs written in C or C++ for either code density or execution performance on an Xtensa LX processor. In addition, the Xtensa Xplorer design environment has been updated to be easier and faster to use. The V6 suite also includes the XPRES Compiler, the first and only tool that analyzes standard C/C++ code to automatically create optimized instruction-set architectures (ISAs) for Xtensa configurable processors. By using the XPRES Compiler, SOC designers avoid having to hand-code algorithms in complex languages such as Verilog or VHDL. Designers also avoid the lengthy verification cycles required when hand-coding hardware designs.
'Designing custom logic with RTL (register transfer level) languages like Verilog or VHDL just takes too much time and too many engineering resources, particularly for verification,' stated Chris Rowen, Tensilica president and CEO. 'With these tools, designers can automatically develop optimized, task-specific Xtensa processors and associated firmware in a fraction of the time and without the time-consuming verification requirements of hand-written RTL. Xtensa processors are guaranteed to be correct by construction, so only functional verification is required.'
New Fully Pipeline-Accurate ISS
Tensilica's new V6 ISS is the first fully pipeline-accurate ISS for a configurable processor. Tensilica's Xtensa Processor Generator automatically creates a matching ISS for each customized Xtensa LX processor. Each matched ISS fully comprehends all modifications made to the corresponding Xtensa LX processor, including the exact, cycle-by-cycle behavior of Tensilica's revolutionary TIE Ports and Queues. TIE Ports and Queues provide unlimited I/O bandwidth directly into the Xtensa LX processor, bypassing the slow load/store units typically used in traditional processor designs. As a result, TIE Ports and Queues break through the performance barrier that has prevented processors from being used in place of custom logic. When a processor is used as an alternative to hand-coded RTL to implement a task block in an SOC, the processor's ISS must be pipeline-accurate so that designers can accurately evaluate the performance of that task block. Hardware designers use the ISS much as they use logic simulators for detailed simulations of hardware behavior.
Other instruction set simulators for configurable processors are only instruction-level accurate; they model instructions at the instruction boundary in execution order. While this level of modeling is fine for the customary control functions usually assigned to processors on SOCs, it isn't sufficiently accurate for situations where the processor is used to implement high-performance tasks usually implemented with dedicated, hardwired logic. An ISS that is only instruction-level or cycle-accurate does not model events that occur inside the processor and, therefore, can produce inaccurate simulations for certain events such as bus errors, interrupts, and other exceptions that disrupt the flow of instructions in the processor's pipeline. Tensilica's pipeline-accurate V6 ISS gives the design team full insight into the processor's behavior, no matter what type of event is happening, which is essential for hardware-oriented tasks.
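The gap between the two modeling levels can be illustrated with a toy cycle counter. This is a minimal sketch under stated assumptions, not a description of the Xtensa LX pipeline or of the V6 ISS: the toy_insn record, both counting functions, and the one-cycle load-use stall are all hypothetical, chosen only to show how a model that charges one cycle per instruction can miss cycles that a pipeline-aware model accounts for.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy instruction record; not an Xtensa data structure. */
typedef enum { OP_ALU, OP_LOAD } op_kind;

typedef struct {
    op_kind kind;
    int dest;   /* destination register number (>= 0) */
    int src_a;  /* first source register, or -1 if unused */
    int src_b;  /* second source register, or -1 if unused */
} toy_insn;

/* Instruction-level model: every instruction is charged exactly one cycle,
 * so the instruction stream itself is never inspected. */
unsigned long count_instruction_level(const toy_insn *prog, size_t n)
{
    (void)prog;
    return (unsigned long)n;
}

/* Pipeline-aware model: one cycle per instruction, plus an assumed
 * one-cycle stall when an instruction reads the register written by the
 * load immediately before it. */
unsigned long count_pipeline_aware(const toy_insn *prog, size_t n)
{
    assert(prog != NULL || n == 0);
    unsigned long cycles = 0;
    for (size_t i = 0; i < n; i++) {
        cycles += 1;
        if (i > 0 && prog[i - 1].kind == OP_LOAD &&
            (prog[i].src_a == prog[i - 1].dest ||
             prog[i].src_b == prog[i - 1].dest)) {
            cycles += 1; /* assumed load-use penalty */
        }
    }
    return cycles;
}
```

For a three-instruction sequence in which an ALU operation immediately consumes a loaded value, the two functions disagree by one cycle. Over a tight inner loop, differences of this kind accumulate into a misleading performance estimate.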
Even with this increased modeling accuracy, Tensilica's new V6 ISS is fast. It provides the accuracy of a Verilog or VHDL simulation but at speeds two or more orders of magnitude faster than RTL simulation. By accelerating system simulation, Tensilica's V6 ISS decreases overall design time.
Enhanced Xtensa C/C++ Compiler (XCC)
The V6 release of XCC automates C/C++ code optimization for the Xtensa LX processor. It automatically locates code blocks that can be optimized for performance (those areas used most often) and code blocks that can be optimized for compactness or code density (those areas used less frequently). It can even prune code that is never used (dead code). No vendor-specific compiler directives are required to effect these optimizations, which preserves code portability. Other optimizing compilers usually require that developers manually insert flags in the code to tell the compiler when to optimize for code density and when to optimize for performance.
XCC employs automatic, feedback-directed compilation with a multi-pass compilation process that includes three steps:
Step 1: Compile the C/C++ code.
Step 2: Run the code with profiling, generating detailed usage information on how often each code segment is invoked.
Step 3: Re-compile the code using statistics generated in Step 2 to optimize code regions for speed or density, based on actual usage.
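The decision made in Step 3 can be sketched as follows. This is an illustrative model only, not XCC's actual heuristic: the region_profile record, the assign_goals function, and the hot_percent threshold are hypothetical names and policy introduced for the example. It assumes a region is optimized for speed when it accounts for at least a given share of all profiled executions, and for density otherwise.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { OPT_FOR_SIZE, OPT_FOR_SPEED } opt_goal;

/* Hypothetical per-region record produced by the profiling run (Step 2). */
typedef struct {
    const char *name;
    unsigned long exec_count; /* how often the region was executed */
    opt_goal goal;            /* decision used by the recompile (Step 3) */
} region_profile;

/* Assumed policy: a region is "hot" when it accounts for at least
 * hot_percent of all profiled executions. Hot regions are compiled for
 * speed; everything else is compiled for code density. */
void assign_goals(region_profile *regions, size_t n, unsigned hot_percent)
{
    assert(hot_percent <= 100);
    unsigned long long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += regions[i].exec_count;
    }
    for (size_t i = 0; i < n; i++) {
        /* Integer form of exec_count / total >= hot_percent / 100. */
        int hot = total > 0 &&
                  (unsigned long long)regions[i].exec_count * 100ULL >=
                      (unsigned long long)hot_percent * total;
        regions[i].goal = hot ? OPT_FOR_SPEED : OPT_FOR_SIZE;
    }
}
```

With a 10 percent threshold, a filter loop executed 9,000 times would be marked for speed, while an initialization routine executed 10 times and an error path that never ran would both be marked for density. A real compiler weighs many more factors; the point is only that the choice is driven by measured usage rather than by flags placed in the source.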
The execution-profile-driven compilation employed by XCC allows finer-grained speed and code-size trade-offs than competing methods, especially for large programs.
XCC can exploit the ISA extensions created by the XPRES Compiler to produce compiled code that runs substantially faster than code produced for a general-purpose embedded processor. (The optimized Xtensa LX processors retain their general-purpose nature, so they can also run any compiled C or C++ program.) Any other code compiled for the optimized processor can also take advantage of the processor optimizations created by the XPRES Compiler.
XCC also includes an advanced SIMD (single instruction, multiple data) vectorization capability for improved performance of digital signal processing (DSP) tasks. XCC can automatically vectorize certain loops, resulting in greatly improved performance. The SIMD engine operates on multiple data values simultaneously, greatly improving the throughput of data streams processed by DSP algorithms. The XCC vectorizing compiler automatically uses SIMD processor extensions, eliminating the need for manual re-coding or manually placed intrinsics.
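A loop of the following shape is the kind that vectorizing compilers in general can often map onto SIMD hardware. This is a generic C99 sketch, not code taken from Tensilica; the function name scale_add_i16 is hypothetical, and whether XCC vectorizes this particular loop on a given Xtensa LX configuration is an assumption. The properties that matter are a simple counted loop, independent iterations, unit-stride array accesses, and restrict-qualified pointers that tell the compiler the arrays do not overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scale one 16-bit sample stream by a gain and add a second stream.
 * Each iteration is independent of the others, so several elements can
 * in principle be processed by a single SIMD operation. The 32-bit
 * output keeps the full product without overflow. */
void scale_add_i16(int32_t *restrict out,
                   const int16_t *restrict a,
                   const int16_t *restrict b,
                   int16_t gain,
                   size_t n)
{
    assert((out != NULL && a != NULL && b != NULL) || n == 0);
    for (size_t i = 0; i < n; i++) {
        out[i] = (int32_t)a[i] * gain + b[i];
    }
}
```

The source stays plain, portable C: no intrinsics or vendor-specific directives appear in the loop, which is the portability benefit described above.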
The XPRES Compiler (see press release dated July 7, 2004) takes a C/C++ program as input and generates optimized TIE (Tensilica Instruction Extension) instructions to customize the Xtensa LX processor's ISA. It can be used in an automatic mode or under full designer control, so the designer can guide the tool, select instructions, and even tune the original application to take better advantage of the added hardware instructions. The XPRES Compiler can generate ISA optimizations for (and thus speed the performance of) frequently executed code blocks, such as inner loops, and complex blocks, including highly branched code that is almost never optimized because of its complexity. The effect is a general acceleration of code performance with significant improvements in the performance of critical inner loops.
Improvements to Other Automation Tools
Besides the new Xtensa ISS, Xtensa C/C++ Compiler, and XPRES Compiler, Tensilica's comprehensive suite of automation tools includes the TIE (Tensilica Instruction Extension) Compiler, which automatically compiles designer-defined ISA extensions to the Xtensa processor. The V6 TIE Compiler has been enhanced to understand variable-length FLIX (Flexible Length Instruction Extensions) instructions and Tensilica's unique ports and queues, which can bypass the load/store bottleneck that limits the performance of other embedded processor cores.
Tensilica's Xtensa Xplorer design environment was also enhanced with improved multiple-core SOC design support, including the generation of multiple-core system simulation models using Tensilica's XTMP (Xtensa Modeling Protocol) tool, a SystemC-compatible simulation environment. Xtensa Xplorer serves as a cockpit for multiple-processor SOC hardware and software design. It integrates software development, processor optimization, and multiple-processor SOC architecture tools into one common design environment, along with SOC simulation and analysis tools. Xtensa Xplorer is a visual environment with a host of automation tools that make creating Xtensa processor-based SOC hardware and software much easier.
Pricing and Availability
The Tensilica Processor Developer's Toolkit is available starting at $9,000 per set per year. The Toolkit includes one seat each of the Tensilica Instruction Set Simulator, Xtensa Xplorer Processor Developer's Edition, and XCC compiler. The TIE Compiler and XPRES Compiler are licensed separately. These tools are available and shipping now.