To be honest, it can take a little effort to wrap one’s brain around all of the stuff being announced by the folks at Tensilica, so if you will indulge me I’d like to take a moment to build things up step-by-step, just to make sure I understand it all myself.
What we’re talking about here is the ability to design, create, and deploy small, low-power, high-speed dataplane processor cores that exactly match the required application for incorporation in a System-on-Chip (SoC).
The resulting dataplane processor units (DPUs) combine the best aspects of performance-intensive DSP (audio, video, imaging, and baseband signal processing) and embedded RISC processing functions (security, networking, and deeply embedded control).
The starting point for everything is the Xtensa Customizable Processor concept. The first aspect of this is Configurability – designers are offered a menu of checkbox and drop-down menu options so they can pick just the features they need – including multiple pre-verified DSP engines. The second aspect is Extensibility – designers can add their own instructions, registers, register files, and much more using the Tensilica Instruction Extension (TIE) methodology. The designer only has to specify the functional behavior of the new data path elements in the TIE language (Verilog-like) and then the RTL and whole tool chain are automatically generated.
All of the tools, including the compiler, debugger, and ISS, are automatically updated to match the configuration options and any custom extensions. The matching tool set is generated by the Xtensa Processor Generator at the same time the new processor RTL is created.
Now when I first heard about this way back in the mists of time I thought it was a brilliant idea and that designers would leap into the fray with gusto and abandon. And quite a few of them did, but (and this is only my recollection/understanding of things) many designers were a little wary at first.
So the next thing that happened was that the folks at Tensilica started using the Xtensa system to generate their own families of processor cores, such as Diamond Standard Processors, HiFi Audio DSP Cores, and Video DSP Solutions.
As you may know, all of these cores proved to be wildly successful and they’ve been enthusiastically received by designers. More recently, the guys and gals at Tensilica entered the communications DSP arena, starting with the ConnX BBE16 DSP Core, which was introduced in February 2010. The ConnX BBE16 is built around a core vector pipeline made of sixteen 18b x 18b MACs (multiply accumulators). The ConnX BBE16 is optimized for DSP kernel operations such as FFT (fast Fourier transform) and FIR (finite impulse response) filters, as well as matrix multiplies.
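To make the link between MACs and FIR filters a little more concrete, here is a minimal Python sketch of a direct-form FIR filter in which every inner-loop step is one multiply-accumulate. The function names and the idealized cycle estimate are my own illustrations (they assume every MAC unit is kept busy on every cycle) and are not taken from any Tensilica documentation.

```python
def fir_filter(samples, taps):
    """Direct-form FIR: each output is a sum of up to len(taps) MACs."""
    out = []
    for n in range(len(samples)):
        acc = 0  # plays the role of the accumulator in a MAC unit
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * samples[n - k]  # one multiply-accumulate
        out.append(acc)
    return out


def mac_cycles(num_outputs, num_taps, macs_per_cycle=16):
    """Idealized cycle count, assuming all MAC units are busy every cycle."""
    total_macs = num_outputs * num_taps
    return -(-total_macs // macs_per_cycle)  # ceiling division
```

With sixteen MACs working in parallel, a 32-tap filter producing 64 outputs needs 2,048 MACs, which this idealized estimate turns into 128 cycles. Real code will see some overhead for loads, stores, and loop control.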
All of which brings us to a whole slew of announcements with which we’ve been bombarded over the last few days as follows...
Reference architecture for complete baseband PHY for LTE, HSPA+, and WiMAX
The folks at Tensilica say that all the optimized programmable DPUs (dataplane processing units) of their Atlas Reference Architecture are now available for customer evaluation. The Atlas Reference Architecture uses the Tensilica ConnX BBE16 baseband DSP core (as discussed above) coupled with three function-specific DPUs to allow baseband PHY (physical layer) SoC developers to create a very low power and minimal size PHY system, while enjoying the flexibility of a fully programmable radio, which is vital for competitive multi-standard user equipment devices (handsets) and femtocells. Atlas supports the 3GPP (3rd Generation Partnership Project) LTE (Long-Term Evolution) standard, as well as other complementary standards such as HSPA+ (Evolved High-Speed Packet Access) and WiMAX.
In addition to the ConnX BBE16, there are several other functions that must be implemented for a fully functional PHY system, and these are better implemented in function-specific DPUs to offer lower power and smaller size and address the control functions required. The three other Atlas components are:
- The ConnX Soft Stream Processor (ConnX SSP16), a 16-way SIMD (single instruction, multiple data) baseband core optimized for the processing of soft bits, used for the acceleration of wireless communication PHY routines such as Viterbi, HARQ, and de-rate matching, as well as data manipulation and movement operations.
- The ConnX Bit Stream Processor (ConnX BSP3), a baseband core optimized for the processing and control of bit streams, used for the acceleration of wireless communication PHY routines such as bit mapping, bit interleaving, and turbo encoding.
- The multi-standard ConnX Turbo Decoder (ConnX Turbo16), a programmable turbo decoder for LTE and HSPA+ that achieves 150 Mbps decoded bit rate for LTE. In terms of power and area, this multi-standard turbo decoder is in line with most RTL (register transfer level) hardware implementations.
These Atlas dataplane processors offer superior performance per area and power for their specific operations, and are akin to hardware acceleration blocks that provide the post-silicon flexibility of a fully programmable processor. Coupled with the ConnX BBE16 DSP core, they offer leading class performance per area and power for a full LTE PHY implementation.
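For anyone wondering what a bit-level PHY routine like bit interleaving actually looks like, here is a minimal sketch of a generic block interleaver in Python. To be clear, this is a textbook row-in, column-out scheme chosen for illustration; it is not the specific interleaver defined by LTE or HSPA+, and the function names are hypothetical.

```python
def block_interleave(bits, rows, cols):
    """Write row by row into a rows x cols grid, read out column by column."""
    if len(bits) != rows * cols:
        raise ValueError("input length must equal rows * cols")
    return [bits[r * cols + c] for c in range(cols) for r in range(rows)]


def block_deinterleave(bits, rows, cols):
    """Inverse of block_interleave: swap the roles of rows and columns."""
    return block_interleave(bits, cols, rows)
```

Neighbors in the interleaved stream sit `cols` positions apart in the original stream, so a burst of channel errors gets spread out before it reaches the error-correcting decoder.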
The ConnX SSP16 and ConnX BSP3 DPUs are available for evaluation now. The multi-standard ConnX Turbo16 DPU will be available for evaluation in June 2011.
LTE Advanced is racing towards us...
Do you remember the good old days when 2G seemed to be state-of-the-art? Now it’s “So early 21st Century my dear!”
If you don’t have 3G you aren’t worth talking to, and 4G is rapidly starting to appear on the scene.
Truth to tell I’m starting to find it difficult to stay on top of all of the acronyms and abbreviations myself; goodness only knows how my dear old mom keeps up with things. Suffice it to say that a lot of people regard 4G as being synonymous with LTE (Long-Term Evolution), but we’re already starting to talk about LTE Advanced, which requires at least five times more processing power than LTE.
The timeline for the introduction of LTE Advanced is as shown below. As you can see, field trials and early implementations are expected circa 2015, which means we have to start work now.
The first part of the puzzle is to have appropriate IP cores available, which is where folks like Tensilica come into the picture. This is followed by the design of the chips, which should become available by 2013. This is just around the corner – there’s not much time – tomorrow comes sooner than you think.
ConnX BBE64-128: The world’s highest performance DSP IP core for LTE Advanced
But turn that frown upside down into a smile, because the guys and gals at Tensilica have extended their BaseBand Engine (BBE) family with the ConnX BBE64-128 – the next-generation architecture for DSP IP cores for SoC designs.
The ConnX BBE64-128 provides over 100 GigaMACs performance in 28nm high-performance process technology, easily outperforming all other DSP IP cores on the market. The ConnX BBE64-128 was designed to meet the performance requirements for LTE Advanced.
Additionally, the chaps and chappesses at Tensilica have introduced the ConnX BBE64-UE, which is specifically optimized for the low power and small area requirements of LTE Advanced handsets. These two new products are based on the new ConnX BBE64 architecture, which Tensilica’s customers can use to optimize a DSP core for their particular requirements.
ConnX BBE64-128: Breaking the 100 GigaMACs barrier
The new ConnX BBE64-128 DSP can perform 128 MACs per cycle for maximum throughput and minimum energy on the most common MIMO (multiple-input, multiple-output) and channel estimation functions, which are used extensively in LTE Advanced software. It is based on a multislot VLIW (very long instruction word) architecture that provides high sustained performance across many applications with dense code and power efficiency. For non-vector algorithms, high code density can be achieved with modeless switching to Tensilica’s smaller standard 16- and 24-bit instructions. Almost any operation can be performed from any slot in the VLIW format for greater sustained performance, lower energy, and denser code.
This flexibility allowed Tensilica to design the BBE64-128 so it can run 128 MACs (multiply accumulates), which is particularly helpful for FIR (finite impulse response) filters and matrix operations that dominate LTE Advanced channel estimation and MIMO processing. “We leveraged our Tensilica DPU (dataplane processing unit) technology to create a more compact ConnX BBE64-128 DSP by providing the extra MACs just for those functions required by LTE Advanced when needed,” stated Chris Rowen, Tensilica’s CTO. “We believe this gives our customers the best performance, price and area efficiency.”
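Just to check my own arithmetic on the "100 GigaMACs" claim, here is a small Python sketch that relates MACs per cycle to peak throughput. The 128 MACs per cycle figure comes from the announcement; the clock frequencies are purely my own assumptions, since no clock speed is quoted here.

```python
MACS_PER_CYCLE = 128  # figure quoted for the ConnX BBE64-128


def gigamacs_per_second(clock_hz, macs_per_cycle=MACS_PER_CYCLE):
    """Peak throughput in GMAC/s, assuming every MAC is busy every cycle."""
    return macs_per_cycle * clock_hz / 1e9


def min_clock_hz_for(target_gmacs, macs_per_cycle=MACS_PER_CYCLE):
    """Clock frequency needed to reach a target peak throughput."""
    return target_gmacs * 1e9 / macs_per_cycle
```

Under these assumptions, a hypothetical 1 GHz clock gives a peak of 128 GMAC/s, and any clock above about 781 MHz clears the 100 GMAC/s mark. Sustained throughput on real code would of course be lower than the peak.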
Other features of the ConnX BBE64-128 that accelerate performance include:
- High-performance “soft bit” vector data types and operations including arbitrary field insertion and extraction for complex transmit operations, resulting in over 250 general 10-bit operations per cycle.
- Parallel register files for 10/20-bit and 40-bit data types for easier compilation and higher performance at lower power.
- Large register files for performance on complex code, reduced memory bandwidth requirements, reduced power and easier compilation.
- Single-cycle 16-way complex radix-4 and radix-8 FFT (fast Fourier transform) and DFT (discrete Fourier transform) for efficiency on arbitrary size transformations common to OFDM (orthogonal frequency-division multiplexing) algorithms.
- Accelerated interleaving for all bit, byte, half-word and word vector types for flexibility and efficiency in HARQ (hybrid automatic repeat request), forward error correction and convolutional coding.
- Cellular modem acceleration with an optimized capability for max-index search, demap, despread, vector divide, vector reciprocal and square root.
- Rich operation resources – multiple parallel execution units of each type to provide greater instruction scheduling flexibility and higher performance on code that uses one execution type heavily.
- Expanded vector memory operations for easier automatic compilation of complex C code at maximum performance on any data size and placement.
- A high-performance AXI interface for easy shared memory connection to memory and other cores.
- Extensibility – the ability to optimize design for specific needs by adding custom instructions in minutes with Tensilica’s automated tools – allows great design flexibility for adding special memory interfaces, special per-SIMD (single instruction, multiple data) lane lookups or other required functions.
- The widest range of pre-defined “point-and-click” configuration options in Tensilica’s history for maximum design flexibility.
ConnX BBE64-UE: Low power for handsets
Handsets and other user equipment have extremely tight power budgets, as well as restrictions on the total area of the design. ConnX BBE64-UE was developed with this in mind and is based on a minimum feature set for minimum energy and latency. It is optimized to interface with low-power specialized engines (programmable or hard wired). While excluding such features as the option to run 128 MACs/cycle, this high-efficiency processor can reach approximately 300,000 GMAC/second/Watt in 28nm low-leakage process technology.
Customizable for a variety of requirements
Because the ConnX BBE64 family is based on Tensilica’s patented customizable processor technology, various functions can be tailored, turned on or off, and added to during the SoC design process. All hardware changes are quickly reflected in the automatically generated compiler and complete software tool chain, which exactly match all configuration options and additional instruction extensions.
“The beauty of our technology is that it allows for easy customization and optimization, as we know that the requirements for high-throughput infrastructure equipment are different from that of low-power user equipment, yet both need to run complex baseband software,” added Rowen. “Different design teams will have different design objectives, and the configurability and extensibility in our DSP cores allow them to meet their architecture needs with minimum effort and time.”
Advanced DSP compiler support
The compiler is automatically generated to match the exact configuration options chosen during the design process and features full native DSP data-type support (integer/fractional, real/complex). It automatically infers complex instructions, accelerates and vectorizes legacy code from ConnX BBE16, accelerates legacy code written with industry-standard intrinsic functions, vectorizes loops with complex conditional operations, and performs ANSI C operators on vector datatypes. It comes with improved tools and an “analysis cockpit” for program analysis including a vectorization assistant.
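If "fractional" data types are new to you, the following Python sketch shows the general idea using the widely used Q15 fixed-point convention, in which a 16-bit integer represents a value between -1.0 and just under +1.0. Choosing Q15 here is my own assumption for illustration; the exact fractional formats supported by the ConnX BBE64 compiler are not spelled out above, and the helper names are hypothetical.

```python
Q15_ONE = 1 << 15      # scale factor: 1.0 would be represented as 32768
Q15_MAX = Q15_ONE - 1  # largest representable value, just under +1.0
Q15_MIN = -Q15_ONE     # smallest representable value, exactly -1.0


def to_q15(x):
    """Convert a float to a Q15 integer, saturating at the range limits."""
    return max(Q15_MIN, min(Q15_MAX, round(x * Q15_ONE)))


def q15_mul(a, b):
    """Multiply two Q15 values, round to nearest, and saturate the result."""
    product = (a * b + (1 << 14)) >> 15  # add half an LSB, then rescale
    return max(Q15_MIN, min(Q15_MAX, product))
```

The saturation step matters for the one awkward case, -1.0 times -1.0, whose true result of +1.0 does not fit in Q15. A compiler with native fractional support can map this sort of operation directly onto hardware instead of making the programmer spell out the shifts and clamps by hand.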
Tensilica representatives are available now to discuss design requirements with interested customers. A complete evaluation kit for the ConnX BBE64-128 and ConnX BBE64-UE cores is expected to be available for early access customers in the fall of 2011.
“Strutting their Stuff” at MWC 2011
The folks from Tensilica say that they are going to stand out at this year’s Mobile World Congress (MWC) in Barcelona, February 14-18, 2011. Apart from anything else, they say that they’ve moved into the number one position as the supplier of baseband DSP IP cores for the new generation of 4G LTE equipment. They say that eight of the top fifteen LTE chipset manufacturers are working with their cores.
“Tensilica has jumped to the number one position in baseband DSP IP cores for LTE because they provide the widest range of solutions for mobile handsets and wireless basestations. They’ve been able to get these products to market quickly by utilizing their customizable DPU technology,” stated Will Strauss, president of Forward Concepts and leading DSP analyst. “And the new BBE64 DSP family they’re introducing today will be a major step forward to meet the even greater needs of LTE Advanced.”
Impressive demos at Tensilica’s booth
Tensilica will be showing the following products at their booth, number 1F39 in Hall 1:
- The laptop LTE data card designed for NTT DOCOMO’s rollout of LTE in Japan.
- The base station on-a-chip designed by DesignArt Networks for LTE.
- A comprehensive demonstration of an LTE link showing H.264 video transmit/receive running on Tensilica cores and the mimoOn LTE software.
- Multiple systems using Tensilica HiFi Audio IP cores. Tensilica audio is designed into a large number of Tier 1 smartphones.
Sign-up for booth meetings now
Tensilica will have two private meeting rooms in its booth at MWC. If you are interested in meeting with Tensilica to discuss LTE design or HiFi Audio, please contact your local Tensilica representative or email firstname.lastname@example.org (tell them “Max says Hi”).