The coexistence of numerous broadband communication standards aimed at the last mile and local-area networks (LANs), together with the continuous development of new standards, has created a demand for processing platforms that support multiple standards. A new class of application-specific processors has emerged that meets this demand by offering the combination of flexibility and efficient performance that full-custom ASIC and general-purpose DSP solutions fail to provide.
With the proliferation of broadband standards, it is even more important to develop products that offer multistandard connectivity. Multiple standards such as asymmetric digital subscriber line (ADSL), cable modem and fixed wireless compete to be the last-mile solution. Cable modem and ADSL will probably continue to coexist as metro last-mile solutions, while upcoming fixed-wireless solutions will compete for rural areas.
As broadband connections reach the last mile, the last-building broadband network emerges as the next challenge. While Ethernet is the traditional enterprise solution, a variety of popular alternatives has emerged. Home phone line and power line networks have been accepted as solutions for the wired home network. Recently, wireless-LAN technology has become cost competitive and is attracting great attention. HomeRF and 802.11b are the current market-share leaders and can deliver bandwidth of up to 11 Mbit/s.
In the United States, 802.11a has emerged as a strong challenger to 802.11b by providing substantially higher bandwidth at 54 Mbit/s. In Europe, HiperLAN2 seems likely to become the broadband wireless-LAN standard. As with the last-mile solutions, these LAN physical-layer (PHY) standards will coexist in the future, as each provides its own mixture of features in terms of power, bandwidth, range, capacity and functionality.
Traditionally, the last-mile and last-building PHY-level baseband processors have been implemented using ASIC methodology. To manage complexity, ASIC solutions generally target one standard and lack flexibility. Thus, the implementation of a new standard may require a costly and time-consuming redesign. Programmable DSPs offer an alternative solution, potentially providing flexibility, multistandard support and faster time-to-market. General-purpose DSPs, however, have traditionally lacked the resources or performance needed for demanding baseband-processing algorithms, making them suitable only for low-bandwidth PHY applications such as 56k modems. Only very high-end general-purpose DSPs can support ADSL at 2-Mbit/s data rates, and most ADSL chip sets are actually designed in ASIC gates.
A new class of processors that can support very high data rates while operating at low power is required for broadband last-mile and last-building PHY applications. This class of application-specific processors may be called universal physical-layer signal processors.
The first challenge for the new processors is how to efficiently meet the needs of such a wide variety of standards. At the block level, requirements can differ widely. For example, while an 802.11a receiver is based on fast Fourier transform (FFT), quadrature amplitude demodulation and Viterbi decoder blocks, 802.11b requires CCK correlator, Barker correlator and phase-shift keying demodulation blocks.
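To make the block-level difference concrete, here is a minimal sketch of a Barker correlator of the kind an 802.11b receiver uses at its lowest rates. It is an illustration only: the function names are hypothetical, the chips are ideal and noiseless, and a real receiver would work on complex samples with timing recovery. The point is that the whole block reduces to a sliding dot product, which has little in common with an FFT or a Viterbi decoder.

```python
# 11-chip Barker sequence used for direct-sequence spreading in 802.11
BARKER_11 = [1, -1, 1, 1, -1, 1, 1, 1, -1, -1, -1]


def spread(bits):
    """Spread each bit into 11 chips (hypothetical helper for the demo)."""
    chips = []
    for bit in bits:
        sign = 1 if bit else -1
        chips.extend(sign * b for b in BARKER_11)
    return chips


def barker_correlate(chips):
    """Slide the Barker code over the chips; one dot product per offset."""
    n = len(BARKER_11)
    return [
        sum(c * b for c, b in zip(chips[i:i + n], BARKER_11))
        for i in range(len(chips) - n + 1)
    ]


if __name__ == "__main__":
    peaks = barker_correlate(spread([1, 0]))
    # Peaks of +11 and -11 appear where the code lines up with a symbol.
    print(peaks[0], peaks[11])
```

Each output needs 11 multiply-accumulates by +1 or -1, so on a programmable data path the block is really a chain of adds and subtracts.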
The dissimilarity of these standards at the block level points to an important pitfall in the design of a processor for this application area. Custom execution units targeted at block-level functions make reuse difficult, leading to hardware redundancies and increased die area. In general, block-level execution units should be avoided unless the algorithm is very well understood and sufficient performance would be difficult to achieve otherwise. Viterbi decoding, with its high computation requirements per bit, provides an example of where hardware acceleration might be appropriate. Still, such a block should be designed to retain flexibility in order to accommodate different data widths and possible variations of the algorithm.
A better solution is to provide a relatively large number of simple execution units that can be flexibly connected to meet the requirements for whatever block-level function needs to be implemented. When new block-level functions are required, they only need to be programmed, not designed in hardware.
The optimization of this type of processor for broadband applications requires choosing the right combination of execution units needed to efficiently implement the desired algorithms. A careful study of many different standards reveals that while the applications differ at the block level, a number of common types of algorithms underlie this application area. This list of common algorithms includes filtering, transformations, modulation, interleaving and forward error correction. An analysis of the computational and data bandwidth requirements for each variation of these algorithms found within several existing standards provides a profile of the overall optimum combination of functions for efficient operation. Two key traits that appear are requirements for a large number of multipliers to support filtering and a large number of adders to support transformations such as FFT and Walsh transforms.
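The two traits can be illustrated with a short sketch that counts operations. It assumes a direct-form FIR filter and a radix-2 fast Walsh-Hadamard transform in Hadamard order; both function names and the operation counters are hypothetical, added only to make the multiplier-versus-adder split visible, and are not taken from any particular standard.

```python
def fir_filter(samples, taps):
    """Direct-form FIR: one multiply per tap for each output sample."""
    out = []
    multiplies = 0
    for n in range(len(samples)):
        acc = 0
        for k, tap in enumerate(taps):
            if n - k >= 0:  # skip taps that reach before the first sample
                acc += tap * samples[n - k]
                multiplies += 1
        out.append(acc)
    return out, multiplies


def fast_walsh_hadamard(values):
    """Butterfly transform that uses only additions and subtractions."""
    data = list(values)
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    add_subs = 0
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = data[i], data[i + h]
                data[i], data[i + h] = a + b, a - b
                add_subs += 2
        h *= 2
    return data, add_subs


if __name__ == "__main__":
    print(fir_filter([1, 2, 3, 4], [1, 1]))
    print(fast_walsh_hadamard([1, 1, 1, 1]))
```

The filter's cost grows with the number of taps times the number of samples and is dominated by multiplies, while the transform needs n log2 n additions or subtractions and no multiplies at all. An FFT uses the same butterfly structure but adds twiddle-factor multiplies.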
Developers must consider interleaving when designing the processor's data memory architecture. Rather than depending on the total memory data bandwidth, interleaving depends on the number of independent memory accesses that are available. Certain forward-error-correction algorithms such as Viterbi and Reed-Solomon decoding may require hardware support to execute efficiently, but this hardware should be minimized and kept flexible.
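A simple block interleaver shows why the access pattern matters more than raw bandwidth. The sketch below assumes a row-write, column-read interleaver; the function names and the matrix dimensions are hypothetical and do not correspond to any specific standard.

```python
def read_addresses(rows, cols):
    """Memory addresses touched, in order, when reading column by column."""
    return [r * cols + c for c in range(cols) for r in range(rows)]


def block_interleave(symbols, rows, cols):
    """Write row by row, read column by column."""
    if len(symbols) != rows * cols:
        raise ValueError("symbol count must equal rows * cols")
    return [symbols[addr] for addr in read_addresses(rows, cols)]


def block_deinterleave(symbols, rows, cols):
    """Undo block_interleave by interleaving with the dimensions swapped."""
    return block_interleave(symbols, cols, rows)


if __name__ == "__main__":
    print(read_addresses(2, 3))
    print(block_interleave(list(range(6)), 2, 3))
```

Consecutive outputs come from addresses that are `cols` apart, so a single wide fetch of neighboring words delivers mostly data that is not needed yet. Throughput is set by how many separate addresses the memory can serve per cycle.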
A study of broadband applications also reveals the prevalence of 16-bit complex data. Lower data widths are acceptable in many places, making support for 8-bit data desirable. This mixture of data types, together with the high level of data-processing parallelism found in these algorithms, means that a single-instruction, multiple-data path that supports 8- and 16-bit data types can significantly improve performance.
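As an illustration of what 16-bit complex arithmetic involves, the following sketch multiplies two complex samples. It assumes the common Q15 fixed-point convention (16-bit signed values representing the range -1 to just under 1) with truncation and saturation; the article does not specify a number format, so that choice and the function names are assumptions.

```python
Q15_MAX = 32767
Q15_MIN = -32768


def saturate(value):
    """Clamp a result to the 16-bit signed range."""
    return max(Q15_MIN, min(Q15_MAX, value))


def q15_complex_multiply(a, b):
    """Multiply two (real, imag) Q15 pairs: four multiplies, two add/subtracts."""
    ar, ai = a
    br, bi = b
    real = (ar * br - ai * bi) >> 15  # rescale the Q30 product back to Q15
    imag = (ar * bi + ai * br) >> 15
    return saturate(real), saturate(imag)


if __name__ == "__main__":
    half = 16384  # 0.5 in Q15
    print(q15_complex_multiply((half, 0), (half, 0)))
```

The four multiplies are independent of one another, and each operand is only 16 bits wide, which is the kind of parallelism a SIMD data path can exploit.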
But the width of the SIMD data path must be balanced with the added complexity in handling the data and the increased area required for wider data paths. A 32-bit data path allows SIMD operations on 8-, 16- or 32-bit data.
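The idea of one 32-bit data path serving several data widths can be sketched in software. The function below adds two 32-bit words as four 8-bit lanes, two 16-bit lanes or one 32-bit lane, with wraparound inside each lane. It is a software emulation written for illustration, with a hypothetical function name, and it does not describe how any particular processor implements its adders.

```python
def simd_add(a, b, lane_bits):
    """Add two 32-bit words lane by lane; carries never cross a lane boundary."""
    if lane_bits not in (8, 16, 32):
        raise ValueError("lane_bits must be 8, 16 or 32")
    lanes = 32 // lane_bits
    high = 0  # mask holding the top bit of every lane
    for i in range(lanes):
        high |= 1 << (i * lane_bits + lane_bits - 1)
    low = 0xFFFFFFFF & ~high
    # Add everything except the top bit of each lane, so no carry can leave
    # its lane, then fold the top bits back in with XOR.
    return ((a & low) + (b & low)) ^ ((a ^ b) & high)


if __name__ == "__main__":
    print(hex(simd_add(0x00FF0001, 0x00010001, 8)))
    print(hex(simd_add(0x00FF0001, 0x00010001, 16)))
```

With 8-bit lanes the byte `0xFF + 0x01` wraps to zero without disturbing its neighbor, while with 16-bit lanes the same bits produce `0x0100`. In hardware the equivalent is breaking the carry chain at lane boundaries, which is part of the added complexity that has to be weighed against the performance gain.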
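Besides better reusability, another advantage of a flexible processing platform with data-type agility is lower implementation risk. System simulations provide critical information about the performance and data-width requirements of the various system blocks, but on a programmable platform a data width or algorithm choice that later proves inadequate can be corrected in software rather than through a hardware redesign.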
As algorithms become more complex, an application-specific processor that is programmable and can support a variety of industry standards will become increasingly attractive compared with ASIC implementations for PHY-layer applications. Developers who study the algorithmic needs of a specific target application area can design programmable processors with dramatically improved efficiency.
See related chart