Recently our engineering team at Plextek was involved in the initial design and construction of a broadband wireless access system for Radiant Networks Plc (Essex, England). The system was to consist of a mesh of point-to-point microwave links in the 28GHz band. Because paths vary in length and quality, each modem was designed to support four bi-directional Time-Division Multiple-Access (TDMA) channels at 100Mbit/s where possible. When conditions demand, it falls back to 50Mbit/s or 25Mbit/s, using adaptive equalisation and modulation. The modem also had to perform mesh-specific operations such as antenna steering, exploration, node discovery, and link formation and maintenance. It gathered statistics to support mesh management as well.
In the system design phase, a range of installation, operating and management scenarios were explored, resulting in high-level operational and technical requirements. The modem was conceived as a system component with convenient electrical and mechanical boundaries. It had to be fully functional but flexible, and developed within time and cost constraints.
The uniqueness of a mesh radio required additional monitoring and control features, which suggested that the development would not be without technical risk. The modem needed both raw signal-processing capacity and considerable flexibility and intelligence, all in a limited space. Its tasks ranged from demodulating a received IF signal near 50MHz to controlling the drive to the motors that steer the microwave horns. Any practical solution would therefore have to combine hardware and software.
Our engineers had considerable experience in implementing DSP in both areas. For this application we selected a high-performance fixed-point Texas Instruments TMS320C6201 DSP, which runs C code at up to 1600 MIPS. A pair of Xilinx XCV400 FPGAs complemented it, providing about a million "system gates". The logic design would be entered in VHDL.
This is a powerful route which, although perhaps lacking in circuit visualization, allows comprehensive synthesis and placement with timing constraints and efficient simulation.
The modem was constructed on a single circuit board of approximately A5 size. The board also carried 200 Msample/s ADCs and DACs for interfacing directly to the radio at Intermediate Frequency (IF). In addition, it included a pair of 100 Msample/s DACs for debugging and diagnostics, for example to display a constellation on an oscilloscope.
Lower-level functions to be performed included:
Receive In-phase and Quadrature (I and Q) sampling
Half-symbol spaced ("T/2") equalisation and demodulation
Carrier phase and frequency offset tracking
Scrambling, and Hamming and Reed-Solomon Forward Error Correction (FEC) codes
Automatic Gain Control (AGC) and power control
Transmit root-raised-cosine (RRC) filtering of I and Q, and IF synthesis
Quadrature Phase-Shift Keying (QPSK) and Quadrature Amplitude Modulation (16-QAM and 64-QAM), selected according to link conditions.
The detailed design required each function to be allocated early to hardware, software, or a mix of the two, with well-defined interfaces. Both contributed necessary resources, but neither was adequate on its own.
Perhaps contrary to modern trends, simplicity was the key. Ever-increasing device performance has come to the rescue of some projects. Even so, for a given technology a simpler design will always be cheaper and faster, and will consume less power. Here are a few areas where simplicity can be won or lost:
Each resource should be given functions that play to its strengths, for example functions that can be implemented readily with its native primitive operations. Hardware is ideal for deterministic, repetitive and logical processes running continuously at high speed. Software offers far greater decision-making capability and flexibility.
Careful frequency planning is important. Integer-related IF, symbol, logic and sampling frequencies can greatly reduce the complexity of key operations such as IQ sampling and mixing to and from the IF. Here, the IF was twice the symbol frequency and the sampling frequency was eight times the symbol frequency, so sampling ran at exactly four times the IF. This allowed I and Q to be sampled directly from the IF signal. It also made mixing to and from IF very simple: successive samples are just assigned to I, Q, -I and -Q in sequence. Non-integer relationships can be accommodated using numerically controlled oscillators (NCOs), CORDIC processors and the like. These are extremely useful, but they consume silicon area, power and time. They should only be used when careful system design confirms that they are necessary.
Using a common clock for the logic and the DSP processor makes communication between the two faster, more deterministic and more reliable. Data does not have to be resynchronised across asynchronous clock domains.
For processing-intensive functions such as correlation and equalisation, pipelining and parallelism are key hardware advantages. The silicon efficiency of fixed-architecture hardware is highest when the logic runs at its maximum speed.
Where coefficients have constant or trivial values, and there is an advantage in avoiding a multiplier or adder, rearranging some algorithms at bit level can reduce complexity. This applies equally well to data, for example when constructing complex modulation from symbol values. Rather than building a complete Finite Impulse Response (FIR) filter, a look-up table (LUT) can be used.
The table size depends on the number of bits per symbol, the number of samples per symbol, and the extent (in symbols) of the filter impulse response. This approach is fast and readily adaptable, either by changing the table entries or by using multiple banks. It handles a wide range of symbol-based modulation schemes, both linear and non-linear.
This method was used in the Radiant modem, with a 64-entry table for each modulation type. Each table incorporated the scaling needed to give the required transmit power in that case. The approach is also particularly efficient in FPGAs whose internal logic is itself built from LUTs.
High-level tools should be used with due consideration of the target architecture, because portability usually comes at a price. One can describe processes in C or VHDL that read elegantly but give rise to cumbersome implementations. Tools can sanitize design entry, but they should not be expected to replace design effort.
Traditionally, a hardware front end runs continuously at the ADC sampling rate. It implements a complete signal path that passes partially processed results, for example symbols or bits, on to the DSP. However, this can be inflexible. It would certainly not have been adequate here because of the many modes of operation required, many of them responsive to "live" conditions. The final solution employed in the modem was therefore to use the hardware in two complementary ways.
First, functions that are intimately tied to time-critical signals run continuously in hardware to provide truly deterministic operation. Examples include front-end and back-end sampling and filtering, and frame timing. These functions transfer data to and from real-time memory, mapping time to a location in signal memory and thereby relieving other functions of this burden. For instance, the hardware correlator finds precise symbol positions. It tells the processor where in memory to start a demodulation process, not when to do it.
Second, the remaining functions interact with the software through control, status and data registers mapped into the processor address space. The trick is to design in flexibility through coefficients, parameters and flags. These should provide an adequate repertoire without compromising performance.
Such functions operate as peripherals, hardware accelerators or "toolboxes" that the software can use intelligently and flexibly. This leaves the software free to construct and adapt the overall processing algorithm as necessary. The software benefits from the speed and parallelism of the hardware, both among the hardware functions and between hardware and software. It also benefits from a simple interface mechanism, so enough processing capacity remains for the tasks performed in software.
Even if the processor takes a large fraction of a microsecond to get round to a task, all is well as long as each task is completed in adequate time. Timing concerns have been moved from the symbol level to the slot level, which is far more appropriate.
The end application may be unusual, but this basic approach is applicable to any high-performance DSP-based wireless product. With care, "the best of both worlds" can be achieved. This requires sufficient effort on task partitioning and interfacing, both between hardware and software and between the logic designers and the programmers.