Clearly, wireless LAN (WLAN) systems are a hot topic in the design community. With 802.11b taking off, and 802.11g and 802.11a coming quickly, much attention has been placed on building WLAN access points, PC Cards, and more.
But, as the industry moves forward, some key technical issues must be resolved. In particular, as WLAN systems move to higher operating frequencies and higher data rates, forward error correction (FEC) becomes an increasingly big headache.
The answer to this problem will lie in optimizing FEC coding schemes, such as the Viterbi algorithm, so they support the demanding requirements of the 802.11a specifications. To help out, this article will describe the architecture and implementation of a Viterbi decoder that achieves a decoding rate of 80 Mbps in WLAN designs. Both the soft and hard decision decoder scheme will web addressed along with the building blocks required to implement this decoder in silicon.
In any wireless system, especially WLAN, finite signal and noise powers lead to a strong probability that any bit in any message may be corrupted. These corrupted bits lead to errors in wireless transmission, causing headaches like lower data rates or lost transmissions.
FEC techniques, such as Viterbi coding and turbo coding, have been developed to account and correct for corrupted bits. FEC techniques rely on transmitting the data in a n encoded form, such that the redundancy introduced by the coding allows a decoding device at the receiver to detect and correct for errors in the channel. Thus, the FEC-enabled WAN system deliver a higher gain figure and will degrade more gracefully as more errors flood the channel.
In general, the Viterbi coding scheme is one of the most popular approaches for handling errors in WLAN architectures. Viterbi coding is considered a binary convolutional encoder that generate n output bits for every k input bits received.
The ratioR=k/n is called the convolutional code rate. To achieve higher data rates puncturing technique is required.
The puncturing technique holds the key to achieve the 80 Mbps data rates. Under this technique, codes are punctured by deleting some symbols generated by the encoder on the transmit side. Thus if the sequence G1 G2 G1 G2 has the second G2 deleted it becomes G1 G2 G1 (p), where p denotes the punctured symbol and instead of transmitting four symbols to represent two bits of data (R=2/4 or1/2) we now transmit three symbols to represent the same two bits for r=2/3 rate.Similarly we can generate a 3/4 rate.
An obvious issue that must be addressed by designers is how to replace punctured bits at the receiver. The receiver can deliver either hard or soft symbols to the Viterbi decoder. A hard symbol is equivalent to a binary +/-1. A soft symbol, on the other hand, is multi-leveled to represent the confidence in the bit being positive or negative. For instance, if the channel is non-fading and Gaussian, the output of the matched filter quantified to a given number of bits is a suitable soft input. In both cases, 0 is used to represent a punctured bit.
In case of hard decision demodulation, data is demodulated into either 1s or 0s,or quantized into two levels only. The process described above makes a hard binary decision about each incoming bit and then uses only the Hamming distances. This simplifiers the hardware, but does not result in optimal performance.
A very simple variation of the Viterbi algorithm permits a modem to use soft decision demodulated data, that is, signals that have been quantized into multiple levels and digitized. The advantage of soft decision demodulation is that signal values not only indicate whether they represent ones or zeros or, but also indicate magnitude of the corruption of signals by noise at instant of quantization, i.e. it contains an intrinsic probability factors of their own correctness.
These factors can be utilized by the Viterbi algorithm in its computation of the metrics of the sequences using Euclidean or Manhattan distance. The effect of this approach is that the path errors are compared more faithfully, resulting in more informed decisions. The optimum method of quantizating the signal is to represent it symmetrically about the null value (the value exactly half way between a full one and a full zero), so there are equal numbers of levels representing 1s and 0s.
The Viterbi algorithm is a recursive sequential minimization algorithm that can be used to find the least expensive way to route symbols from one edge of a state diagram to another. To do this, the algorithm uses a cost analysis mechanism to calculate the distance between the received symbol s and the symbol associated to that edge.
The distance between the received symbol and the symbol associated to that edge in the state diagram is often referred to as the branch metric. Let's show how the branch metric works.
If BM [i, j](s), is the metric of the branch from state i to state j, the problem is finding the path for which the metric, i.e. the sum of the branch metrics of the path edges, is at a minimum. The Viterbi algorithm solves this problem by applying the following recursive equation for each state transition:
PM [j](t) = min (PM [i](t-1) + BM [i, j](s))
Where PM [j](t) is the path metric associated to the (minimum cost path leading to) state j at time t. At the end of the decoding, it is possible to reconstruct the maximum likelihood sequence through a traceback starting from the possible decoder states.2
The Viterbi algorithm has three main blocks. These are the branch metric generation (BMG) block, the add, compare and select (ACS) block, and the trace back unit (Figure 1).
Click here for larger image
Figure 1: Architecture of a typical Viterbi decoder.
The BMG generates the branch metrics for each symbol of the input sequence by calculating the hamming distance between the input symbol and the expected symbol for each connection of the trellis (state). For example, for a 1/2 rate code, there are four possible symbol combinations in the encoded sequence: 00, 01, 10, and 11.
In the Viterbi decoding scheme described here, a 5-b soft decision could be used so that the transmitted bits in a symbol could be represented as 32 possible signal levels. The inverted MSB of the 5-b words would then give the logical value of 0 and 1. It's important to note that it is not practical to use more than 32 levels of quantization for each bit of the symbol. Going beyond the 32 levels would provide a fine quantization but would only achieve an loss in the SNR of less than 0.25 dB. Thus, the cost of using more bits is not justified.
While the 5-b approach, can be used, it was not chosen for the proposed 1/2 rate Viterbi decoder described here. The 5-b scheme is complex and requires more silicon area. Therefore in the example described in this article, a 3-b approach was employed.
A branch metric unit (BMU) was also implemented as a look up table and Euclidean distances are used for comparing the received symbols to the four ideal encoded symbols.3 The branch metrics are obtained by assigning positive numbers to the 8 possible values of the input signal. The two symbols of the received code word can be projected on the coordinate system. The Euclidean distance between ideal code words and an arbitrary point in this space (the received code word) gives the branch metrics.
By increasing the number of soft bits the number of possible values for branch metrics also increases. Figure 2 shows the branch metric diagram for an 8-level decoder. A number of different measures such as Euclidean distance or Manhattan distance can be used to generate the branch metrics.
Click here for larger image
Figure 2: Branch matrix generator for a Viterbi decoder featuring an 8-level constraint length.
The ACS Unit
The ACS unit is the heart of the Viterbi decoder. Each node in the trellis diagram corresponds to an ACS processor in the corresponding Viterbi decoder. The constraint length of the encoder (K) gives the number of ACS processors for a parallel architecture (Figure 3).
Click here for larger image
Figure 3: Diagram of a typical ACS unit.
If the data frequency is very high, then parallel architecture is advisable but if the data rate is less or comparable to the clock frequency, a serial or mixed architecture (where the number of ACS processor is less than 2(sup)k-1, and a stage is computed over a number of clock cycles) is not practical. It should be noted that two ACS processors that share the same input can be grouped into a butterfly processor.
Figure 3 shows the structure of an ACS processor. In this example, the ACS processor has 4 inputs (two- branch metrics and two-path Metrics) and two outputs (the new path metric and the decision bit). The decision bit is the most important information generated by the ACS .It indicates which sum between an input path metric and a branch metric generated the smallest result and was selected as the output path metric or local winner.
If the connected bit of a state is 0 the associated ascending branch is the local winner leading to previous state. If it is 1, the descending branch will be chosen. The 64 bits are grouped and for every stage, a 64-b memory location is written. All decision bits generated at one stage are used later in trace back unit to decode the output bits.
As in convolution decoder the data is coming continuously so the path metric, which is a sum of the branch metrics of previous stages, increases to a point where an overflow occurs. In order to prevent arithmetic flow and in order to keep the register effort and the combinatorial delay for add and compare operations in the ACS unit as small as possible, metric normalization schemes are used.
Several methods for state metric normalization are known, which are based on two facts:
1.The differences δ between all state metrics at any trellis step k is bounded in magnitude by a fixed quantity δmax independent of the number of ACS operation already performed in the trellis. 2. A common value may be subtracted from all state metrics for any trellis step k, since the subtraction of a common value does not have any impact on the results of the following metric comparisons.
The technique based on second fact is known as rescaling approach. Under this approach, each iteration the minimum metric is subtracted from all the metrics. The use of 2s complement modulo normalization technique, which is based on first fact, is alternative to the rescaling approach.
The second scheme avoids any kind of rescaling subtractions. The obvious advantage in the implementation is hardware savings and a speedup inside the metric update loop, which is critical to the decoder's computational throughput.
In order to reduce the power and area utilization it is desirable to represent the path metrics in as few bits as possible, in our case it is 9 bits. The number of bits should be such that it follows:
c=number of bits decided to represent path metrics.
δmax is related to branch metric as δmax= m* (Maximum value of branch metric- minimum value of branch metric)+2)
where m = 2* constraint length in case of soft viterbi and equal to constraint length in case of hard viterbi.
In VLSI implementation of the normalization techniques, the two most desirable properties are locality and uniformity. Metric normalization is considered local if it is accomplished within each ACS processor without information from other ACS processor. It is considered uniform if the ACS operation is not interrupted to perform metric normalization. Locality minimizes the global signal communication and uniformity not only reduces the complexity but also avoiding the additional functional units (Figure 4).
Click here for larger image
Figure 4: Example of an ACS unit employing normalization techniques.
There are various techniques for implementing local and uniform normalization techniques. In the present design, the modified comparison rule is used.
Under modulo technique, the metric mj is replaced by the normalized metric mnorm= (mj+c/2)(modC)-C/2, where C is the maximum value represent by the number of bits used for defining path metric.
The modified comparison rule gives the survivor bits to the trace back unit. This rule is computed using the following equation:
Survivor (mnorm1, mnorm2)= mnorm1 x or mnorm2 x or y (m1, m2)
Where y (m1, m2) denotes the unsigned comparison. In other words survivor (mn1, mn2) equals y (m1, m2) [or the logical inverse of y (m1, m2)] if mn1 and mn2 have the same (or opposite) sign.
Trace back unit
The ACS block assigns the measurement functions to each state, but the actual Viterbi decisions on encoder states are based on the trace back operation to find the path of the states. Using the trace back operation, every state from a current time is followed backwards through its maximum likelihood path. All of the paths converge at a point somewhere previous point in time.
The point at which the corrected bit streams starts is called the merger point (also called the trace back depth). By finding this point, the trace back operation decisively determines the state of the encoder at a given time, by showing that at this point, there is no choice for an encoder state given the global maximum likelihood path.
The performance of Viterbi decoder largely depends upon the trace back depth. The increase in trace back depth increases the complexity and hardware exponentially so one has to trade off between the performance level and the complexity and hardware.
Normally for decoders using non-punctured codes, the trace back depth equals 5-times constraint length, which is sufficient to decode the correct output in the presence of noise. But for decoders using punctured codes and a 2/3 data rate, and 8-times constraint length is optimal. If the data rate if 3/4, then 10-times constraint length should be used. Going beyond a 10-level constraint length does not make sense because performance gains are minimal. Thus, it is best to cap the constraint length at 10 levels. (Note: in the 1/2 rate design described here, a 7-times trace back depth was chosen).
The main bottleneck in implementing Viterbi algorithm for convolution codes is that for every new piece of symbol information, the algorithm must trace back over multiple states. The best way to implement trace back algorithm is described by Thung.2 Under this method, at the beginning of the array, the butterfly matrix calculates the next states as a function of the current path metrics of all the states and the most recent symbol. The paths top each state in the trellis imply the decision bits that were used to reach those states, and that information is also explicitly computed. The maximum likely state is selected from the maximum path metrics of all the current states, and is passed along with the decision vector. The minimum state represents the starting point for the traceback through to encoder trellis (Figure 5).
Click here for larger image
Figure 5: Traceback unit using a systolic array structure.
The systolic array keeps the high throughput for the traceback operations by launching a new traceback into the pipeline on every cycle. The pipeline has two components, propagating through the stages.
- The Traceback states.
- he Decision vectors.
While the decision vectors move one step ahead on every cycle, the maximum likelihood (ML) state stages actually jump ahead two steps at a time. The beginning of the pipeline is analogous to time k in figure 5, in that all of the possible stages have a path to a previous state as defined in the decision bit vector. The state register at the beginning of the pipeline represents the state the traceback will occur from, On the next clock cycle, the tth state decision vector moves from one stage, but the ML traceback state jumps forward to the decision vector for time (t-1).
Again the previous state is calculated and as the state travels in to the pipeline, it moves closer to the final traceback time. The pipeline is sized such that the traceback state will reach the final state where a symbol decision is made (analogous to the traceback of the ML path in figure 5 from time k to time 0) at the end of the pipeline, and at that point the decision vector bit can be released as the true symbol.
In hardware implementation, the trace back unit has 2-level traceback length registers having a width of 64 to store and shift survivor bits, 64 registers of width 6 to store the current state, and 64 multiplexes to select the survivor bit used in the formation of next state.
Comparison of hard vs. soft decision
In general, as the quantization level of bit metric is kept longer and the decoding depth deeper, we get better BER performance while the hardware complexity increases exponentially. Also decoding the data requires an additional delay proportional to the decoding depth.
The BER performance of the soft versus the hard decision in Rayleigh fading channels is shown below in figure 3. The soft method achieves the better performance than the hard decision method by about 2 dB.
In case of quantization level vs. BER performance, we determined that by using quantization, we get a significant processing gain, typically 2 dB for 3-b quantized data and about 2.2 dB for 4-b quantized data over hard decision.
- John B. Anderson, Seshadi Mohan, "Source and Channel Coding--An Algorithmic Approach," Chapter 4 1991, Kuwer Academic Publisher.
- T.Trung, ET. Al, "A VLSI design for a Traceback Viterbi Decoder, "IEEE Transaction on Communications, vol.40, No.3, March 1992, p.616-624.
- Boo, M., Arguello, F., Bruguera, J.D., Doallo, R., Zapatta, E.L, High performance VLSI Architecture for the Viterbi Algorithm, IEEE Transactions on Communications, Vol. 45, No.2, (February --1997).
- Heller, J.A., Jacobs, I.M., Viterbi Decoding for Satellite and Space Communications, IEEE Transaction on Communication Technology, Vol. COM 19 no.5 (October 1971), 835-848.
About the Authors
Sharad Singhal is a design engineer for DCM Technologies, Gurgaon, India. He has B-Tech (Electronics and Telecom) from Regional Engineering College Kurukshetra, India and can be reached at firstname.lastname@example.org.
Manish Gilani is an ASIC design engineer in DCM Technologies, Gurgaon, India. He has B-Tech (Electronics and Telecom) from Pune University, India and can be reached at email@example.com.