Echo is a recurring problem in telecommunication and wireless networks. For years, equipment designers have turned to digital signal processing (DSP) and standalone board solutions to curb echo in carrier-class equipment designs.
Now the challenge falls on broadband equipment developers implementing voice-over-packet (VoP) technology. To succeed, these data operators must deliver voice services with the same quality as existing landline networks, and that means dealing with echo.
In this two-part series, we'll explore the key echo cancellation elements and techniques that broadband designers need to worry about when developing VoP systems. Part 1 will focus on the basics of echo cancellation and on acoustic echo cancellers, while part 2 will zero in on line echo cancellation techniques. To start our discussion, let's provide an overview of echo.
Sources of Echo
In a broadband system architecture, two forms of echo cause problems for designers. The first, which we'll discuss below, is acoustic echo. This form of echo is created by the acoustic properties of the areas where a VoP phone is used.
Line echo, which we'll discuss in part 2, is the second form. This form of echo is electrical and is produced by a network hybrid (a 2-wire to 4-wire converter).
Echo cancellation is critical to achieving high-quality voice transmission over packet networks, which typically face transmission delays above 30 to 40 ms. These long delays make echo readily apparent to listeners, so the echo must be eliminated in order to provide viable telephony service. In addition to echo exposed by packet delay, VoP systems must also take into account the presence of acoustic echo, such as that seen with handsfree phone operation.
The VoP environment presents unique challenges to the developer working to implement echo cancellation techniques. Chief among them is that a VoP system runs many different algorithms, each of which consumes large amounts of computational and storage resources. The echo cancellers must share space with the following algorithms:
- Voice codecs: voice compression to facilitate packet network bandwidth savings.
- Tone detection: DTMF and other signaling tones.
- Tone generation: call progress tones, DTMF.
- Voice activity detection: minimizes bandwidth during silence.
- Voice packetization: implements the transport layer for voice data.
- Voice playout: synchronizes the voice packet arrivals with voice sampling rate.
- Miscellaneous: caller ID, telephony signaling support.
To solve echo problems, designers typically implement an echo canceller in the system architecture (Figure 1). An echo canceller (EC) is often implemented using DSP algorithms. The central part of any modern echo canceling system is a digital adaptive filter.
Figure 1: Block diagram of an echo canceller.
Figure 1 illustrates a simple general-purpose echo canceling system that is terminated with a network hybrid. In an acoustic environment, the echo comes from sound that leaves the speaker and is reflected back into the microphone. The speaker-to-microphone acoustic coupling would then replace the hybrid in Figure 1.
The adaptive filter F estimates (predicts) the echo so that it can be removed from the send-in signal Y as soon as it appears. Besides the adaptive filter and the echo-removal stage, an EC contains additional parts that help the system function properly. The nonlinear processor (NLP) further reduces any residual echo left after cancellation by the adaptive filter. Very often a special-purpose comfort noise generator (CNG) is an integral part of the NLP implementation. Finally, the EC logic controls the behavior of the EC system and may provide useful instrumentation to the end user.
Many conditions can influence the performance of echo canceling systems. These include environmental conditions such as background noise, network conditions such as hybrid or phone enclosure properties, and signal conditions such as speech signal levels.
Both line and acoustic ECs have to deal with all of the above conditions in a satisfactory way. To judge performance, designers use the following metrics:
- Echo return loss (ERL): Measured in dB, ERL is the ratio of receive-out power to send-in power. ERL measures how much of the receive-out signal is lost when it is reflected back as echo within the send-in signal. For line ECs, the ERL of the echo path should be above 6 dB according to the International Telecommunication Union's (ITU's) specifications. For acoustic ECs, the ERL could be as bad as -12 dB, meaning the echo is louder than the signal that caused it. An infinite ERL would indicate digital termination or an on-hook condition.
- Echo return loss enhancement (ERLE): Measured in dB, ERLE is the ratio of send-in power and the power of a residual error signal (E) immediately after the cancellation. ERLE measures the amount of loss introduced by the adaptive filter alone.
- Combined Loss (ACOM): Measured in dB, ACOM is the ratio of receive-out and send-out power. ACOM measures the total signal loss due to network conditions, the adaptive filter, as well as the NLP action.
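To make the three definitions concrete, here is a minimal Python sketch that computes them from blocks of samples. The function names and the toy signals are our own assumptions for illustration; a real measurement would use standardized test signals and time-averaged power estimates.

```python
import math

def power(signal):
    """Mean-square power of a block of samples."""
    return sum(s * s for s in signal) / len(signal)

def ratio_db(p_num, p_den):
    """Power ratio in dB, with the zero-power cases handled explicitly."""
    if p_den == 0.0:
        return math.inf
    if p_num == 0.0:
        return -math.inf
    return 10.0 * math.log10(p_num / p_den)

def echo_metrics(receive_out, send_in, error, send_out):
    """ERL, ERLE and ACOM as defined in the list above (echo-only condition)."""
    p_rout, p_sin = power(receive_out), power(send_in)
    p_err, p_sout = power(error), power(send_out)
    return {
        "ERL": ratio_db(p_rout, p_sin),    # loss in the echo path
        "ERLE": ratio_db(p_sin, p_err),    # loss added by the adaptive filter
        "ACOM": ratio_db(p_rout, p_sout),  # total loss, NLP included
    }

# Toy echo-only signals (hypothetical): the echo path halves the amplitude,
# the adaptive filter removes another 20 dB, and the NLP a further 20 dB.
rout = [1.0, -1.0] * 50
sin_ = [0.5 * s for s in rout]
err = [0.05 * s for s in rout]
sout = [0.005 * s for s in rout]
metrics = echo_metrics(rout, sin_, err, sout)
```

In this toy case ACOM is simply the sum of the ERL, the ERLE, and the extra NLP loss, which is why the combined loss is a convenient single figure.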
All of the above parameters depend on frequency. That is, often network hybrids do not have a flat frequency response. For example, VoP systems may run into different ERLs when exposed to low or high frequencies. In the case of acoustic echo, the speaker and microphone frequency characteristics substantially influence the overall spectral properties of the echo return loss.
When reading marketing literature, be very careful if EC performance is specified in terms of ERLE. The ERLE depends on the ERL during the measurements, on the frequency characteristics of the hybrid or acoustic echo path, and on the type of signal used in the measurements. Keep the following facts about ERLE in mind:
- The ERLE will be limited by ERL and SNR.
- The ERLE may vary a lot based on ERL and SNR.
- The ERLE can be inflated using specific test signals (noise, sine waves, composite source signal).
A more appropriate measure of performance would be the combined loss with the NLP disabled. This measure should be more or less constant over a wide range of echo paths. However, it would still depend on the SNR, which in general limits how deep a convergence can be achieved.
ECs may not be able to achieve convergence that is much deeper than the background noise within the send-in signal. If the residual echo after cancellation is embedded within the background noise, it will most likely not be audible, so the existence of such echo residuals is not relevant.
As stated above, adaptive filtering is a key ingredient for fighting echo in a communication architecture. Adaptive filter implementations can be classified in several ways: by adaptation domain (time and/or frequency, including filter bank and wavelet implementations); by type of adaptive algorithm (based on least mean squares [LMS] or recursive least squares [RLS]); and by filter structure (FIR, IIR, lattice, etc.).
A typical line EC would use time domain adaptation based on the normalized LMS (NLMS) algorithm with an FIR filter structure. A typical acoustic EC would use frequency domain adaptation based on normalized LMS with an FIR filter structure.
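As an illustration of the time-domain case, the following is a minimal sketch of a normalized LMS FIR canceller in plain Python. It is not a production design: the step size `mu`, the regularization constant `eps`, the four-tap echo path, and the noise-free white test signal are all assumptions chosen to keep the example short, and there is no double-talk protection.

```python
import random

def nlms_cancel(far_end, send_in, num_taps, mu=0.5, eps=1e-6):
    """Time-domain NLMS echo canceller with an FIR filter.

    far_end: samples sent toward the echo path (receive side)
    send_in: samples coming back, containing the echo (signal Y)
    Returns the residual error signal E and the final filter weights.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps           # history[k] = far-end sample delayed by k
    errors = []
    for x, y in zip(far_end, send_in):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = y - echo_estimate            # residual after cancellation
        norm = sum(h * h for h in history) + eps
        step = mu * e / norm             # step size normalized by input power
        weights = [w + step * h for w, h in zip(weights, history)]
        errors.append(e)
    return errors, weights

random.seed(0)
echo_path = [0.0, 0.4, -0.2, 0.1]        # hypothetical echo impulse response
far_end = [random.uniform(-1.0, 1.0) for _ in range(2000)]
send_in = [
    sum(h * far_end[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
    for n in range(len(far_end))
]
errors, weights = nlms_cancel(far_end, send_in, num_taps=8)
```

With no near-end noise the weights settle on the echo path and the residual falls toward zero. Adding background noise to `send_in` shows how the SNR limits the depth of convergence.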
More complex EC systems may use a multi-filter implementation as well as echo path segmentation. The multi-filter implementation uses one filter for echo removal and a separate filter of exactly the same structure for adaptation. In this way, temporary adaptation errors need not be audible to the users. As a consequence, the EC system is more robust in the presence of double talk and near-end noise. Somewhat slower initial convergence might result, but it may not be perceptually relevant.
Echo path segmentation can be used for long-tail line ECs because of the physical properties of long echo paths. Since the majority of long echo paths contain only a few separate hybrid reflections, one may attempt to segment the echo path into portions that contain hybrid reflections and portions that only contribute delay and therefore have zero filter coefficients. In this way, both memory and processing requirements are greatly reduced. Moreover, one can show that the resulting speed and depth of convergence could provide a better tradeoff than a full filter approach (the no-segmentation case).
The multi-segment (MSEG) implementation is not suitable for acoustic ECs because of the physical properties of acoustic echo paths. However, some acoustic EC implementations may use a variable tail length to self-adjust to different room sizes. MSEG implementations may suffer from slower initial convergence or slower reconvergence when the echo path changes dynamically. Convergence on narrowband signals (e.g. DTMF) may also be affected.
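The idea behind segmentation can be sketched as follows. This example assumes a full-length estimate of the echo path is already available and simply keeps the windows that contain significant coefficients. How a real MSEG canceller locates its segments during adaptation is not covered here, and the window size, threshold, and echo path are hypothetical.

```python
def find_segments(coeffs, window, threshold):
    """Return (start, end) index pairs of the windows whose peak coefficient
    magnitude exceeds the threshold; adjacent active windows are merged."""
    segments = []
    for start in range(0, len(coeffs), window):
        end = min(start + window, len(coeffs))
        if max(abs(c) for c in coeffs[start:end]) > threshold:
            if segments and segments[-1][1] == start:
                segments[-1] = (segments[-1][0], end)
            else:
                segments.append((start, end))
    return segments

def segmented_echo_estimate(coeffs, segments, history):
    """Echo estimate that touches only the active segments.
    history[k] is the far-end sample delayed by k samples."""
    return sum(
        coeffs[k] * history[k]
        for start, end in segments
        for k in range(start, end)
    )

# Hypothetical 64 ms echo path at 8 kHz (512 taps) with two hybrid reflections
coeffs = [0.0] * 512
coeffs[40:44] = [0.3, -0.2, 0.1, -0.05]
coeffs[300:303] = [0.15, -0.1, 0.05]

segments = find_segments(coeffs, window=16, threshold=0.01)
active_taps = sum(end - start for start, end in segments)
```

Here only 32 of the 512 taps need to be stored and updated, while the pure-delay portions are handled by the sample history alone.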
Dealing with acoustics
Now that we've laid out some of the basic echo cancellation properties, let's take a more in-depth look at acoustic and line echo cancellation techniques. In this part, we'll focus on acoustic echo cancellers while in part 2 we'll examine line echo cancellers.
Acoustic echo canceling (AEC) in VoP is used primarily within the IP phone implementation with handsfree operation. The most common AEC system requirements are:
- Adaptive filter length: 60 to 200 ms.
- Full duplex performance.
- Bi-directional NLP.
- Enclosure and handset design according to TIA or ETSI specifications.
- AEC performance according to ITU specifications.
- Multi-port analog front end (handset, handsfree, headset, etc.).
- Optional automatic gain control (AGC).
- High-quality A/D and D/A (should have better than 80dB SNR).
- Proper handling of background noise in handsfree operation (noise blocking for low-end solutions or comfort noise generation for high-end solutions).
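To get a feel for what the filter-length requirement means in practice, the tail length in milliseconds can be converted to a number of FIR taps. The short sketch below assumes the 8 kHz sampling rate of narrowband telephony, and 16 kHz for a wideband phone; neither figure comes from the requirements list itself.

```python
def taps_for_tail(tail_ms, sample_rate_hz=8000):
    """Number of FIR taps needed to cover an echo tail of tail_ms milliseconds."""
    return int(round(tail_ms * sample_rate_hz / 1000))

narrowband = (taps_for_tail(60), taps_for_tail(200))                # 8 kHz sampling
wideband = (taps_for_tail(60, 16000), taps_for_tail(200, 16000))    # 16 kHz sampling
```

At 8 kHz the 60 to 200 ms range corresponds to 480 to 1600 taps, and the numbers double at 16 kHz.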
The above requirements would allow for IP phone usage in regular offices as well as small conference rooms. Larger conference rooms may not allow full-duplex performance, or may even require a longer adaptive filter.
AEC Design Guidelines
The adaptive filter for AEC should be frequency-based, e.g. using a fast Fourier transform (FFT). The following are some of the reasons for using an FFT:
- Very long tail length that has to be fully covered (no possibility for segmentation).
- Computational complexity reduction.
- Ability to independently control convergence over different frequency bands.
- More efficient double-talk detection (very important for handsfree operation).
- Better tracking properties for speech signals, faster reconvergence.
- Spectral matching of background noise can be achieved with less trouble.
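The complexity argument in the list above can be illustrated with a rough, textbook-style operation count. The sketch below compares a time-domain LMS filter with an overlap-save frequency-domain block LMS of the same length. The constants are assumptions of this estimate (five FFTs of length 2N per block at about 2N log2(2N) real multiplications each, plus 16N real multiplications for the two spectral products), and they ignore memory traffic and control overhead.

```python
import math

def lms_mults_per_block(n):
    """Time-domain LMS with n taps: about n multiplications for filtering
    and n for the weight update, for each of the n samples in a block."""
    return 2 * n * n

def fast_lms_mults_per_block(n):
    """Overlap-save frequency-domain block LMS with block length n.
    Assumed count: five FFTs of length 2n plus two spectral products."""
    fft_cost = 2 * n * math.log2(2 * n)
    return 5 * fft_cost + 16 * n

def speedup(n):
    """How many times fewer multiplications the frequency-domain form needs."""
    return lms_mults_per_block(n) / fast_lms_mults_per_block(n)

savings = {n: round(speedup(n), 1) for n in (512, 1024, 2048)}
```

Under these assumptions the advantage grows with filter length, which is one reason the long tails of acoustic echo paths favor frequency-domain adaptation.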
In addition to reducing the number of numerical operations, the frequency-based approach provides better handling of double talk in handsfree operation. This is very important since in handsfree operation the acoustic ERL can easily be as low as -12 dB. As seen below, proper enclosure design can greatly improve the chances for the AEC to perform well under such conditions.
The NLP is often implemented in both the send and receive directions. This is important for proper handling of handsfree operation. Also, in situations where the enclosure and hardware design are suboptimal, a bi-directional NLP implementation can degrade performance gracefully toward half-duplex operation.
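A very simple send-direction NLP with comfort noise can be sketched as a frame-based gate. This is only an illustration under our own assumptions: the 20 dB decision margin, the Gaussian comfort noise, and the externally supplied background noise level `noise_rms` are hypothetical, and a real NLP would also use the double-talk logic and spectrally shaped noise.

```python
import math
import random

def rms(frame):
    """Root-mean-square level of a frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def nlp_frame(residual, far_end, noise_rms, margin_db=20.0, rng=random):
    """Send-direction NLP for one frame.

    If the far end is active and the residual is at least margin_db below
    the far-end level, treat the frame as residual echo and replace it with
    comfort noise at the background level. Otherwise pass it unchanged.
    """
    far_level = rms(far_end)
    res_level = rms(residual)
    if far_level > 0.0 and res_level * 10 ** (margin_db / 20.0) < far_level:
        return [rng.gauss(0.0, noise_rms) for _ in residual]
    return list(residual)

rng = random.Random(1)
far = [0.5, -0.5] * 40
echo_only = nlp_frame([0.001] * 80, far, noise_rms=0.01, rng=rng)   # suppressed
near_talk = nlp_frame([0.3] * 80, far, noise_rms=0.01, rng=rng)     # passed through
```

Replacing the suppressed frame with noise rather than silence keeps the far-end listener from hearing the line go dead each time the gate closes.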
The mechanical phone enclosure design and the electrical analog front-end (AFE) properties play very important roles when designing a phone that will provide handsfree operation. The AEC will not function properly unless the enclosure and analog front-end are implemented properly. The enclosure design should take care of the following parameters:
- Distance between microphone and speaker should be as large as possible.
- The microphone should be mounted so that it does not face any obstacles when the phone is placed on a flat surface. It should also be oriented in a different direction from the speaker.
- The microphone and speaker should have some acoustic isolation within the phone enclosure.
- The microphone should be enclosed within foam to prevent it from getting in direct contact with the enclosure body.
- The microphone should be placed away from the handset cradle to minimize the noise when placing the handset back into cradle upon switching to handsfree operation.
- If the speaker is placed within the handset cradle, the sound should be directed at a different angle to minimize the chance of the handset vibrating at higher speaker volumes. Some foam may need to be glued to the cradle to further minimize handset vibrations at high speaker volumes.
- The area of the places where the phone touches the table should be minimized. Felt or a similar vibration absorbing material could be placed at the places of contact to minimize vibration. The mechanical vibrations from a keypad as well as any other vibrations should be minimized.
- Based on the frequency response of a selected speaker, one may need to provide additional back volume in order to improve the frequency characteristic. The way a speaker is mounted to the chassis as well as the quality of workmanship may also influence the overall acoustic performance.
The AFE (ADCs, DACs, and programmable gain amplifier [PGA]) must be built from high-quality parts. PGA implementations that use a combination of analog and digital gains should be avoided, since they can produce nonlinear distortion due to saturation or quantization. If such a combination cannot be avoided, care should be taken that the microphone analog and digital gains change in a synchronized fashion, so as to avoid audible clicks and pops when the AGC is used together with the AEC. The SNR of the AFE should be at least 80 dB within the audible range of frequencies for proper AEC operation over a wide range of signal levels.
Note that a bad enclosure design cannot be compensated for within the AEC algorithm. True full-duplex performance can only be achieved by an AEC in the rare cases where an exceptionally good enclosure design has been combined with a high-quality analog front-end and spotless workmanship in phone production. A single missing screw or stiff foam around the microphone may force an AEC to lose full-duplex capability, i.e. it may limit the depth or speed of convergence.
To solve problems caused by an enclosure, designers can implement a set of optional features. These include automatic gain control, automatic level control, noise reduction, speaker equalization, and comfort noise generation (CNG). However, if a higher-quality enclosure design and analog front-end are employed, many of these optional features may be rendered unnecessary.
The single most important feature when evaluating an AEC is full-duplex performance. Although the enclosure and AFE design represent the first step toward full-duplex operation, several conditions can stand in the way of good performance. These include background noise, unbalanced speech levels, double talk, and echo path changes.
Sometimes background noise at send-in is so high that people talking far away from the microphone are hard to hear. This may result in a wrong NLP action that eliminates the speech embedded within the noise. The convergence of an adaptive filter may be slow or unstable in the presence of high levels of noise or when the noise is not stationary.
When double talk occurs, providing an echo-free signal is hard unless the adaptive filter has already reached deep convergence in steady state. When the speech levels in the send and receive directions are out of balance, for example more than 6 dB apart, the double-talk and NLP logic will have a hard time making correct decisions.
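The level-balance condition mentioned above is easy to monitor. The sketch below flags frames in which both directions are active but their levels differ by more than a chosen gap. The 6 dB default follows the example in the text, while the activity floor of -50 dB and the frame-based RMS measure are our own assumptions.

```python
import math

def level_db(frame):
    """Mean-square level of a frame in dB relative to full scale (1.0)."""
    mean_square = sum(s * s for s in frame) / len(frame)
    if mean_square == 0.0:
        return -math.inf
    return 10.0 * math.log10(mean_square)

def levels_unbalanced(send_frame, receive_frame,
                      max_gap_db=6.0, activity_floor_db=-50.0):
    """True when both directions are active and their levels differ
    by more than max_gap_db."""
    send_db = level_db(send_frame)
    recv_db = level_db(receive_frame)
    if send_db < activity_floor_db or recv_db < activity_floor_db:
        return False                      # at most one talker: nothing to balance
    return abs(send_db - recv_db) > max_gap_db

loud = [0.5, -0.5] * 40
quiet = [0.2, -0.2] * 40
medium = [0.3, -0.3] * 40
silent = [0.0] * 80
flags = (
    levels_unbalanced(loud, quiet),    # about 8 dB apart
    levels_unbalanced(loud, medium),   # about 4.4 dB apart
    levels_unbalanced(loud, silent),   # receive direction inactive
)
```

A flag like this could be used, for instance, to make the double-talk and NLP decisions more conservative while the imbalance lasts.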
Echo path changes can occur whenever somebody moves inside the room, or passes a piece of paper across the table, or enters or exits the room. This causes problems because the AEC has to reconverge fast in such situations so that the echo path changes remain transparent to the listener at the other end.
Clearly, full-duplex performance will be hard to accomplish unless the AEC can provide stable and steady convergence under a variety of environmental conditions.
Getting Ready for Line Echo
That wraps up Part 1 of our VoP echo cancellation discussion. In part 2, we'll take a closer look at line echo cancellation. Until then, if you have any questions, send me an e-mail!
Editor's Note: To view Part 2, Click Here
About the Author
Bogdan Kosanovic is the manager of echo cancellation technology at Telogy Networks, a Texas Instruments Company. He received a Dipl.Eng. degree in electrical engineering from the University of Belgrade, Serbia, and M.S. and Ph.D. degrees in electrical engineering from the University of Pittsburgh. Bogdan can be reached at firstname.lastname@example.org.