Traditionally, echo cancellation has been seen as black magic, requiring a great deal of heuristic know-how and expertise to achieve acceptable practical results. However, through the development/implementation of a new adaptive filter scheme, echo cancellation can become a much easier task for today's design engineers.
In this article, we'll layout the new adaptive filtering scheme, looking at issues such as online and offline filtering as well as algorithm development. We'll also provide practical design examples for how the proposed filtering approach can improve the design of echo cancellation architectures for voice-over-IP (VoIP), voice-over-packet (VoP), and other derived voice applications.
The Central Character
The central part of any echo canceller is the adaptive filter (AF). This filter builds a mathematical representation, or impulse response, of the echo path (Figure 1). Once this representation is built, it is stored in what is typically known as an H register.
When a signal (Rin) is processed by the AF using an H register, the output is a new signal approximating the expected echo of the Rin signal. This new signal can then be subtracted from the return signal (Sin) to remove or cancel the echo.
Figure 1: Adaptive filters are an integral ingredient in the echo canceller architecture, building a mathematical representation of the echo path
Since the adaptive filter can only approximate the echo, it is not capable of removing the echo entirely and some residual echo is left in the signal. The accuracy of the H register at any given moment will determine the amount of residual echo. If this amount becomes too great it is the job of the non-linear processor (NLP) to remove it.
AF: A Diverse Field
The field of adaptive filtering is vast and thus there are a diverse amount of AF approaches designers can implement in an echo canceller design. Despite this diversity, the objective of all AF solutions is the same: find the best mathematical model that will minimize the error between the actual response of the system and the mathematical representation. This process is called convergence.
AF algorithms in voice echo cancellation devices compete in two types of races: sprints and marathons. Sprints involve converging quickly at the start of a call and re-converging quickly following a change in the echo path. Initial convergence must be accomplished quickly since the echo path is not known at this time. Following this sprint, an AF algorithm must continue improving convergence despite any noise present in the signal containing the echo. This marathon will continue during the entire call, passing through periods of silence and double talk. Convergence on the echo path must never be lost during this phase.
In short, the challenge is to simultaneously achieve two conflicting characteristics of adaptive filter design, fast convergence and high stability
Online vs. Offline Filters
The simplest forms of AFs run continuously on the input signals. For each new sample the adaptive filter updates the H register. The new sample is assigned a weight which will determine its importance versus the data already contained in the H register. This weight is often called adaptation gain, step size or forgetting factor.
At the beginning of a call or after an echo path change, adaptation gain should be large in order to bring the H register to resemble the current echo path rapidly. When double talk is present, a very low adaptation gain (sometimes a gain of zero) is required to prevent the corruption of a valid H register. In all other situations, a low gain is applied to converge slowly, since averaging a large amount of samples yields a more exact approximation of the echo path in the H register.
The logic used to determine the adaptation gain controls the stability and responsiveness of the AF. This logic is where most of the black magic in an AF resides.
One of the greatest problems with developing an algorithm to control adaptation gain lies in differentiating double talk from an echo path change. If double talk is mistaken for an echo path change, the high adaptation gain applied to the new samples will allow the double talk to corrupt a previously well-converged H register. However, if an echo path change is mistaken for double talk, the low gain applied on new samples will prevent the H register from characterizing the new impulse response in a reasonable amount of time.
To circumvent that problem, an echo canceller algorithm could spin off a second AF, one we would call an offline filter, while maintaining the current online H register intact. The offline filter will attempt to converge on the most recent samples, creating an offline H register with rapid convergence. If the offline H register reaches a state where it matches the echo path better than the H register held in the online filter, then it will replace it. Selecting the right H register to apply online is the key to a fast and stable offline filtering system.
Selecting the right H filter is often called offline filter selection. In this selection process, an algorithm compares two H registers. The algorithm uses an H error register to establish each filter's convergence allowing an enlightened comparison. This error metric, combined with basic statistical methods, is the key to obtaining an adaptive filter behavior that is stable, immune to double talk, and that converges rapidly.
Simplified Echo Canceller Example
To illustrate clearly the decision-making process in offline filter selection, a simplified echo canceller will be used. This echo canceller contains a single tap H register at delay 0, meaning that the echo canceller can only measure attenuation but cannot look at delay. This device will calculate an offline H register every 1024 samples (128ms), using the 1024 most recent samples. Once the offline H register is calculated with all samples included, it may replace the online H register (if it is deemed to be a better model of the echo path) or be discarded. The echo canceller will use the least squares algorithm to calculate its H register. Note the least squares algorithm is defined as:
Figure 2 shows a test call that lasts 3072 samples. For the purpose of the example, the echo path is a single tap impulse with an amplitude of 0.5 for the first 2048 samples. The last 1024 samples contain a different echo path, whose amplitude is 0.2. Thus we have an echo path change at sample 2048.
Figure 2: Simplified echo canceller performing a test call that lasts 3072 samples.
By looking at Figure 2 one can quickly conclude that the first convergence period (P0) should contain the least amount of error in its H register, since it has the least amount of noise present in the Sin signal. A similar reasoning shows that the center convergence period (P1) should contain the most error in its H register since it has the largest amount of noise present. The last period (P2) should fall between the two others.
At first glance, a good adaptive filter would select the H register produced during period P0 as soon as it is available (at sample 1024), and immediately apply it online. At the same time, the H register produced by P1 should be discarded due to the large error from the noise in P1. Finally, the H register from P2 is significantly different in amplitude from online H register (that has been applied since the end of P0), so it should replace the online H register to reflect the echo path change. We'll go into greater detail on all of these processes below.
Finding the Error
Adaptive filters differentiate echo from noise by measuring the correlation between the Rin and Sin signals (see to Figure 1). The Sin signal is the sum of the near end signal (background noise and near end talker) and the echo signal. Finding the echo in the Sin signal requires a cross correlation with the Rin, because this echo is a linear function of the Rin. While the Rin signal will correlate perfectly with the echo, it will have much less correlation with the near end signal. Although the near-end signal (Snear) and Rin are independent (generally not correlated), the correlation will usually be non-zero. It is this correlation that causes the error in the H register.
To justify the above adaptive filter behavior highlighted in Figure 2, several mathematical milestones must be reached. First, the error of the H register after each convergence must be estimated. It is not possible to calculate exactly the H register error. If it were, it would be possible to find the exact echo path by subtracting the error from the estimated H register value.
It is however mathematically possible to calculate the standard deviation of the error. The standard deviation value provides a degree of confidence on the statistical spread of the error around the estimated H value. The standard deviation of the H register's error will be stored in the H error register (see Figure 2 above).
Many statistical methods exist to determine the standard deviation of a random process. In Figure 2, it is possible to demonstrate that the error is of Gaussian distribution. Using Gaussian statistics and knowing the standard deviation of the error, it is possible to calculate the odds of an error exceeding a certain value, usually measured in multiples of the standard deviation. Table 1 shows various standard deviations as well the theoretical percentage of experiments that will produce an error greater than those values.
The H error register is used to determine how erroneous the value in the H register really is. As long as the H register's real error is less than the anticipated error, the decision made by the algorithm will be correct. If the real error however exceeds the value that is anticipated, incorrect decisions may be made.
In the echo cancellation example above, a decision is made every 1024 samples (i.e. each time a new H register is available). If a 4σ (4 times the standard deviation) error factor is used in the decision process, 1 decision out of 15500 will be wrong on average (0.0064% error rate). Using an 8 kHz sampling rate, one wrong decision will be made every 33 minutes of operation. If a 6σ error factor is applied instead, one wrong decision will be made every 2 years of operation.
Selecting the Right Filter
The offline filter selection process can now make use of the amount of potential error (H error register) with a predictable accuracy. Every 1024 samples, a new H register value is calculated and is compared with the online H register value taking into consideration the respective H error registers to determine if the echo path has changed.
To run the comparison, designers perform a consistency test to calculate the difference between the two H registers. If the difference in value between the two H registers is too large to be explained by their errors, then the new H register automatically becomes the new online H register, eliminating all information gathered based on the old echo path model. This is typically caused by an echo path change. If the two H registers are found to be consistent (the difference is within the range of error), then the H register with the lowest potential error is kept as the new online H register.
After period P0, a new H register is calculated. To determine if it is appropriate to apply it online, it must represent a non-zero echo path. In other words, the H register magnitude must be greater than the H error register times the error factor for the echo path to be significant. Using a 4σ error factor in our previous example, the above condition is clearly met [0.54 > (0.02 x 4)], so the new H register can be applied online. It is important to make this verification, because an H register that was calculated over a period with a considerable amount of double talk may introduce so much error that it would amplify the returning echo, which would be worse than doing nothing.
After convergence period P1, the new H register is compared with the online H register to determine their consistency. The difference in values of the H registers is compared to the sum of their errors. The difference between the online and offline register must be bigger than both errors taken together to detect an echo path change. Using a 4σ error, the threshold condition is not met in this case [(0.54 - 0.4) > (0.15 x 4)]. Therefore, the two H registers are consistent.
The second test is to check which one of the H registers has the lowest H error register; typically, this will be the H register that best models the echo path. Clearly, the online H register (from P0) is a better line model and is kept.
Finally, when the H register calculated over period P2 is known, the same echo path change test described above will be performed. In this case, the difference between the two is larger than the error so an echo path change is detected [(0.54 - 0.22) > (0.07 x 4)]. The new H register replaces the online H register, despite the fact that the newly calculated H error register is higher than the online one.
Extending the Model
The mathematical tools explained for the simplified echo canceller above must now be extended to a more practical implementation. The main difference is that echo tails are usually longer than a single tap, which means they also have a delay component. The notions above will be reapplied to a 1024-tap (128-ms active tail) echo canceller.
It is not trivial to extend the H error register. Rather than characterizing an amplitude error on a single tap, the H error register can be made to characterize the error on a single frequency band, placing the H error register in the frequency domain.
To illustrate this clearly, Figure 3 shows a typical echo path impulse response in the time domain. This is what the H register contains. Using a Fast Fourier transform (FFT), it is possible to convert the H register from the time domain to the frequency domain. This transformation produces 512 independent frequency bands for a 1024-tap H register, each of which has a magnitude and phase component (Figures 4 and 5).
Figure 3: Typical time domain echo path impulse response.
Figure 4: Typical frequency domain echo path impulse response.
Figure 5: Complex echo path impulse response with phase for 500Hz.
A single frequency band centered on 500 Hz was extracted from Figure 4 and redrawn in Figure 5 including the phase component. The length of the vector in Figure 5 matches exactly the magnitude in Figure 4. The angle of the vector from the origin in Figure 5 represents the phase component. The phase of the impulse response is important because two H registers with an equal magnitude but having different phases must represent different echo paths. For example, the same hybrid at different delays will produce exactly the same magnitude but different phases.
The H error register can now be associated with the H register in the frequency domain. In the simplified echo canceller, the H error was an unsigned magnitude that could represent a positive or negative error in the H register. Rather than being confined to moving the H register value up or down, the H error in the frequency domain can move the tip of the H register vector in all directions. This is represented by the gray error circle in Figure 5. The H error in the frequency domain can change both the magnitude and the phase of a band. The radius of the error circle is defined by the multiplication of the H error register (i.e. 1 standard deviation) by the error factor. The larger the error factor, the larger the circle will be. The probability of the real echo path impulse response being outside the error circle is listed in Table 1.
The complex 1024-tap H register problem has now been broken down into 512 smaller, more manageable problems. Each of the frequency bands can be managed like the simple echo canceller described above.
Accelerating Multiband Decisions
As we apply the selection techniques on the independent frequency band adaptive filters, some line events occur which affect all 512 frequency bands simultaneously. However, because real signals have varying spectral distribution, some bands may never get any significant energy signal in Rin, or may get a large amount of noise in Sin. Applying the described algorithm to these bands alone may not result in a statistically valid H register decision in a reasonable time. Conversely, other bands will be able to establish a very clear picture of the line conditions quickly.
Extending the same statistical methods to combine selection information across bands allows the detection of echo path changes to occur more rapidly and accurately than otherwise possible.
For example, an echo path change decision may need to be taken with a 5σ error factor in a single band (to minimize the odds of detecting spurious echo path changes). However, even if no single band detects the echo path change with this level of certainty, it can be shown that 4 bands detecting the echo path change with a 4σ error factor yields at least the same level of confidence in the decision. A different error factor may be applied to take a decision using any number of bands.
About the Author
Frédéric Bourget is a senior product manager at Octasic. He received his B. Eng (Physics) from Laval University in Quebec and can be reached at Frederic.firstname.lastname@example.org.