Beyond QoS: Voice Quality Management for VoIP Networks

Nathan Chandler, Ditech Networks

7/16/2007 3:46 PM EDT

Network managers generally focus on quality of service (QoS) as the only means of monitoring and managing the quality of IP services. But while managing packets for loss, jitter, and delay is important for VoIP networks, today's successful VoIP providers must go further: they must ensure a high-quality customer experience by monitoring and managing voice quality levels. In fact, the quality of voice reproduction and transmission can often make or break customer satisfaction and service renewals. In this article, we'll look at IP voice quality challenges along with a new class of solutions that address them.

Why Standard QoS Isn't Sufficient for Voice
Service providers typically use three techniques to control and enhance QoS in packet-based networks: class of service (CoS), differentiated services (DiffServ), and Multi-Protocol Label Switching (MPLS). All three have one thing in common: they attempt to minimize packet loss and jitter in the voice transport. While this is one piece of the puzzle, controlling packet loss and jitter does not prevent other impairments from degrading the voice signal.

CoS (IEEE 802.1p) is the usual means for providing QoS in Ethernet networks. With CoS, all packets are assigned a priority, and VoIP traffic generally receives the highest priority; thus, using CoS prioritization can minimize VoIP traffic packet loss.
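As a rough illustration of where the CoS priority lives, the 802.1p value is carried as a 3-bit field inside the 4-byte IEEE 802.1Q VLAN tag. The sketch below builds such a tag; the function name, the priority value of 5 for voice, and VLAN ID 100 are assumptions chosen for the example, not values taken from any particular network.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that identifies an 802.1Q tag


def vlan_tag(pcp: int, vid: int, dei: int = 0) -> bytes:
    """Build a 4-byte 802.1Q tag: TPID(16) | PCP(3) | DEI(1) | VID(12)."""
    if not (0 <= pcp <= 7 and 0 <= vid <= 4095 and dei in (0, 1)):
        raise ValueError("field out of range")
    tci = (pcp << 13) | (dei << 12) | vid
    return struct.pack("!HH", TPID_8021Q, tci)


# Hypothetical voice VLAN: priority 5 on VLAN 100 (both values are assumptions).
voice_tag = vlan_tag(pcp=5, vid=100)
print(voice_tag.hex())  # 8100a064
```

Switches only act on this field if their queues are configured to honor it, so the tag is a request for priority rather than a guarantee.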

DiffServ is a class-based IP QoS technique specified by the Internet Engineering Task Force (IETF). The Differentiated Services Code Point (DSCP) carried in the IP header selects the Per-Hop Behavior (PHB) that each router along the traffic's path applies. With VoIP traffic, the PHB is normally set to "Expedited Forwarding" to minimize latency and packet loss.
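For a sense of what this marking looks like from an endpoint, here is a minimal sketch that requests Expedited Forwarding on a UDP socket. The code point for EF is 46; it occupies the upper six bits of the old TOS byte, so the value written to the socket is 46 shifted left by two. The helper names are hypothetical, and whether routers honor the marking depends entirely on how the network is provisioned.

```python
import socket

EF_DSCP = 46  # Expedited Forwarding code point (RFC 3246)


def dscp_to_tos(dscp: int) -> int:
    """Place a 6-bit DSCP value in the upper six bits of the 8-bit TOS/DS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits")
    return dscp << 2


def open_ef_marked_socket() -> socket.socket:
    """Return a UDP socket whose outgoing IPv4 packets are marked for EF."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(EF_DSCP))
    return sock


print(hex(dscp_to_tos(EF_DSCP)))  # 0xb8
```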

MPLS adds a separate 32-bit header to each packet, which is used to create virtual label switched paths (LSPs). With LSPs, providers can segment, prioritize, and expedite traffic. Originally created as a means to implement virtual private networks (VPNs), the MPLS header also has a 3-bit QoS field (the EXP, or Traffic Class, field) that can be used to minimize packet loss and latency with VoIP traffic.
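The layout of that 32-bit header can be sketched in a few lines. Per RFC 3032, each label stack entry holds a 20-bit label, the 3-bit QoS field, a 1-bit bottom-of-stack flag, and an 8-bit TTL. The function names and the sample label, class, and TTL values below are assumptions for illustration only.

```python
import struct


def pack_mpls_entry(label: int, tc: int, bottom: bool, ttl: int) -> bytes:
    """Build one 32-bit label stack entry: label(20) | TC(3) | S(1) | TTL(8)."""
    if not (0 <= label < 2**20 and 0 <= tc < 8 and 0 <= ttl < 256):
        raise ValueError("field out of range")
    word = (label << 12) | (tc << 9) | (int(bottom) << 8) | ttl
    return struct.pack("!I", word)


def unpack_mpls_entry(data: bytes) -> dict:
    """Split the first four bytes of data back into the four header fields."""
    (word,) = struct.unpack("!I", data[:4])
    return {
        "label": word >> 12,
        "tc": (word >> 9) & 0x7,
        "bottom": bool((word >> 8) & 0x1),
        "ttl": word & 0xFF,
    }


# Hypothetical voice LSP: label 1000, traffic class 5, single-entry stack, TTL 64.
entry = pack_mpls_entry(label=1000, tc=5, bottom=True, ttl=64)
print(entry.hex())  # 003e8b40
```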

Measuring IP Voice Quality
While IP QoS techniques are useful in any network, VoIP requires more detailed voice quality management. The first challenge is to quantify voice quality so it can be monitored and managed.

In the public switched telephone network (PSTN), the mean opinion score (MOS) remains the benchmark for determining voice quality. Although the subjective nature of MOS testing is practical in the stable and predictable PSTN, the dynamic and "bursty" nature of IP traffic requires a different approach. For this reason, newer, objective techniques that can be automated are now available to calculate voice quality ratings based on measured values. Of these techniques, the E-Model specified in ITU-T Recommendation G.107 is the least intrusive and the most cost-effective. By measuring various parameters of the network and the speech, the E-Model estimates how the average user would likely rate a call on an equivalent MOS scale (see Table 1).


Table 1. Mapping E-Model R-Factors to MOS Ratings

The R-Factor is calculated based on impairments such as a call's signal-to-noise ratio (SNR), level mismatch and codec distortion, talker and listener echo, and codec tolerance for packet loss. A detailed analysis of the R-Factor can be used to help isolate the root cause of persistent problems.
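The conversion from an R-Factor to an estimated MOS follows a fixed formula given in ITU-T G.107 (Annex B). The short sketch below implements that conversion; the function name is our own, and the R values passed in are simply sample inputs.

```python
def r_to_mos(r: float) -> float:
    """Convert an E-Model R-Factor to an estimated MOS (ITU-T G.107, Annex B)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7.0e-6


for r in (60, 70, 80, 90):
    print(r, round(r_to_mos(r), 2))
```

With this formula, an R-Factor of 80 works out to a MOS of about 4.0, the "toll quality" threshold, while an R-Factor of 70 yields roughly 3.6.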

Customers have two sets of expectations regarding voice quality. In the PSTN, the expectation is for "toll quality" voice with a MOS rating of 4.0 or more. In the cellular phone network, the expectation is considerably lower (around MOS 3.2–3.5). Of course, cellular customers willingly tolerate this "fair" voice quality for the convenience of mobility.

There are three reasons why it's so difficult to achieve a MOS of 4.0 or more in an IP network. First is the use of low bit-rate encoding in many VoIP networks as a way of minimizing bandwidth demand. For commonly used low bit-rate codecs such as G.729a (8 kbps), the highest possible MOS score is only 3.7, even under the best of network conditions. Additionally, packet loss dramatically affects the speech quality of low bit-rate codecs. In contrast, the G.711 codec (64 kbps) employed in the PSTN can tolerate packet loss very well and has a starting MOS score above 4.0.
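To see why low bit-rate codecs are attractive despite the quality penalty, it helps to work out the per-call bandwidth on the wire. The sketch below assumes 20 ms of speech per packet and 40 bytes of IPv4, UDP, and RTP headers per packet, and ignores layer-2 overhead; those figures are common defaults but are assumptions here, not values from any specific deployment.

```python
def ip_bandwidth_kbps(codec_kbps: float, frame_ms: float = 20.0,
                      header_bytes: int = 40) -> float:
    """One-way IP bandwidth for a voice stream, in kbps.

    header_bytes defaults to IPv4 (20) + UDP (8) + RTP (12); frame_ms is the
    amount of speech carried in each packet. Both defaults are assumptions.
    """
    payload_bytes = codec_kbps * 1000 / 8 * frame_ms / 1000
    packets_per_second = 1000 / frame_ms
    return (payload_bytes + header_bytes) * 8 * packets_per_second / 1000


print(ip_bandwidth_kbps(64))  # G.711:  80.0
print(ip_bandwidth_kbps(8))   # G.729a: 24.0
```

Under these assumptions, a G.729a call needs less than a third of the IP bandwidth of a G.711 call, even though header overhead narrows the nominal eight-to-one gap between the codec rates.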

The second reason is that IP voice quality inevitably deteriorates as the number of subscribers grows and/or the traffic load increases. Adding capacity mitigates the problem, of course, but the bursty nature of IP traffic means the network will inevitably experience periods of congestion. During those periods, despite QoS implementation, users experience diminished voice quality from packet loss, delay, and jitter.

The third reason is that IP voice quality is frequently impaired by noise, distortion, echo, and mismatched volume levels. Many IP networks, by design, simply lack the provisions to eliminate these impairments.

Voice Quality Impairments
Now, let's look at IP voice quality impairments in more detail. Figure 1 depicts a typical end-to-end VoIP network. Each network segment (from the subscriber premises through the core and beyond) is assumed to offer some form of QoS.




Figure 1. Impairments to Voice Quality Occur in the Subscriber, the Access, and the Peering Network Segments.

As the diagram shows, the subscriber and access networks introduce five separate impairments: acoustic echo, ambient or background noise, audio level mismatch, hybrid echo, and codec distortion.

Acoustic Echo: Acoustic echo is caused by poor acoustic isolation between the microphone and speaker in user devices (handsets, headsets, speakerphones, IP softphones, etc.). Acoustic echo becomes more problematic and noticeable with VoIP-induced packet delay. This problem is quite common in VoIP networks. An estimated 10 to 15 percent of all calls suffer from this impairment, potentially in quite annoying ways.

Ambient or Background Noise: Noise is present anywhere people live or work. Noise can also be introduced by VoIP and packet processing systems in the form of static, hum, and "popping" sounds. In most VoIP networks, the undesirable noise is simply incorporated right into the encoded packets along with the voice signal. Conversely, eliminating all noise can also be a problem. Users can perceive that the line has gone dead, prompting the familiar question: "Are you still there?" To address this issue, some systems contain comfort noise generation (CNG) to produce the familiar low-level hiss of a PSTN line. Of course, comfort noise must counterbalance ambient background noise to provide the proper customer experience.

Audio Level Mismatch: The volume levels of calls between two VoIP endpoints are often unbalanced, with one side of the call louder or quieter than the other. Different users have phones and phone equipment from manufacturers who offer differing microphone sensitivity levels. Users may be able to compensate somewhat for this impairment by adjusting volume settings, but such inconveniences shouldn't become a constant when delivering higher quality services.

Hybrid Echo: Hybrid or line echo is an electrical signal reflection that occurs at the two-wire to four-wire conversion in the analog tail circuit at the edge of the PSTN. Although hybrid echo is not generated in a pure VoIP network, most VoIP calls still originate from or terminate to the PSTN or cellular networks. The additional transmission delays encountered in a VoIP network worsen the effect of hybrid echo, making it more noticeable and more annoying. And while some media gateways now include a hybrid echo cancellation feature, this typically addresses a relatively short hybrid echo tail circuit, not the much longer ones seen in most large VoIP networks.

Codec Distortion: Encoding and decoding VoIP calls using low bit-rate codecs reduces the sharpness of speech and can lead to poor voice intelligibility. With an increasing number of IP borders and multiple transcoding points between callers, providers must compensate for this impairment to prevent severe negative consequences for user satisfaction.

Despite the use of IP QoS or MPLS, there is also, inevitably, some packet loss, delay, and jitter introduced in all IP network segments. Most codec standards have some provision (i.e., packet loss concealment, or PLC, algorithms) to compensate for packet loss. However, with very low bit-rate codecs (and even with built-in PLC algorithms), even packet loss rates of less than one percent are normally perceptible, and often annoying.

Jitter, or variations in delay, can further increase distortion by making voice sound garbled. For this reason, most VoIP systems incorporate de-jitter buffering to eliminate the variations. Unfortunately, jitter buffers impose additional delay to remove these variations.
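One widely used way to quantify jitter is the running interarrival estimate defined for RTP in RFC 3550 (section 6.4.1), which smooths the change in transit time between consecutive packets with a gain of 1/16. The sketch below is a minimal version of that estimator; the function name and the sample timestamps (in milliseconds) are assumptions for illustration.

```python
def interarrival_jitter(send_times, recv_times):
    """Running jitter estimate per RFC 3550, in the same units as the inputs."""
    jitter = 0.0
    for i in range(1, len(send_times)):
        # Change in transit time between this packet and the previous one.
        d = (recv_times[i] - recv_times[i - 1]) - (send_times[i] - send_times[i - 1])
        jitter += (abs(d) - jitter) / 16.0
    return jitter


# Packets sent every 20 ms; the second one arrives 5 ms late.
print(interarrival_jitter([0, 20, 40], [50, 75, 90]))  # 0.60546875
```

A de-jitter buffer is typically sized from an estimate like this one, which is why higher jitter translates directly into added playout delay.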

Individually, these impairments lessen voice quality, perhaps substantially. Collectively, these impairments can make VoIP voice quality unacceptable.

IP Voice Quality Solutions
A new generation of voice processing platforms has emerged to address the challenges of ensuring high VoIP call quality in any IP or MPLS network infrastructure (Figure 2). These platforms employ as many as six separate technologies:

Acoustic Echo Control solves the acoustic echo problem found in most VoIP networks. Some platforms include a bi-directional acoustic echo control feature that suppresses echo variances using algorithms based on talker energy levels and weighted acoustic echo path loss.

Adaptive Noise Cancellation uses a high-precision noise reduction algorithm to remove the noise components of a call without reducing talker volume, which mitigates ambient or background noise impairments.

Automatic Level Control technology dynamically detects level imbalances and automatically adds gain or attenuates as much as needed to bring both sides of the call to the same specified volume. Automatic level control also prevents clipping and codec distortions, and can compensate for background noise by improving the signal-to-noise ratio.
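The basic idea behind level control can be sketched as follows: measure the level of a block of speech, compare it with a target, and apply a bounded gain. This is a deliberately simplified illustration, not how any particular platform works; the target of -20 dBFS, the 12 dB limits, and all function names are assumptions. A production system would also smooth the gain over time and adapt only while speech is present.

```python
import math


def rms_dbfs(samples):
    """RMS level of float samples in [-1.0, 1.0], in dB relative to full scale."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(max(mean_square, 1e-12))


def level_control_gain_db(samples, target_dbfs=-20.0,
                          max_gain_db=12.0, max_cut_db=12.0):
    """Gain in dB that moves the block toward the target level, within limits."""
    error = target_dbfs - rms_dbfs(samples)
    return max(-max_cut_db, min(max_gain_db, error))


def apply_gain(samples, gain_db):
    """Scale the samples, limiting to full scale so added gain cannot overflow."""
    g = 10.0 ** (gain_db / 20.0)
    return [max(-1.0, min(1.0, s * g)) for s in samples]


quiet_talker = [0.05, -0.05] * 80          # about -26 dBFS
gain = level_control_gain_db(quiet_talker)  # about +6 dB
leveled = apply_gain(quiet_talker, gain)
```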

Hybrid Echo Cancellation eliminates hybrid echo for VoIP calls that traverse a PSTN or cellular hybrid network. The most advanced voice processing platforms can cancel echo with an Echo Return Loss (ERL) as low as 0 dB, compensate for network tail delays of up to 278 ms, and converge quickly and stably in less than 50 ms.
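Echo cancellers are commonly built around an adaptive filter that learns the echo path from the far-end signal and subtracts the predicted echo from the near-end signal. The sketch below uses a normalized least-mean-squares (NLMS) filter, a standard textbook technique chosen here for illustration; it is not a description of any vendor's algorithm. The function name, the 32-tap length, and the step size are assumptions. At an assumed 8 kHz sampling rate, covering a 278 ms tail would take 2,224 taps, and real cancellers add double-talk detection and nonlinear processing on top.

```python
import random


def nlms_echo_cancel(far_end, near_end, taps=32, mu=0.5, eps=1e-6):
    """Subtract an adaptive estimate of the far-end echo from the near-end signal."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, near_end):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        e = d - estimate  # what remains after removing the predicted echo
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return residual


# Synthetic check: the "echo" is the far-end signal delayed 5 samples at half level.
random.seed(1)
far = [random.uniform(-1.0, 1.0) for _ in range(2000)]
near = [0.0] * 5 + [0.5 * x for x in far[:-5]]
cleaned = nlms_echo_cancel(far, near)
```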

Packet Loss Concealment is intended to correct the audio stream in the presence of packet loss and excessive jitter. It can work by reconstructing missing packets within a VoIP packet stream. Advanced platforms use a predictive speech model to reconstruct a missing packet's voice payload and rebuild it upon packet play-out. This enhances the ability to support high-quality VoIP, even in congested IP networks experiencing substantial packet loss.
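The simplest form of concealment, far cruder than the predictive speech models described above, is to replay the last good frame at reduced volume whenever a packet is missing. The sketch below shows that baseline only to make the idea concrete; the function name, the use of None to mark a lost frame, the 160-sample frame length, and the fade factor are all assumptions.

```python
def conceal_losses(frames, frame_len=160, fade=0.5):
    """Replace missing frames (None) with a faded copy of the last played frame."""
    output = []
    last = [0.0] * frame_len  # silence until the first frame arrives
    for frame in frames:
        if frame is None:
            # Each consecutive loss fades further toward silence.
            frame = [s * fade for s in last]
        output.append(frame)
        last = frame
    return output


received = [[1.0] * 4, None, None, [0.5] * 4]
played = conceal_losses(received, frame_len=4)
# played[1] is the first frame at half level; played[2] is at quarter level.
```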

Enhanced Voice Intelligibility improves the quality of speech by correcting for distortions introduced by multiple low bit-rate encodings, providing increased clarity and making speech easier to recognize.



Figure 2. Voice processing platforms can remove all voice quality impairments.

Advantages of Voice Processing
Economically speaking, adding a voice processing platform to a VoIP infrastructure pays for itself in numerous ways. PSTN-like voice quality increases customer satisfaction, which reduces costly churn and attracts new customers. In addition, consistently high voice quality (around MOS 4.0) minimizes calls to customer service, which previously could do little or nothing to actually help the caller.

The ability to deliver satisfactory voice quality using low bit-rate codecs enables carriers to get the most from their entire IP infrastructure investment, including existing first-generation media gateways. This capability also eliminates the wasteful practice of under-subscribing the network, and it helps to postpone potentially costly and disruptive network upgrades. In fact, many carriers are able to cost-justify the relatively modest investment in a voice processing platform based on this capability alone.

Although some planning is required to improve voice quality in any IP network, the implementation effort is itself made substantially easier (and far less expensive) with the availability of purpose-built voice processing systems. And unless all of the impairments to voice quality are successfully removed, no amount of IP QoS provisioning in a VoIP network can ever hope to achieve the PSTN's MOS rating of 4.0 or more. As VoIP services become more widely available, voice quality will emerge as the major competitive differentiator among providers, and voice processing systems will be the key.

About the Author
Nathan Chandler is Product Marketing Manager (VoIP) for Ditech Networks, Mountain View, CA.


