It's such a simple idea, and it's taken so long to reach fruition: Use natural-language, speaker-independent speech recognition to handle call routing. Getting an incoming call from a general company number to a specific extension takes some intelligence, whether from human or machine. A receptionist/operator puts flexible, adaptable intelligence in the loop (Editor's Note: At least bright, insightful attendants do, but some could use a course in charm school), but it's at a high cost: $20,000 to $50,000 a year or more to keep the answering system adequately staffed. Switching to a conventional auto attendant saves money, but forces callers to listen to menus, make touch-tone selections, or agonize over the annoyingly non-standard spell-by-digit feature. The net result: Many callers (perhaps your most important, hence impatient, customers) end up zeroing out to a fallback human. You save nothing, and earn a reputation for bad customer service.
Speech recognition, both the raw technology itself and the cleverness with which speech-based user interfaces can be engineered, provides a better, indeed a near-perfect, solution. Automatic speech recognition (ASR) has finally been put to work to bridge the gap between the naturalness of speaking a name to a human receptionist and the cost savings and reliability of an auto attendant. A good ASR auto attendant can be on duty 24/7/365, handling one or a dozen incoming lines with ease, always patient, always doing its best to get the caller to the right extension, for a fraction of the cost of human staffing. It offers easy-to-use service to callers who might be unable to carefully spell out a name on the touch-tone keys (cell phone callers sitting at the wheel of a car, children, physically-, visually-, or simply keypad-challenged people). Even unimpaired adult users sitting at a desk appreciate the ease and speed of simply speaking a name.
Yes, ASR auto attendants have a place waiting for them. But do they work well enough for companies to put them into place? A year ago (February 1999) we held our first test of this emerging technology. Now we're back for another, even more detailed look. This time, three manufacturers stepped up to bat. One (Registry Magic) is a repeat from our last shootout, but has a new version for us to test this time. Another (Philips Voice Request, formerly VCS) was in our last test as an independent company, but has since been acquired by Philips and also has a new version of its product. The final contender (Locus Dialogue) has been around a while but is new to this test.
Reader and manufacturer responses to last year's test brought to light some interesting issues regarding methodology. CT Labs worked hard to re-engineer certain tests, ensuring a level playing field. They've also extended the test suite to evaluate some of the more subtle ways makers add value to these complex devices.
The Testing Process
The three participating systems were put through identical test sets that evaluated their basic ability to recognize a spoken name and transfer the caller to the correct extension. All of the systems had additional features and capabilities that were not tested, since we wanted to concentrate on the core function of speech-based call handling. The Features Table on page 78 comprehensively lists all the systems' features.
The six tests performed were:
Single-line and four-simultaneous-line name recognition, in barge-in mode (an automated test).
Single-line and four-simultaneous-line name recognition, in prompted-response mode (an automated test).
Handling of duplicate names in the directory (a single-line manual test).
Handling of names not in the directory (a single-line manual test).
All test results are shown in the Test Results table on page 84.
The Name-Recognition Tests
The core of the testing was to evaluate the ability of each system to recognize spoken names and connect the caller to the correct extension associated with each name. For these tests, each system was loaded with an identical directory of 400 non-duplicate names, the same set we used for last year's tests.
Last year, though, we used a single male speaker who recorded 50 of the names in the directory. Some manufacturers objected to this approach, reasoning that a given system may have scored better or worse based on minor aspects of this specific speaker's voice. To answer this concern, we generated a broader sample: This time, we used seven speakers, five male and two female, each recording 35 names from the directory, for a total of 245 individual spoken test prompts. None of the speakers had noticeable regional or foreign accents, and all were asked to enunciate the spoken names clearly. Each of the names was spoken without extraneous phrases or words (e.g., "Give me...", "Put me through to...").
The prompts were recorded in 16-bit, 128-Kbps PCM, using a good-quality electret microphone, and were level-equalized to within 1 dB for each speaker. Playing these prompts through Teltone line simulators restricted the frequency bandwidth exactly as if it had been passed through any telephone network. Overall, the test parameters were chosen to be best case, without any attempt to fool the systems under test or evaluate their behavior under adverse conditions.
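The level-equalization step can be sketched in a few lines: measure each prompt's RMS level relative to full scale, then apply a gain that brings it to a common target, so no two prompts differ by more than 1 dB. This is a minimal illustration of the idea, not the lab's actual tooling; the -20 dBFS target and the 16-bit linear sample format are assumptions.

```python
import math

def rms_dbfs(samples):
    """RMS level of 16-bit linear PCM samples, in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(max(rms, 1e-9) / 32768.0)

def equalize(samples, target_dbfs=-20.0):
    """Scale samples so their RMS level matches the target, clipped to 16-bit range."""
    gain = 10.0 ** ((target_dbfs - rms_dbfs(samples)) / 20.0)
    return [max(-32768, min(32767, int(round(s * gain)))) for s in samples]
```

Running every prompt through `equalize` with the same target trivially satisfies the "within 1 dB" criterion, since all prompts land at (essentially) the same level.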
For each of the single-line tests, the Hammer test call generator placed 245 individual calls into one port of the system under test via the Teltone line simulator. For each of the four-line tests, a 245-call sequence was placed into each system port simultaneously, 980 calls in all. Each test was performed once with the system in barge-in mode, and once with the system in prompted-response mode. (See sidebar.) Altogether, a total of 2,450 test calls were placed into each system. In barge-in mode, the Hammer system waited for several seconds after the call was picked up by the system, and then spoke the prompt name. In prompted-response mode, the prompt was spoken after the auto attendant system's beep or tone, or when the opening prompt had finished playing. In each case, there were three possible responses from the system:
The system considered its selection a high-probability match and transferred the call to the associated extension.
The system was not as confident of its recognition and asked the simulated caller if (for example) "Gil Lamont" was the intended extension. The Hammer system was set to respond "yes" to all such inquiries.
The system failed to find an adequate match in its directory and transferred the caller to the operator.
The Hammer system evaluated the transfer accuracy by capturing the dialed transfer DTMF digits from the system and matching them to the prompt database. If the correct extension was dialed, with or without a confirmation prompt, it was scored as a correct recognition. If the wrong extension was returned, again with or without a prompt, it was scored as a false-recognition error. If the call was transferred to the operator, it was scored as a no-recognition error.
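The scoring rule reduces to a simple three-way classification of each call. The sketch below assumes a directory mapping spoken names to extension digit strings, and treats "0" as the operator fallback; both the directory shape and the operator extension are illustrative assumptions, not details from the actual test setup.

```python
OPERATOR_EXT = "0"  # hypothetical operator fallback extension

def score_call(prompted_name, dialed_digits, directory):
    """Classify one test call: compare the DTMF digits the system dialed
    against the extension the prompt database expects for the spoken name."""
    expected = directory[prompted_name]
    if dialed_digits == expected:
        return "correct"           # right extension, with or without a confirmation prompt
    if dialed_digits == OPERATOR_EXT:
        return "no-recognition"    # system gave up; caller went to the operator
    return "false-recognition"     # wrong extension dialed
```

Tallying these three labels over all 2,450 calls per system yields the recognition-accuracy figures reported in the Test Results table.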