The home of today is alive with sound. From alerts issued by home security systems to voice memos on the family refrigerator, the home appliance market continues to seek new low-cost, innovative voice solutions for its products. Voice solutions, with their relatively low cost and ease of integration, hold the promise of making appliances easier to use and more interactive, and of presenting a more intuitive and comfortable experience for the consumer. In addition, voice solutions can help manufacturers differentiate products in an increasingly crowded market. This article explores various design options for adding voice functionality to home appliances and white goods. It also discusses the relative trade-offs for each technology covered.
An important consideration in designing an appliance or household electronic product is how the product is expected to interact with the user. This usually involves a trade-off among cost, convenience, durability and other factors. A high-end solution might entail a full LCD screen that displays text, icons and pictures. A much less complex solution might encompass blinking LED indicators and/or beeping annunciators. In between lie many opportunities for speech products.
For a limited set of responses, or for a response in a familiar voice, voice output devices are an excellent choice. Typically, these types of ICs contain between 10 seconds and 8 minutes of recorded messages, prompts or instructions. For larger amounts of storage, or where the message might be variable, it may be more practical to use more sophisticated Text-to-Speech ICs, which can provide a speech output from ASCII text input.
There are a number of advantages to using a speech interface. The user need not be next to the appliance to get information otherwise presented in an LCD display or LED indicator. Speech functionality can also allow remote operation of the appliance through the telephone network. In addition, speech enables a product to be used by both the sighted and the visually handicapped without modification.
Potential household appliances suited for voice output include refrigerators and freezers, talking smoke detectors and home security devices, Caller ID telephones and answering machines, VCRs and DVD players, washing machines and dryers, microwave and conventional ovens, stoves, and small cooking appliances such as slow cookers. Table 1 lists various examples of home appliances and some of the voice features that are applicable to each.
Table 1. Home Appliances and Voice Solutions
For designers considering adding voice output to white goods and home appliances, a number of solutions currently on the market provide excellent options. Playback-Only solutions are suited for alerts, prompts, warnings, etc., where the message does not need to be changed or updated. This is a very economical solution for very high volume, mass-produced items. Record/Playback is well suited to memos, custom messages, alerts, prompts, warnings, etc., where the message may need to be changed or updated. Text-to-Speech is appropriate for reading text messages out loud, where the vocabulary may vary or be updated frequently.
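The selection guidance above can be condensed into a small decision rule. The following is only a sketch: the function name, the three boolean inputs and the idea of treating production volume as a yes/no flag are assumptions made for illustration, and a real selection would also weigh cost, voice quality and memory size.

```python
from enum import Enum


class VoiceSolution(Enum):
    PLAYBACK_ONLY = "Playback-Only"
    RECORD_PLAYBACK = "Record/Playback"
    TEXT_TO_SPEECH = "Text-to-Speech"


def choose_voice_solution(vocabulary_varies: bool,
                          messages_change: bool,
                          high_volume: bool) -> VoiceSolution:
    """Rough first-pass choice of voice technology (illustrative only)."""
    # Variable or frequently updated vocabulary points to Text-to-Speech.
    if vocabulary_varies:
        return VoiceSolution.TEXT_TO_SPEECH
    # Messages that must be changed or customized, or volumes too low to
    # justify a Masked ROM, point to Record/Playback.
    if messages_change or not high_volume:
        return VoiceSolution.RECORD_PLAYBACK
    # Fixed messages in very high volume suit Playback-Only devices.
    return VoiceSolution.PLAYBACK_ONLY
```

For example, a fixed set of warnings shipped in very high volume maps to Playback-Only, while the same warnings localized for several small language markets map to Record/Playback.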
Once the designer has selected a voice solution for inclusion into a home appliance design, various technology options should be considered. Masked ROM devices have their encoding permanently etched into solid-state memory, much like a CD is pressed with a permanent set of encoding. Masked ROM devices typically incorporate the microcontroller/voice processor, the speech ROM, and the output drivers into a single-chip device. These devices, such as Winbond's PowerSpeech family, employ either 5-bit MDPCM or 4-bit ADPCM algorithms. These voice compression schemes help reduce memory storage requirements, thereby reducing chip size and cost. The sound file is decompressed at playback, and driven to the speaker via an internal pulse-width modulator.
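The memory saving from a lower bit depth can be estimated with simple arithmetic. The sketch below assumes an 8 kHz sampling rate and an 8-bit uncompressed baseline purely for illustration; neither figure comes from this article or from a PowerSpeech datasheet, and the function name is hypothetical.

```python
import math


def speech_rom_bytes(duration_s: float, sample_rate_hz: int,
                     bits_per_sample: int) -> int:
    """Bytes needed to store a clip at a fixed number of bits per sample."""
    total_bits = duration_s * sample_rate_hz * bits_per_sample
    return math.ceil(total_bits / 8)


# Ten seconds of speech at an assumed 8 kHz sampling rate.
baseline = speech_rom_bytes(10, 8000, 8)   # uncompressed 8-bit samples
adpcm_4bit = speech_rom_bytes(10, 8000, 4)  # 4-bit ADPCM
mdpcm_5bit = speech_rom_bytes(10, 8000, 5)  # 5-bit MDPCM
```

Under these assumptions the 4-bit encoding halves the storage relative to the 8-bit baseline (40,000 bytes instead of 80,000), and the 5-bit encoding needs 50,000 bytes.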
Figure 1. Overlapping Voice Technologies
As Figure 1 shows, the application areas of these technologies overlap. Record/Playback devices are also a good fit for Playback-Only applications where the run rates may be small (e.g., customized voice prompts, or language-specific prompts) or where the quality of the voice prompt is a primary design requirement. Playback-Only devices are also useful for TTS prompt/alert applications where the vocabulary is limited to a handful of phrases that do not change, and high product volumes justify the Masked ROM investment. Finally, Record/Playback IC technology is effective in TTS prompt/alert applications where the vocabulary is limited to a small number of phrases, but where those phrases need to be either altered or customized in some manner. This presents an economical solution for smaller production volumes.
From the designer's perspective, another benefit of using TTS is design reuse and re-programmability. For new models of the appliance, there is no need to record the prompts again, or to add patches to previously recorded prompts. Modifying the text alone gives the new model its new character, providing an easy and fast way to spin a new version of a product in a very short period of time.
Figure 2 depicts the circuit diagram of the WTS701 control signals and their connection to a microcontroller. The figure shows the simplicity of the connections required between the host controller and the WTS701: the four basic SPI pins (MISO, MOSI, SS, SCLK) and two optional handshake pins (INT and R/B). The other connections are to the crystal oscillator, Reset and the analog output signals such as AUX_OUT and SP+/-.
Figure 2. Circuit diagram of WTS701 Control Signals.
DSP+Flash presents another technology option. With this solution, the sound file is stored in non-volatile digital memory, such as Flash, which in turn enables record/playback operation. This is typically a multiple-chip solution, where the audio processor (DSP) is separate from the memory (Flash) and the speaker driver (Class D amp). The voice processing algorithm can therefore be tailored in software to the specific requirements of the application (e.g., at the upper end of the spectrum, MP3 players offer high-fidelity audio suitable for music). If the application already has a powerful DSP on board, with processing cycles to spare for the record/playback operation when needed, then this may be an economical way to add voice capabilities. The trade-off involves the do-it-yourself development effort and costs.
The following sections focus on several technologies offered by companies like Winbond and how they might be incorporated into white goods or home appliances. For the purposes of this article, the discussion will focus on voice record and playback and text-to-speech IC design options.
Microwave Ovens: Voice Record and Playback Overview
Winbond's ChipCorder voice record/playback ICs incorporate a patented multilevel storage (MLS) technology that stores analog signals directly into the non-volatile memory array. The most apparent advantage of this technique is that sound quality is preserved while an 8:1 compression ratio is achieved. This all-analog approach eliminates the need for lossy compression algorithms.
ChipCorders are self-contained record/playback ICs that incorporate a microphone preamplifier and speaker drivers. An external microcontroller is needed to control message management (e.g., to tell the device which message to play back).
Despite the technological advances and simple one-button circuitry, most people still do not use a microwave oven to even a fraction of its full capability. Instead, microwaves are most frequently used for reheating leftovers or for making popcorn.
Most microwaves feature a small one-line LED display, with inherent limitations for communicating instructions to the user. The option of providing detailed instructions via text prompts would be of limited value. A more realistic solution would be to provide voice prompts (e.g., "How many baked potatoes do you want to cook?") and give consumers helpful safety reminders such as "Don't forget to cover food properly prior to cooking to prevent splattering." Incorporating voice prompts/alerts/warnings into microwaves would go a long way in adding additional value to this type of home appliance.
Since the types of prompts are limited to a finite set of phrases, the typical microwave designer probably doesn't require a solution as elaborate as a Text-To-Speech subsystem. At the same time, each microwave oven model may have a slightly different set of prompts. In addition, OEMs could further tailor their products for specific language markets, each of which represents production volumes too low to justify PowerSpeech Masked ROM devices. For this business model, ChipCorder voice record/playback devices are an excellent fit.
As a self-contained subsystem, ChipCorders provide an easy option for adding voice prompts to the existing system. Essentially, designers can handle the voice subsystem as a black box. Figure 3 depicts a block system diagram of the ChipCorder solution as it can be applied to a microwave home appliance.
Figure 3. ChipCorder System Block Diagram with In-System Programming Capability.
In this diagram, the microphone inputs are not being used. Instead, the AUX IN ports are used for on-board programming of the voice prompts. This allows the designer to use a generic voice subsystem module across a variety of microwave oven platforms, thereby amortizing design efforts. Voice prompts should be recorded at the last possible moment, allowing the manufacturer to tailor the recordings to the specific country/market (Dutch prompts for the Netherlands market, for example). This can be accomplished towards the end of the manufacturing process, when a programming system is hooked up to the AUX IN of the nearly finished microwave oven and, having temporarily assumed control of the subsystem's operation, downloads the appropriate voice prompts into the ChipCorder. The system contains a ChipCorder device, microcontroller, LCD, microphone, speaker and several control buttons. In-system programming can be achieved through the connection to the microcontroller and an analog input of the ChipCorder, such as ANA IN, AUX IN or the microphone. However, this option must be taken into consideration during the initial design phase for both hardware and software development. The designer needs to build in a special interface in order to communicate with both the microcontroller and the ChipCorder. Such an interface can connect to the programming system via an interface cable. This will provide the flexibility of updating the voice prompts as required.
Most microwave ovens operate at 2450 MHz, so the ChipCorder will require special RF shielding to prevent interference from the magnetron. Failure to provide it may result in unacceptable sound quality. A metal enclosure for the device and extra ground planes on the PCB will accomplish the task.
In general, industrial appliances tend to be noisy, emitting a lot of noise over the power lines. The designer must give special consideration to ensuring that the power supply to the ChipCorder is clean. This can be challenging when using low-voltage devices in an application where the rest of the system is running at high voltage. A good voltage regulator and large bypass capacitors placed strategically near the ChipCorder will go a long way toward eliminating this problem.
Although microwaves run cool, designers should opt for a device that meets Industrial Temperature (-40 to +85°C) specifications just to be on the safe side. With all of the shielding in the system, airflow may be restricted more than anticipated, so the extra temperature tolerance provides additional safety.
One of the principal elements in achieving good sound quality is the speaker itself. Given the size of the appliance, designers can select a speaker that delivers quality sound and still fits the appliance's space and cost constraints. A low-impedance option will help achieve louder sound volume, but an equally important consideration is the speaker housing. A well-designed enclosure will enhance the speaker's sound quality dramatically. A resonant cavity allows a speaker to work more efficiently by exploiting the fore and aft movement of air by the cone. This provides a cost-free method for extracting the best possible sound quality out of the speaker.
With respect to programming, each voice clip is stored in the ChipCorder's memory array, and will have a unique starting address and file length. A memory map, or Lookup Table (Table 2), keeps track of where each sound file is located. Playing the voice prompts is a simple matter of issuing a Play@Addr command.
Table 2. Look-up Table
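A lookup table of this kind can be sketched in a few lines of host-side code. Everything named here is hypothetical: the prompt names, the addresses and lengths, and the `play_at_addr` callback, which merely stands in for whatever routine the host uses to issue the device's play-at-address command. The actual command format must be taken from the ChipCorder datasheet.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Clip:
    start_addr: int  # starting address in the memory array (made-up units)
    length: int      # clip length in the same units


# Hypothetical memory map; real values come from the recording session.
PROMPTS: Dict[str, Clip] = {
    "HOW_MANY_POTATOES": Clip(start_addr=0x000, length=0x40),
    "COVER_FOOD": Clip(start_addr=0x040, length=0x60),
    "COOKING_DONE": Clip(start_addr=0x0A0, length=0x10),
}


def check_no_overlap(table: Dict[str, Clip]) -> None:
    """Raise ValueError if any two clips share memory."""
    clips = sorted(table.values(), key=lambda c: c.start_addr)
    for prev, nxt in zip(clips, clips[1:]):
        if prev.start_addr + prev.length > nxt.start_addr:
            raise ValueError("clips overlap in the memory map")


def play_prompt(name: str, play_at_addr: Callable[[int], None]) -> int:
    """Look up a prompt by name and ask the device to play it."""
    clip = PROMPTS[name]
    play_at_addr(clip.start_addr)
    return clip.start_addr
```

Keeping the table in one place means the rest of the firmware refers to prompts by name only, so a re-recorded clip changes the table and nothing else.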
A system designed to handle multiple languages is probably best served by being very liberal with the padding between voice clips. For each phrase, determine which language version is the longest, and use that duration to set the length and starting address of the clip. That way, regardless of which language is being spoken, as far as the microcontroller is concerned, it is still "Play at address XXXX." While this approach may result in some wasted memory space between clips, it will certainly simplify the software. For each language version, simply update the voice file, not the software.
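The padding scheme can be worked through as a short calculation. In this sketch the clip durations, the `rows_per_second` conversion factor and the function name are all invented for illustration; the real address granularity depends on the device and sampling rate chosen.

```python
import math
from typing import Dict, Tuple


def build_slot_map(durations_by_language: Dict[str, Dict[str, float]],
                   rows_per_second: int,
                   start_addr: int = 0) -> Dict[str, Tuple[int, int]]:
    """Return {prompt: (start address, slot length)} shared by all languages.

    Each slot is sized for the longest version of that prompt, so the
    starting addresses stay the same whichever language is loaded.
    """
    first_language = next(iter(durations_by_language.values()))
    slot_map = {}
    addr = start_addr
    for prompt in first_language:
        longest = max(clips[prompt] for clips in durations_by_language.values())
        rows = math.ceil(longest * rows_per_second)
        slot_map[prompt] = (addr, rows)
        addr += rows
    return slot_map


# Made-up clip durations in seconds for two language versions.
durations = {
    "english": {"GREETING": 1.5, "DONE": 0.8},
    "dutch": {"GREETING": 2.0, "DONE": 1.1},
}
slots = build_slot_map(durations, rows_per_second=8)
```

Here both prompts happen to be longest in Dutch, so the English version leaves some unused space at the end of each slot, which is the memory cost the approach accepts in exchange for unchanged software.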
Aside from the speaker itself, the quality of the initial sound files of the prompts is the primary determinant of the quality of the resulting product. Clean recordings free of background noise, optimal recording levels, signal processing (equalization), etc. are most desirable. Hence, with some attention to design detail and a modest investment of equipment, even a small company, with minimum engineering resources, can prepare high quality voice prompts.
A TTS Application: VCR/DVD Player/Recorder
Text-to-Speech (TTS), the next step beyond speech record and playback solutions, offers designers the flexibility of changing numerous voice prompts easily. By utilizing TTS, the appliance can not only perform its basic functions but also gain voice talk-back capabilities. When speech prompts are not limited to a finite number of phrases, and draw on a large vocabulary, the user's interaction with the appliance is enhanced. TTS is best suited to machines that have many states and operations to perform, and thus a large volume of information to report to the user.
The use of text-to-speech (TTS) is necessary in some applications where the exact text is unknown in advance because it is received via a network, as in an Internet radio or set-top box. In addition, TTS is an ideal choice when the text that needs to be spoken is so extensive that storing it as recorded voice is nearly impossible. One example is storing a home appliance user's manual in the device and guiding the user through the detailed programming instructions. While technology advancements continue to be made in many consumer products, most home appliances still come with long and cumbersome user manuals that take a long time to read. This is most inconvenient to the busy consumer: when any feature needs to be changed, the consumer must either locate the old manual or contact the manufacturer for assistance. Adding speech to these appliances will make usage and ongoing programming far more convenient for consumers.
With embedded text-to-speech technology, words are constructed from pieces of speech, through a concatenation method, with some post-processing done on the speech to adjust each word to the sentence. Embedded environments, however, are limited in processing power and available memory, so embedded systems cannot produce speech quality close to that of server/PC-based solutions. By contrast, with recorded speech approaches, the word, or even the whole sentence, is recorded in a natural voice with perfect intonation and stress. In such recordings, every word fits naturally with the other words in the sentence. From this perspective, recorded speech will always sound better, although its processing and memory requirements make it unsuitable for many low-cost systems when the amount of speech is large.
This being said, other system design goals will continue to drive the use of TTS technology for efficient information delivery. For example, a user programming a VCR for the first time needs relevant information conveniently delivered, intelligibly. The consumer, in all likelihood, will care very little for the quality of the speech that imparts that information, and concatenation-based TTS solutions provide the most cost-effective and flexible way to handle spoken delivery of large files of information in microcontroller-based, embedded products.
Winbond's text-to-speech technology (WTS701) resides on top of the underlying record/playback or playback-only platform. A host controller takes the input ASCII text, which is either stored as a file in the system's memory or received as streaming text via a network, and the device "reads it out loud". This is accomplished by normalizing the text, converting orthography to transcription and concatenating a string of words, or parts of words, that are stored in memory. Winbond's solution uses MLS technology, which allows a human voice to be stored so as to create more natural speech, in a voice that can be recognized as human. The resulting sound is markedly different from that of other DSP-based solutions, which tend to sound far more computerized and robotic.
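The three stages named above (text normalization, orthography-to-transcription conversion and concatenation) can be illustrated with a toy pipeline. This is not the WTS701's algorithm: the lexicon, the unit symbols and the letter-by-letter fallback are all invented for the sketch, and a real device would play back stored speech segments rather than return a list of symbols.

```python
import re
from typing import List

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

# Toy lexicon mapping words to speech-unit symbols (illustrative only).
LEXICON = {
    "channel": ["CH", "AE", "N", "AH", "L"],
    "fifty": ["F", "IH", "F", "T", "IY"],
    "two": ["T", "UW"],
}


def number_to_words(n: int) -> str:
    """Spell out 0-99; larger numbers are read digit by digit."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"
    return " ".join(ONES[int(d)] for d in str(n))


def normalize(text: str) -> List[str]:
    """Stage 1: lower-case, expand digits, drop punctuation."""
    text = re.sub(r"\d+", lambda m: number_to_words(int(m.group())), text.lower())
    return re.findall(r"[a-z']+", text)


def transcribe(words: List[str]) -> List[str]:
    """Stages 2 and 3: look up each word and concatenate its units."""
    units: List[str] = []
    for word in words:
        # Unknown words fall back to being spelled letter by letter.
        units.extend(LEXICON.get(word, list(word.upper())))
    return units
```

Calling `transcribe(normalize("Channel 52"))` yields the unit sequence for "channel fifty two"; in a concatenative system each unit would index a stored speech segment.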
TTS Design Considerations
The number one consumer complaint regarding VCRs and DVD player/recorders is the programming. Programming takes place at initial setup and whenever the consumer needs to perform a new task, such as programming the VCR for a future recording. Adding TTS to the system enables the otherwise unwieldy manual to be simply spoken to the user. With TTS functionality, the user could plug in the power cord and the machine would automatically greet the user and guide him or her through the setup. This improvement will make setup, daily operation, feature tuning and troubleshooting much easier and more comfortable for the user, especially for those users who utilize only a small fraction of the capabilities of their consumer electronics.
Furthermore, when the device guides the user through each step of setup, the machine can also "sense" what the user actually does, making the experience interactive. For example, after the audible instruction "press the button on the remote to determine the Line input", the machine can detect whether the user has actually done so successfully. Sometimes the machine will need to provide extra help for people who cannot find that button; for example, at the press of a button on the remote, the machine can say "the button is a red one and is located on the top left corner of the remote". With TTS this is possible.
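The guided, interactive setup described above amounts to a small loop per step: speak the instruction, wait for the expected key, and speak extra help if it does not arrive. The sketch below is one possible shape for that loop. The `SetupStep` structure, the key name `"LINE"`, the retry count and the `speak` and `wait_for_key` callbacks are all hypothetical placeholders for the host controller's own TTS and remote-control routines.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class SetupStep:
    instruction: str   # text spoken at the start of the step
    expected_key: str  # key the user is asked to press
    help_text: str     # extra guidance spoken after a miss or timeout


def run_step(step: SetupStep,
             speak: Callable[[str], None],
             wait_for_key: Callable[[], Optional[str]],
             max_attempts: int = 3) -> bool:
    """Speak one instruction and check that the user follows it.

    wait_for_key returns the name of the key pressed, or None on timeout.
    """
    speak(step.instruction)
    for _ in range(max_attempts):
        if wait_for_key() == step.expected_key:
            return True
        # Wrong key or timeout: offer the extra help and try again.
        speak(step.help_text)
    return False


line_input_step = SetupStep(
    instruction="Press the button on the remote to determine the Line input.",
    expected_key="LINE",
    help_text="The button is a red one and is located on the top left "
              "corner of the remote.",
)
```

Because the prompts are plain text, adding or rewording a step only changes the strings, which is the reuse benefit TTS offers over re-recorded prompts.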
TTS is available for many uses in home appliances, not just the programming mode. The technology can also be used to announce status, alerting the user before performing an action or accepting an instruction. For example, before recording, the machine can warn the user by announcing: "VCR will now begin recording the selected program on channel 52 for one hour".
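Because the announcement is ordinary text, the host can assemble it from the current settings at run time. A minimal sketch, assuming the host knows the channel number and the recording length in minutes; the function name and the wording of the duration phrase are illustrative, and digits are left in the string for the TTS stage to read out.

```python
def recording_announcement(channel: int, minutes: int) -> str:
    """Build the status message to hand to the text-to-speech device."""
    if minutes <= 0:
        raise ValueError("recording length must be positive")
    hours, mins = divmod(minutes, 60)
    parts = []
    if hours:
        parts.append(f"{hours} hour" + ("s" if hours != 1 else ""))
    if mins:
        parts.append(f"{mins} minute" + ("s" if mins != 1 else ""))
    duration = " ".join(parts)
    return ("VCR will now begin recording the selected program "
            f"on channel {channel} for {duration}")
```

A recorded-speech solution would need a separate clip for every channel and duration combination, whereas here a single template covers them all.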
Most VCR/DVD player/recorders feature a slot for the cassette or disc, a few keys for the user to operate, several cable connectors, an LCD with a limited number of characters and an IR receiver for communication from the remote control. Because the LCD can display only a limited number of characters, it is not useful for showing detailed instructions.
As seen in Figure 4, the speaker can be located either on the VCR body itself or alternatively on the remote. The latter requires transmission of the message from the VCR to the remote for annunciation.
Figure 4. VCR concept for adding voice output.
Generally speaking, home appliances do not have a great deal of additional space for chipsets, so footprint is a consideration for most designers. Additionally, most home appliances do not possess the extra horsepower to run heavy-duty text-to-speech algorithms. This means that text-to-speech systems that run on servers, desktops or even powerful embedded platforms all constitute overkill for this kind of application. The most suitable embedded text-to-speech solution for this type of home appliance would feature a simple serial interface to the VCR host controller, and would not require additional components. It would also be able to drive the speaker directly. The WTS701 is a single-chip solution that does not require any additional components and is able to drive a speaker directly, as it includes internal amplification circuitry.
Adding the WTS701 to the VCR/DVD can also provide additional customization of the VCR. Users can program their names into the VCR and set personal preferences, for example a reminder to watch a specific program on a specific day and time. When the time arrives, the VCR can call the user by name and announce, "Joe, your show is about to begin". The WTS701 makes it possible to use names instead of user codes and numbers, and it remembers these names in internal non-volatile memory.
Figure 5 illustrates a component level block diagram for a VCR with TTS capabilities. The host controller drives all peripherals such as memory, keys, LCD, communication to/from remote controller, driving motor, other analog functions and of course the WTS701. The host controller drives the WTS701 via SPI interface, which requires only four pins, an important design consideration for this I/O intensive application. The WTS701 receives streaming text and commands via this interface, converts it to speech internally, without a need to use memory or any other component, and drives the speaker directly through the SP+ and SP- pins.
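On the host side, streaming text to the device reduces to sending the ASCII bytes in small pieces and honoring the ready/busy handshake between pieces. The sketch below shows only that flow-control pattern. It deliberately omits the WTS701's actual command framing, which must be taken from the datasheet; the chunk size, the poll limit and the `spi_write` and `is_ready` callbacks are assumptions standing in for the host's SPI driver and its reading of the R/B pin.

```python
from typing import Callable


def stream_text(text: str,
                spi_write: Callable[[bytes], None],
                is_ready: Callable[[], bool],
                chunk_size: int = 16,
                max_polls: int = 1000) -> int:
    """Send text to a TTS device in chunks; return the number of chunks.

    Illustrative only: no device-specific command bytes are added.
    """
    # The device expects ASCII; anything else is replaced with '?'.
    data = text.encode("ascii", errors="replace")
    chunks_sent = 0
    for offset in range(0, len(data), chunk_size):
        # Wait for the device to signal that it can accept more text.
        # Real firmware would sleep or use the interrupt line instead.
        for _ in range(max_polls):
            if is_ready():
                break
        else:
            raise TimeoutError("device did not become ready")
        spi_write(data[offset:offset + chunk_size])
        chunks_sent += 1
    return chunks_sent
```

Passing the SPI and handshake routines in as callbacks keeps the sketch independent of any particular microcontroller and makes the flow control easy to exercise on a desktop with stand-in functions.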
The flash memory device holds the user's manual and help instructions to the user in text, minimizing the amount of memory needed. The host controller software needs to support the interface to the user in order to supply instructions according to the user's progress and upon demand.
The VCR/DVD generates some noise via its internal moving parts. However, greater noise is generated by the audio/video equipment that it is connected to, such as a home entertainment system and TV. This should be taken into consideration when designing the system, so that interaction with the user takes place only when the user requests information and not, for instance, while the user is watching a movie. Locating the speaker on the remote control can eliminate most of the problem. Good housing for the speaker is a key to providing quality sound. However, since the device already connects to audio devices, it is possible to utilize this connection and drive the speech via the WTS701's AUX_OUT pin to the TV or other sound system. This option eliminates the need for a dedicated speaker in this application. Using different layers for the analog and digital signals will help prevent noise from coupling between them. Temperature is not an issue in this application; however, supporting the industrial temperature range is still recommended.
Figure 5. Generic Block diagram (components level) VCR with TTS capabilities.
Voice prompts provide a fast, cost-effective method for OEMs to differentiate home appliances while making these products more intuitively interactive. This article has discussed a number of technology options, and a range of choices for adding voice to appliances based on volume, technology, number of prompts and quality of voice desired. Design considerations, based on Playback-Only and Record/Playback solutions for alerts, prompts, warnings, and memos, were covered using a microwave oven as the example. Winbond's ChipCorder ICs offer a very economical solution for high volume, mass-produced applications in this arena. In addition, a text-to-speech solution, appropriate for reading text messages out loud where the vocabulary may vary or be updated frequently, was discussed using a VCR/DVD recorder/player as the design example. Winbond offers both of these solutions, and the examples presented here only begin to touch on the ways that voice can impact a new home appliance product design. Ultimately, design options for adding voice are limited only by the engineer's imagination.