A strange thing happened on the way to the future. The cellular telephone handset was transformed into a multi-way, multi-media information terminal. Along the journey, cellular handset audio requirements have changed dramatically from the basic requirement of two way voice communication. Now the array of audio uses in a cellular handset is staggering.
FUTURE AUDIO ARCHITECTURE IN CELLULAR HANDSETS
The signal paths used by the cell phone processors can be analog, digital, or both, depending upon whether codecs and small signal amplifiers for microphone and headphones are included on the processor silicon. Semiconductor processes for processors trend toward smaller geometries for higher device density and lower operating voltage. Any analog blocks on these smaller geometry processes will suffer performance penalties, such as lower output power (voltage AND current) and lower SNR. While analog structures can be built on these smaller geometry processes to deliver somewhat higher output power, those structures become an increasingly larger percentage of the die area, resulting in a larger, less cost effective IC than is really necessary for just the digital processing functions.
Therefore, the data conversion and amplifier functions are being dis-integrated from the processors and are being replaced with one or more digital interfaces to external data conversion and amplifier functions, either as separate devices or contained in an audio subsystem.
Possible Digital Interfaces for Audio
There is a wide variety of well known digital audio interfaces, each with its own unique advantages and disadvantages for use in a cellular handset:
I2S (Inter-IC Sound) Interface
I2S is a 3-wire, 2 channel, buss specifically for digital audio. It allows a broadcast type data transfer protocol and is point-to-point, i.e. from transmitter to receiver with no feedback from the receiver as to whether or not the data was properly received.
I2S is in widespread use in consumer audio equipment and for digital audio playback (MP3) in cell phones.
The original specification of I2S requires that Serial data is transmitted in 2’s complement form with the MSB first. The transmitter always sends the MSB of the next word one clock period after the Word Select (WS) changes. The MSB is transmitted first because the transmitter and receiver may have different word lengths. Also, the MSB of each channel has a fixed position, whereas the position of the LSB depends on the data word length.
The transmitter does not need to know how many bits the receiver can handle. If the system word length is less than the transmitter word length, the least significant data bits are truncated to fit. If the system word length is greater than the transmitter word length, the remaining least significant data bits in the word are set to “0”.
If the receiver is sent more bits than its word length, the bits in a channel after the WS transition are ignored, and the next channel data is read, MSB first. On the other hand, if the receiver is sent fewer bits in a channel than its word length, the missing bits are set to zero internally.
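The truncate-or-pad behavior described above can be sketched in a few lines (a minimal illustration using a bit-string representation of the data word; the function name is ours, not part of the I2S specification):

```python
def fit_word(tx_bits: str, rx_word_length: int) -> str:
    """Adapt an MSB-first I2S data word to a receiver's word length.

    Because the MSB always arrives first and has a fixed position,
    extra least significant bits are simply ignored (truncated) and
    missing least significant bits read as zero.
    """
    if len(tx_bits) >= rx_word_length:
        return tx_bits[:rx_word_length]        # receiver ignores extra LSBs
    return tx_bits.ljust(rx_word_length, "0")  # missing LSBs set to zero

# 16-bit word into an 8-bit receiver: LSBs dropped
# 8-bit word into a 16-bit receiver: zero-filled
```

This is why transmitter and receiver word lengths never need to be negotiated: the worst case is a loss of low-order resolution, never a framing error.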
This flexibility of I2S has led to the proliferation of de facto industry I2S data formats, supporting different categories of digital audio, that don’t meet the original specification. Some common variations are 16-, 18-, 20-, or 24-bit data, either Right Justified (Sony) or Left Justified (Philips), and either MSB first or LSB first.
This raises the question: What is I2S “compatibility” between devices? Which I2S format is the basis for the compatibility?
The I2S input D/A’s generally in use today are able to recognize and adjust to whatever data format comes in. However, some D/A manufacturers don’t support all possible variations of I2S input formats for many reasons.
Conversely, I2S output A/D’s should not be burdened with generating data in all possible formats. In a cell phone there is no need for a voice band A/D to output all possible data sizes and formats that I2S can support. For example, why output 24 bits at a 48 KHz sample rate?
Another question that arises is: Do you need to support all possible I2S data formats in a cell phone?
PROS
Industry Standard supported by COMMS and APPS processors
Simple to implement
Multiple format variations supported
CONS
No control data structure
Uni-directional, 2 channel (L, R) only
Multiple data formats can be confusing – what about compatibility between devices?
PCM (Pulse Code Modulation) Interface
The PCM (Pulse Code Modulation) format maps the binary digits (1’s and 0’s) of sample values directly onto the presence or absence of a pulse. PCM is the most popular data format for storing and transmitting uncompressed digital audio.
High quality digital audio requires high sampling rates and long word lengths (many bits per sample). PCM with uncompressed linear quantization is used for digital audio, with a sampling rate of 48 KHz currently recommended by the Audio Engineering Society (AES). But the pervasive music CD uses 16-bit PCM data recorded at a 44.1 KHz sample rate. PCM is also used by digital audio tapes (DATs) and is a common format for AIFF and WAV digital audio files. A sample rate of 96 KHz is recommended when higher bandwidth is available and is considered a professional audio standard.
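The bit-rate arithmetic behind these figures is straightforward; a quick sketch (the helper function is ours, for illustration only):

```python
def pcm_bitrate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bit rate of uncompressed linear PCM, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

# CD audio: 16-bit stereo at 44.1 KHz
cd = pcm_bitrate(44_100, 16, 2)    # 1,411,200 bps, about 1.41 Mbps

# AES-recommended 48 KHz, 16-bit stereo
aes = pcm_bitrate(48_000, 16, 2)   # 1,536,000 bps
```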
In a cell phone the primary use of PCM is for a bi-directional digital audio (voice) signal path using 4 wires.
DPCM (Differential Pulse Code Modulation)
DPCM is a lossy compression format that stores only the difference between consecutive samples. DPCM uses 4 bits to store the difference, regardless of the resolution of the original file. With DPCM, an 8-bit file would be compressed 2:1, and a 16-bit file would be compressed 4:1.
ADPCM (Adaptive Differential Pulse Code Modulation)
ADPCM is similar to DPCM except that the number of bits used to store the difference between samples varies depending on the complexity of the signal. ADPCM works by analyzing a succession of samples and predicting the value of the next sample. It then stores the difference between the predicted value and the actual value.
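A toy encoder makes the difference-coding idea concrete. This is a sketch, not any standard codec: real DPCM/ADPCM implementations quantize the difference through a step-size table rather than the plain clamp used here.

```python
def dpcm_encode(samples, bits=4):
    """Store only the difference between consecutive samples,
    clamped to what `bits` bits can represent (-8..+7 for 4 bits)."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    prev, diffs = 0, []
    for s in samples:
        d = max(lo, min(hi, s - prev))
        diffs.append(d)
        prev += d  # track the decoder's reconstruction, not the raw input
    return diffs

def dpcm_decode(diffs):
    """Rebuild samples by accumulating the stored differences."""
    prev, out = 0, []
    for d in diffs:
        prev += d
        out.append(prev)
    return out

# Slowly varying signals survive intact; large jumps are clipped (lossy).
```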
Sometimes game software uses the SMAF digital audio format which is basically a PCM/ADPCM file.
A-law and mu-law Compression
Telephony applications use non-linear quantized PCM for more efficient use of low bandwidth for speech. A-law and mu-law are two very similar lossy PCM compression schemes. Both methods use logarithmic instead of linear quantization levels to represent low-amplitude samples with greater accuracy than higher-amplitude samples. Speech is typically sampled at an 8 KHz rate. If linear quantization is used then about 12 to 13 bits per sample are needed for a minimum SNR of 72dB, giving a bit rate of about 96 Kbps. Using logarithmic quantization, this could be reduced to 8 bits per sample with a corresponding data rate of 64 Kbps. Or you could take a high quality 16 bit linear quantized voice sample and drop the least significant 3 bits. By applying a logarithmic coding table, the remaining 13 bits could be compressed to 8 bits without any significant loss of speech quality or SNR.
For voice recognition and commanding, voice band codecs use linear quantized 16 bit resolution data with sample rates from 8 KHz to 26 KHz, which extends the voice audio bandwidth to 11.7 KHz (roughly 0.45 times the sample rate, leaving margin below the 13 KHz Nyquist limit for anti-alias filter roll-off).
A-law and mu-law are both ITU (International Telecommunication Union) standards and are widely used. In America and Japan, mu-law coding is the standard, while in Europe and the rest of the world A-law compression is used.
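The logarithmic-quantization idea can be illustrated with the continuous mu-law curve, using mu = 255 as in the North American standard. Note this is the idealized companding curve, not the segmented 8-bit G.711 codeword format actually used on the wire:

```python
import math

MU = 255  # companding parameter for North American / Japanese mu-law

def mulaw_compress(x: float) -> float:
    """Map a normalized sample in [-1, 1] through the mu-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mulaw_expand(y: float) -> float:
    """Inverse of mulaw_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

# Low-amplitude samples get a disproportionate share of the code range:
# a sample at 1% of full scale maps to roughly 23% of the companded
# range, which is why 8 companded bits preserve speech quality
# comparable to 12-13 linear bits.
```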
PROS
Industry Standard supported by COMMS and APPS processors
The typical format of the PCM samples can be 8-bit A-law, 8-bit mu-law, 13-bit linear, or 16-bit linear
The PCM_CLK and PCM_SYNC terminals can be configured as inputs or outputs, depending on whether the module is the Master or Slave of the PCM interface
CONS
In a cell phone, generally limited to voice band communications
No control data path
AC’97 (Audio Codec ’97) Interface
AC’97 was developed for the PC market. The controller function is incorporated within the baseband or applications processor. Digital audio, control commands, and status information are transported over a 5 wire (including RESET), 13 time slot, TDMA bi-directional buss called the AC-Link. Three time slots are for TAG bits, Command, and Status information. Another time slot is reserved for GPIO bits. Therefore, each AC-Link will support 9 digital audio channels.
The digital audio data format is typically linear quantized PCM of 16, 18, or 20 bits in length and the sample rates supported are 8.0, 11.025, 16.0, 22.05, 32.0, 44.1, and 48 KHz.
Each AC-Link can support up to 4 AC’97 compatible codecs by uniquely setting each codec’s 2 bit ID code and using 4 serial data in lines on the controller. The typical PC configuration includes one Modem Codec for telecommunications data transfers. Either the controller or the Primary Codec may be the Master.
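The channel arithmetic from the AC-Link description above, as a quick sanity check (constants taken directly from the text):

```python
TOTAL_SLOTS = 13  # TDMA time slots per AC-Link frame
OVERHEAD = 3      # slots for TAG bits, Command, and Status information
GPIO_SLOTS = 1    # slot reserved for GPIO bits

# Remaining slots carry digital audio
audio_channels = TOTAL_SLOTS - OVERHEAD - GPIO_SLOTS  # 9 channels
```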
PROS
Industry Standard supportable by some COMMS and APPS processors
12 bi-directional channels
Control and Status channel
CONS
Not prevalent in cellular handsets, but used heavily in PC market
Fixed frequency operation
Limited multiple device support – max 4 AC’97 codecs with one AC-Link and compatible controller
Bi-directional I2S (4 wire) Interface
To accommodate non-voice path digital audio playback and record, the current trend is to create a bi-directional version of I2S.
PROS
4 channels – 2 channels (L, R) in each direction
CONS
NOT an Industry standard
NOT generally supported by COMMS and APPS processors
No control or status data path in either direction
Only 2 channels (L, R) in each direction
High Definition Audio (Azalia) Interface
High Definition Audio (HDA) -“Azalia”- is a specification primarily designed to support multi-channel digital audio on a PC, and is not expected to be used in cellular handsets. However, it is listed here because of certain similarities with AC’97 which is sometimes used.
Azalia specification for use in PCs.
The High Definition Audio controller is a bus mastering I/O peripheral, which is attached to system memory via the PCI buss. It contains one or more DMA engines, each of which can be set up to transfer a single audio “stream” to memory from an HDA device or from memory to an HDA device, depending on the DMA type.
Digital audio data and control information is transferred between the controller and HDA devices, primarily audio and modem codecs. The link distributes the audio sample rate time base clock, and the protocol supports a variety of audio sample rates and sizes under a fixed data transfer rate.
The similarity with AC’97 is in the way multiple codecs are supported, i.e. by the use of multiple Serial Data In (SDI) lines.
DMA controlled streams, up to 16 channels per stream
Support for 15 input and 15 output streams at a time
Sample rate support ranging from 6 KHz to 192 KHz
Support for 8-, 16-, 20-, 24-, and 32-bit sample resolution per stream
Support for striping on optional higher order SDO link pins to double or quadruple available outbound BW
Support for multi-SDI codecs to increase available inbound link BW
Codec architecture fully discoverable, allowing design flexibility
Extensive, fine grained power management control in the codec
Not supported by next generation processors for multi-media PC market
Output streams are broadcast and may be bound to more than one codec
Input streams may be bound to only a single codec
All channels within a stream must have the same sample rate and same sample size
48 KHz fixed frame rate
Not easily adapted to cell phone architectures
Each active stream must be connected through a DMA engine in the controller. If a DMA engine is not available, a stream must remain inactive until one becomes available
SSI (Serial Synchronous Interface) Interface
SSI is a proprietary 4 wire interface supporting up to 4 time-division multiplexed (TDMA) channels for communication between audio codecs, modems, and baseband processors. When used in this manner, the SSI buss is in its Internal Network Mode, where a master SSI device is connected to more than one slave SSI device. The SSI buss has other modes which allow specific configurations of codec, Bluetooth modem, and baseband processor to be used. In these other modes, SSI can communicate with one AC’97 codec (2 channels) for fixed and variable sample rate digital audio, or with a standard I2S device (2 channels) at either a 44.1 KHz or 48.0 KHz sample rate.
PROS
4 channel TDMA, supporting fixed and variable rate transfers
Flexible audio, voice, and data routing without host processor intervention
Separate and simultaneous audio paths from hosts to peripherals
CONS
Only 4 TDMA channels
8-bit Parallel Interface
Hardware music synthesizers typically use an 8 bit parallel data path, plus Chip Select, Read, Write, Address, and IRQ pins, like the older ISA Buss. Newer products also offer I2S as a selectable alternative digital audio input and output channel, with I2C as a control channel. Some products have a separate I2S output channel as well.
PROS
Similar to ISA – familiar to designers
Can send digital audio or control data depending upon Address bits
Series of 8 bit bytes – channels determined by addressing scheme
Can be bi-directional, but generally used in one direction
CONS
Typically ONLY used by hardware synthesizers for MIDI commands and SMAF digital audio
>12 pins – Inefficient use of pin count
Not standard for modern digital audio transfer protocols
Each of these interfaces has different signal characteristics and therefore must be handled differently in the cell phone handset.
Existing control interfaces for audio in cellular handsets
For devices using digital audio interfaces, a control interface buss is also required. There are several choices.
I2C (Inter-Integrated Circuit) Buss
I2C is the most commonly used control buss in the cellular handset. I2C is a multi-master, multi-slave 2-wire bi-directional serial interface. The active wires, Serial Data and Serial Clock (SDA and SCL) are bidirectional. The maximum number of devices connected to the I2C buss is dictated by the maximum allowable capacitance on the lines (400 pF) and the 127 device addresses available. There are 2 forms of I2C, “standard” (up to 100Kbps) and “fast” (up to 400Kbps).
As originally specified, I2C signal characteristics are defined relative to a 5Vcc supply, and the high and low logic thresholds are a function of Vcc instead of being fixed values.
With small geometry processes for CPU’s, the supply voltage and logic levels are dropping. Versions of I2C now work at voltages lower than 5Vcc, particularly 3.3Vcc, with logic thresholds fixed to specific values to enable working with digital control signals from processors with supply voltages as low as 1.8Vcc.
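The Vcc-relative thresholds work out as follows, using the 0.3*Vcc / 0.7*Vcc input levels from the I2C-bus specification (the helper function is ours, for illustration):

```python
def i2c_thresholds(vcc: float):
    """Classic Vcc-relative I2C input thresholds:
    a valid low must sit below 0.3*Vcc, a valid high above 0.7*Vcc."""
    return 0.3 * vcc, 0.7 * vcc

# At 5Vcc, a 1.8 V signal (0.36 * Vcc) is neither a valid low nor a
# valid high - which is why fixed thresholds matter for interfacing
# with 1.8Vcc processor I/O.
v_il_max, v_ih_min = i2c_thresholds(5.0)  # 1.5 V and 3.5 V
```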
SPI (Serial Peripheral Interface)
SPI is used in handheld and other mobile platform systems, although much less frequently than I2C. SPI is basically a 4-wire synchronous serial data protocol providing support for low to medium bandwidth (1 Mbps) network connection between CPU’s and other devices supporting SPI.
SPI is a single master and multiple slave buss. SPI data can be simultaneously transmitted and received in full duplex mode in blocks of 8 bits. Each SPI slave device requires a Chip Select signal. If 10 devices are on the bus, 10 chip-select lines, in addition to the shared clock and data lines are needed to select the appropriate device. The Chip Select signals are normally driven from a GPIO port on the processor. SPI has four modes dependent upon the Clock Phase (CPHA) and Polarity (CPOL). If the phase of the clock is zero, i.e. CPHA = 0, data is latched at the rising edge of the clock with CPOL = 0, and at the falling edge of the clock with CPOL = 1. If CPHA = 1, the polarities are reversed. CPOL = 0 means falling edge, CPOL = 1 rising edge.
Of these four modes, mode 0 is the most commonly used; Microwire is equivalent to SPI mode 0.
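The conventional mode numbering and the latch-edge rules described above can be captured in a few lines (the function names are ours, for illustration):

```python
def spi_mode(cpol: int, cpha: int) -> int:
    """Conventional SPI mode number: mode = CPOL*2 + CPHA."""
    return (cpol << 1) | cpha

def latch_edge(cpol: int, cpha: int) -> str:
    """Clock edge on which data is latched, per the rules above:
    CPHA=0 latches on rising (CPOL=0) or falling (CPOL=1);
    CPHA=1 reverses the polarities."""
    if cpha == 0:
        return "rising" if cpol == 0 else "falling"
    return "falling" if cpol == 0 else "rising"

# Mode 0 (CPOL=0, CPHA=0) latches on the rising edge - the Microwire case.
```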
To make matters worse, the digital data formats for audio vary widely
Voice communication typically uses a bi-directional Pulse Code Modulation (PCM buss) 64Kbps data channel with A-law or mu-law logarithmic coding schemes, to get more dynamic range from the 8 KHz voice samples than is available with linear coding. Another telephone voice standard, Adaptive Differential PCM (ADPCM), codes voice into 4-bit values and uses a 32Kbps data channel.
Some hardware music players/decoders have D/A’s providing analog audio output. Other hardware players/decoders use the uni-directional 2-channel I2S buss, which has many data format variations for digital audio output. The output audio sample rate depends upon (1) the sample rate of the original digital audio file, or (2) the input sample rate of the original digital audio file PLUS the output sample rate of the decompression algorithm used in the decoder to play it back.
Software decoders output digital data directly on the I2S buss.
Customers are asking for support of EVERY possible audio sample rate. Therefore, the digital music D/A’s must be able to determine and use whatever I2S format they receive.