A Prototype Lab Box with DSK'C6711/13 for Rapid DSP Algorithm Development
Graziano Bertini and Massimo Magrini
10/19/2005 12:00 AM EDT
Texas Instruments Audio and Video/Imaging Series
Rapid prototyping is a new approach to the development of digital signal processing (DSP) systems. With the advent of Matlab's Real Time Workshop (RTW), it is now possible to compile, load, and execute graphically designed MathWorks Simulink models on an actual DSP platform, without spending many workdays coding in typical DSP-oriented languages or compiler-specific C/C++ dialects.
RTW supports the powerful Texas Instruments 'C6000 series, including the TMS320C6711 DSK we are employing in our lab. Here we describe our experience in using these tools to develop and test real-time voice transformation algorithms proposed by the Musical and Architectural Acoustic Lab of FSSG-CNR (Fondazione Scuola San Giorgio-Consiglio Nazionale delle Ricerche, Venice, Italy).
However, the first generations of starter kits were inexpensive but minimal, including only the DSP processor and a few hardware devices. Designing a system for a specific application required building a PCB with the necessary hardware components around the DSP chip. Lab prototyping was therefore expensive, and the final results could not be achieved within "time to market" constraints.
Figure 1 shows an example of the old starter kit TMS320C26 DSK integrated into a system using infrared devices, in order to control sounds and music production by hand movements as realized in our laboratory.
Figure 1: TMS320C26 Starter Kit (in the center of the prototype) integrated in a system for gestural control of computer music via the MIDI (Musical Instrument Digital Interface) protocol
Subsequently, better-equipped DSKs with audio codecs, static and/or dynamic memories, and faster PC interfaces became available. Moreover, the software tools were improved and more complete libraries were included. More recently, it has become possible to compile and execute DSP algorithms from graphical models and schematics. For example, Simulink with Real Time Workshop is able to work directly with TI DSP processors.
During the TI Europe One-day Workshop organized in Florence in April 2003, we discovered the features of the TMS320C6711 DSK and also obtained, as a grant, a kit for starting our experiments. This article details our experience with the implementation of complex voice transformation algorithms using state-of-the-art HW/SW tools for rapid prototyping, in other words, an evaluation board with a powerful floating-point DSP processor supported by a high-level SW compiler (Simulink with Real Time Workshop).
Our collaboration focused only on the development of a DSP system supporting the speech restoration functions. More precisely, the main goal of the work described here was to verify whether real-time voice transformation could be performed by a DSP embedded system implementing the proposed algorithms.
Besides the implementation of a set of methods, such as LPC/VC (Linear Prediction Coding / Voice Conversion), the FSSG-CNR team proposed innovative methods for improving the quality of the synthesized voice. The entire set of operations represents a particular implementation of the so-called "Virtual Dubbing" procedure. The basic steps of the complete project were: designing a method for high-quality voice transformation, implementing a suitable algorithm in Matlab and Simulink and, finally, translating it into DSP target code by means of a rapid prototyping approach. Since the original code was developed in Matlab, we used MathWorks' Real Time Workshop (RTW) as the rapid prototyping tool for the DSP platform.
Real Time Implementation Based on DSP
Since the complete method is computationally very demanding, it is necessary to test whether it can be implemented in real time.
As we developed the basic algorithm of the method at a research level, we needed to perform tests on a flexible platform that allows non-optimized algorithms to be implemented with reasonable effort. For this reason, we decided to implement them on a DSP hardware platform based on a Texas Instruments floating-point, 32-bit processor (the TMS320C6711).
This DSP processor is based on the VLIW (Very Long Instruction Word) architecture, which, together with its optimizing C compiler, allows fast parallel computation. For a rapid evaluation of the TMS320C6711 processor, a DSP Starter Kit (DSK) is available from Texas Instruments, comprising a board and the software tools. The board must be connected to a standard PC running its development environment, CCS (Code Composer Studio).
This board (Figure 2) includes 16 MBytes of SDRAM plus 128 KB of external Flash. Among its features, the most relevant for us is the on-board audio codec (AD/DA converter), the TLC320AD535. Although its quality is rather poor (16 bit, 8 kHz sampling rate), it is sufficient for testing DSP applications in the speech audio band. In addition, the board can host daughter boards with better-performing A/D converters, for example the PCM3003 daughter board, which allows a 48 kHz sampling rate (16 bit stereo). In a second development phase we also tried the DSK board based on the slightly faster TMS320C6713. This second board has a USB link to the PC, which is much faster and more reliable than the parallel connection.
Mathworks' Real-Time Workshop builds applications from Simulink diagrams for prototyping, testing, and deploying real-time systems on a variety of target computing platforms, including Texas Instruments C6000 class DSP processors (Embedded Target for TI C6000 DSP). This library offers a set of Simulink blocks, implementing the behavior of various devices on the board (Figure 3).
Figure 3: Line In and Line Out Simulink blocks
Once the Simulink model has been simulated, a few settings in the model configuration instruct Matlab to start the DSP code building process. The build translates the Simulink model into a Code Composer Studio C language project and then, by driving the DSP compiler, into DSP binary code, which can finally be downloaded to the DSK board. Before the translation, we tuned the model using MathWorks' Model Advisor, a software tool that comprehensively analyzes the Simulink model to help the programmer configure Simulink and Real-Time Workshop appropriately.
Hardware Platform
In order to satisfy a specific request of the RACINE-S project team and to make the system easier to use, we built a custom chassis for the starter kit board (Figure 4). This enclosure is equipped with additional analog circuitry that provides analog input/output interfacing, a power supply, and signal conditioning (with a low-noise preamplifier). The front panel contains lamps for system state monitoring and a reset button (corresponding to those on the DSK board). The chassis also shields the system from electromagnetic interference, improving its reliability.
Figure 4: View of the system based on the DSK board, connected to the PC
In a traditional dubbing studio (Figure 5), a professional dubbing speaker dubs the sentences pronounced by the target voice, if available, or simply reads the script if the corresponding sequence is missing. The main idea at this point is to give the voice of the professional dubbing speaker the features of the target voice. This voice conversion process can be performed in real time during the dubbing process or off-line at the audio post-production stage.
Figure 5: Virtual dubbing block diagram
Algorithm Description
As previously stated, the project requires converting a source speaker's voice so that it sounds as if it were pronounced by a different (target) speaker. Today's voice conversion algorithms mainly rely on residual-excited LPC synthesis [1,2] or on STFT-based synthesis [3,4].
In both cases the core of the algorithm is the definition of a map F(x) that transforms a feature vector x into a new vector y. In the former case, the feature vector is typically a set of LPC coefficients or, because of their better interpolation properties, a set of line spectral pair (LSP) coefficients [1]. An example of such a voice conversion algorithm is available in the FESTIVAL TTS system within the OGIresLPC synthesis plug-in. In the latter case the feature vector can be some sort of spectrum magnitude representation, such as the FFT magnitude or the mel-cepstral coefficients [3].
A simple definition of the map F can rely on some parametric non-linear function approximation tools, for example, neural networks or Radial Basis Function Networks (RBFN) [4]. Another common approach to the definition of the map structure is to use a probabilistic, locally linear function. A widely used technique makes use of Gaussian Mixture Models (GMM), which are capable of embedding an acoustic model of the source speaker in the mapping function based on a Gaussian Mixture [1,2].
The design of the conversion function requires a number of basic steps. First, a database of sentences uttered by different speakers is generated. Then, the feature vectors are computed by a frame-based LPC or sinusoidal analysis. Before the training step, a dynamic time warping (DTW) procedure is required to time-align the feature vectors derived from the first speaker with those of the second speaker. This guarantees that, in each input-output training pair, a phoneme of the first speaker corresponds to the same phoneme of the second speaker. At this stage a conversion function based on an RBFN can be trained directly by identifying its parameters from the input-output training pairs.
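The time-alignment step can be illustrated with a minimal sketch of textbook dynamic time warping. This is not the project's code: the function name `dtw_align`, the Euclidean frame distance, and the unconstrained step pattern are assumptions chosen for brevity.

```python
import numpy as np

def dtw_align(src, tgt):
    """Align two feature sequences (hypothetical helper, textbook DTW).

    src: array of shape (N, P), feature vectors of the first speaker
    tgt: array of shape (M, P), feature vectors of the second speaker
    Returns a list of (i, j) index pairs forming the warping path.
    """
    n, m = len(src), len(tgt)
    # Pairwise Euclidean distances between all source and target frames
    dist = np.linalg.norm(src[:, None, :] - tgt[None, :, :], axis=2)
    # Accumulated cost with an extra border row/column set to infinity
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = dist[i - 1, j - 1] + min(
                cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]
            )
    # Backtrack from the last frame pair to the first
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        k = int(np.argmin((cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])))
        if k == 0:
            i, j = i - 1, j - 1
        elif k == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return path
```

Each returned pair `(i, j)` can then serve as one input-output training example, pairing source frame `i` with target frame `j`.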
In the current project a Gaussian Mixture voice conversion model is used. The conversion function is built from two training sets of LPC coefficients, extracted from series of time windows of the source and the target voice signals respectively. This function is then used to map the source LPC coefficient sets into the target ones.
Our LPC-VC model synthesizes the target voice by filtering the LPC residual of the source voice with target LPC coefficients, which are obtained by converting the corresponding source LPC parameters by means of F. A careful design of the LPC-VC model requires the voice signal to be separated into "voiced" and "unvoiced" time windows. At this stage of the project, the signal is simply divided into short segments of equal duration, which is sufficient to test the LPC voice conversion.
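The residual-excited synthesis just described can be sketched per frame as follows. This is only an illustration under assumptions: the function names are hypothetical, the LPC polynomial is taken as A(z) = 1 + a1 z^-1 + ... + ap z^-p with `a[0] == 1`, and filter state is not carried across frames.

```python
import numpy as np

def fir_filter(b, x):
    """Analysis (inverse) filter A(z): returns the LPC residual of x."""
    return np.convolve(x, b)[:len(x)]

def all_pole_filter(a, x):
    """Synthesis filter 1/A(z), assuming a[0] == 1."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = x[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y[n] = acc
    return y

def convert_frame(frame, a_src, a_tgt):
    """Excite the target all-pole filter with the source residual."""
    residual = fir_filter(a_src, frame)       # remove the source envelope
    return all_pole_filter(a_tgt, residual)   # impose the target envelope
```

A simple sanity check of the sketch: when `a_tgt` equals `a_src`, the two filters cancel and the frame is returned unchanged.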
When used to model the speech process, the components of the GMM represent different phonetic events. Let us suppose that a sequence of P-dimensional column vectors $\{\vec{x}_t\}$, $t = 1, \ldots, T$, which represents the time-varying spectral envelope of a source signal, has been fitted by a GMM. Moreover, let us assume that a sequence of P-dimensional column vectors $\{\vec{y}_t\}$, $t = 1, \ldots, T$, having the same length as the source sequence, is the target of the conversion. We define a spectral conversion function as a map F that transforms each vector in the input sequence into the vector that occupies the same position in the output sequence, thus preserving the time information of the input and output data [9].
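For orientation, one form of F that is widely used in GMM-based voice conversion is given below. It is an assumption offered for illustration, since the exact parameterization adopted in this project is not stated here. The symbols are likewise assumed: $Q$ is the number of mixture components, $\alpha_i$, $\vec{\mu}_i$ and $\Sigma_i$ are the weight, mean and covariance of the $i$-th source component, and $\vec{\nu}_i$ and $\Gamma_i$ are the conversion parameters to be estimated.

$$
F(\vec{x}_t) = \sum_{i=1}^{Q} p_i(\vec{x}_t)\left[\vec{\nu}_i + \Gamma_i \Sigma_i^{-1}\left(\vec{x}_t - \vec{\mu}_i\right)\right],
\qquad
p_i(\vec{x}) = \frac{\alpha_i\,\mathcal{N}(\vec{x};\,\vec{\mu}_i,\Sigma_i)}{\sum_{j=1}^{Q}\alpha_j\,\mathcal{N}(\vec{x};\,\vec{\mu}_j,\Sigma_j)}
$$

In this formulation the parameters $\vec{\nu}_i$ and $\Gamma_i$ are typically obtained by least squares, minimizing $\sum_{t=1}^{T}\lVert \vec{y}_t - F(\vec{x}_t)\rVert^2$ over the time-aligned training pairs.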
Implementation using Simulink / Real Time Workshop
Following the Rapid Prototyping approach we implemented our algorithms using Simulink, which provides a graphical user interface (GUI) for building models as block diagrams, using click-and-drag mouse operations (Figure 6).
The voice signal is first divided into strongly overlapping frames in order to prevent major artefacts in the transition from one frame to the next. For our purpose we chose a frame size of 1024 samples and a hop size of 64 samples, along with a Blackman window that provides smooth transitions between adjacent frames. These values are widely used and appear to give sufficiently accurate results in the majority of the test cases. Each frame is then LPC-encoded. An LPC order of 10 is considered suitable in most of the literature. For LPC analysis a classic extraction algorithm is employed, which computes the autocorrelation of the input signal and then applies the Levinson-Durbin recursion.
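The analysis stage above can be sketched in a few lines of NumPy. With a frame of 1024 samples and a hop of 64, consecutive frames overlap by 1 - 64/1024 = 93.75%. If the 8 kHz rate of the on-board codec is assumed (the rate used in the experiments may differ), a frame spans 128 ms and a hop 8 ms. The function names are hypothetical and the code is a plain textbook autocorrelation/Levinson-Durbin implementation, not the Simulink blocks used in the project.

```python
import numpy as np

FRAME_SIZE = 1024   # samples per frame (from the text)
HOP_SIZE = 64       # samples between frame starts (from the text)
LPC_ORDER = 10      # prediction order (from the text)

def split_frames(signal, frame_size=FRAME_SIZE, hop=HOP_SIZE):
    """Cut the signal into overlapping, Blackman-windowed frames."""
    if len(signal) < frame_size:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(signal) - frame_size) // hop
    window = np.blackman(frame_size)
    return np.stack(
        [signal[i * hop:i * hop + frame_size] * window for i in range(n_frames)]
    )

def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations.

    Returns (a, err) with a[0] == 1, so that the prediction error is
    e[n] = x[n] + a[1] x[n-1] + ... + a[order] x[n-order].
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                  # updated prediction error power
    return a, err

def lpc(frame, order=LPC_ORDER):
    """LPC coefficients of one (already windowed) non-silent frame."""
    r = np.array([np.dot(frame[:len(frame) - k], frame[k:])
                  for k in range(order + 1)])
    return levinson_durbin(r, order)
```

A typical use would be `coeffs = [lpc(f)[0] for f in split_frames(x)]`. Note that an all-zero frame has zero energy and would cause a division by zero, so silent frames need separate handling.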
As previously stated, the design and the training of the VC model are performed off-line, with the number of Gaussian mixture components set to 10. The design phase first needs the target voice LPC coefficients, which are extracted in the same way as described above. These coefficients, transformed into line spectral pairs, are then used for the GMM modelling procedure, yielding a set of transformation parameters (M, V, W, TH0) (Figure 6) that are used in the re-synthesis process.
The core steps of the re-synthesis are:
- lpcar2ls converts LPC coefficients into line spectral pairs. This requires a roots computation and a deconvolution.
- lpcN_fun and lpcpi_fun compute parameters used for calculating the set of line spectral pairs (lsp) encoding the transformed voice according to a specific formula.
- lpcls2ar converts the line spectral pairs back to transformed LPC coefficients.
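The two conversion steps at either end of this list can be illustrated with a short NumPy sketch of the standard LPC to line spectral frequency construction, which indeed needs a deconvolution (to remove the trivial roots at z = -1 and z = +1) and a roots computation. The names `lpc_to_lsf` and `lsf_to_lpc` are hypothetical and are not the `lpcar2ls` and `lpcls2ar` routines themselves. The sketch assumes an even LPC order and a minimum-phase A(z) with `a[0] == 1`.

```python
import numpy as np

def lpc_to_lsf(a):
    """LPC polynomial [1, a1, ..., ap] -> sorted line spectral frequencies (rad)."""
    a = np.asarray(a, dtype=float)
    assert (len(a) - 1) % 2 == 0, "sketch supports even orders only"
    ext = np.concatenate([a, [0.0]])
    p_poly = ext + ext[::-1]     # symmetric polynomial P(z)
    q_poly = ext - ext[::-1]     # antisymmetric polynomial Q(z)
    # Deconvolve the trivial factors (1 + z^-1) and (1 - z^-1)
    p_red, _ = np.polydiv(p_poly, [1.0, 1.0])
    q_red, _ = np.polydiv(q_poly, [1.0, -1.0])
    angles = np.concatenate([np.angle(np.roots(p_red)),
                             np.angle(np.roots(q_red))])
    # Roots come in conjugate pairs on the unit circle; keep the upper half
    return np.sort(angles[angles > 0])

def lsf_to_lpc(lsf):
    """Sorted line spectral frequencies -> LPC polynomial [1, a1, ..., ap]."""
    lsf = np.sort(np.asarray(lsf, dtype=float))
    assert len(lsf) % 2 == 0, "sketch supports even orders only"
    p_poly = np.array([1.0, 1.0])
    q_poly = np.array([1.0, -1.0])
    # Frequencies interlace: 1st, 3rd, ... belong to P; 2nd, 4th, ... to Q
    for w in lsf[0::2]:
        p_poly = np.convolve(p_poly, [1.0, -2.0 * np.cos(w), 1.0])
    for w in lsf[1::2]:
        q_poly = np.convolve(q_poly, [1.0, -2.0 * np.cos(w), 1.0])
    return 0.5 * (p_poly + q_poly)[:-1]
```

The round trip `lsf_to_lpc(lpc_to_lsf(a))` should reproduce `a` up to numerical precision. Any transformation applied in between has to keep the frequencies sorted and inside (0, pi) for the resulting filter to remain stable.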
The reported algorithms represent an important improvement over the usual LPC techniques since by means of the voice transformation algorithms, a detailed description of the spectral properties of the target and source signals is taken into account [5].
The complete model was simplified so that the DSP application could be developed in parallel and the system could be tested (Figure 7). This reduced model lacks the GMM-based conversion part and requires synchronized source and target voices as input. For the test run we used a short pre-recorded sequence of vowels that is uploaded to the DSP hardware together with the code of the model. The output of the model is connected to the DAC line out.
The front panel carries switches, a power supply indicator, potentiometers for setting the input level, various stereo jacks for inputs and outputs, user LEDs, user switches, and a reset push button, which carry out the same tasks as the corresponding parts on the board (Figure 9). The LED drive current was derived from the output of the on-board SN74LVTH16374 D-latch, paying attention to the fan-out.
Dynamic range control for the AD/DA converter was implemented on the DSP in software, and overflow conditions were indicated simultaneously by the user LEDs on the board and on the front panel.
Figure 9: Front panel of the enclosing hardware
Figure 10: Simulated (top) vs. real-time generated signal (bottom)
The main goal of this project was to develop instruments and methods for reconstructing audio sequences from old, badly damaged movie films. We proposed a solution based on LPC/VC algorithms plus GMM transformation. This solution offers a successful approach to the physical modeling of the vocal tract, with control parameters that are physically meaningful and direct. The quality of the resynthesized voice can be further improved using non-linear techniques [6,7]. We are investigating this possibility by interfacing an efficient non-linear pitch detection model with the core of the LPC voice conversion algorithms. The dynamic model allows for an efficient emulation of the perceptive function with a reduced number of parameters.
These algorithms were first tested in the MathWorks Simulink environment. High-quality voice reconstruction tests have been carried out with good results. In line with one of the project's goals, we tested the possibility of a real-time implementation on a modern DSP platform supported by rapid prototyping tools. Considering the great complexity of the whole algorithm, together with some limits of the HW/SW development platform used, the real-time implementation has been carried out only in a simplified form. To test the complete algorithm, certain parts of it must be optimized by writing them directly as S-Functions (see the Matlab documentation).
A further valuable outcome was becoming familiar with the rapid prototyping approach: the workflow we used proved very powerful and is promising for further investigations and improvements.
Massimo Magrini (1966) graduated from the University of Pisa with a degree in Computer Science in 1994. Since then he has worked as a freelance consultant for several private companies and academic institutes (ISTI-CNR, IFC-CNR, SNS-Pisa), mainly in the fields of digital signal processing and multimedia.




