Page 32 - Acoustics Today Summer 2011
Fig. 4. Hindi word [bhana] shown using reassigned spectrogram (upper panel) pruned to remove noise and irrelevant points. Middle panel shows a reassigned spectrogram of a brief portion of the first vowel [a], pruned to show frequency components only. Lower panel shows the corresponding conventional spectrogram with the same analysis windows (9 ms).
found in telephone interfaces is human-to-computer voice dialog, where people can access information and effect transactions verbally without needing a human operator. This requires both automatic speech recognition (ASR), to convert one’s voice into a textual message without manual assistance, and text-to-speech (TTS), to formulate the verbal responses. Both of these tasks are founded on algorithms that involve digital signal processing. We will here discuss chiefly the recognition aspect, but speech coding has often used many of the same processing techniques as speech recognition “front ends.”
Speech coders, as found in modern cell phone technology, often use a linear prediction (LP) approach [1,7]. LP estimates each sample of a speech signal based on a linear combination of a small number (e.g., 10) of its immediately preceding samples. The speech signal is thus modeled statistically as an autoregressive process. Such a model also determines a digital filter representing the vocal process, from standard tenets of filter theory. The multiplier weights in the resulting filter allow simple synthesis of speech very effi-
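
To make the LP idea concrete, the following is a minimal sketch in plain Python, not the method of any particular coder. It assumes the autocorrelation formulation solved with the Levinson-Durbin recursion, which is one common way to obtain the predictor weights; the function names (`synthesize`, `autocorrelation`, `levinson_durbin`, `predict`) and the toy second-order filter are illustrative choices, not taken from the article. A real speech front end would apply this to short windowed frames with an order near 10.

```python
def synthesize(excitation, a):
    """All-pole (autoregressive) filter: y[n] = e[n] + sum_k a[k] * y[n-1-k]."""
    y = []
    for n, e in enumerate(excitation):
        past = sum(a[k] * y[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        y.append(e + past)
    return y


def autocorrelation(x, max_lag):
    """Return r[0..max_lag], where r[k] = sum_n x[n] * x[n+k]."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations from autocorrelation values r.

    Returns (a, err): predictor weights a, so that
    x_hat[n] = sum_k a[k] * x[n-1-k], and the final prediction-error energy.
    """
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        # Reflection coefficient for model order i + 1.
        k = (r[i + 1] - sum(a[j] * r[i - j] for j in range(i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= 1.0 - k * k
    return a, err


def predict(x, a):
    """Estimate each sample from its immediately preceding samples."""
    return [
        sum(a[k] * x[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        for n in range(len(x))
    ]


# Toy example (hypothetical values): a stable second-order autoregressive
# filter driven by a single impulse, standing in for a speech frame.
true_weights = [1.3, -0.4]
signal = synthesize([1.0] + [0.0] * 199, true_weights)

r = autocorrelation(signal, 2)
weights, error_energy = levinson_durbin(r, 2)
residual = [s - p for s, p in zip(signal, predict(signal, weights))]

print("estimated weights:", weights)
print("prediction-error energy:", error_energy)
```

In this sketch the same weights serve both directions: `predict` uses them to estimate each sample from its predecessors, and `synthesize` uses them as the multiplier weights of the all-pole filter that regenerates the signal from an excitation.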