Gordon J. Ramsay
Address:
Spoken Communication Laboratory Marcus Autism Center 1920 Briarcliff Road NE Atlanta, Georgia 30329 USA
Email:
gordon.ramsay@emory.edu
Mechanical Speech Synthesis in Early Talking Automata
Early attempts at synthesizing speech using mechanical models of the vocal tract prefigure modern embodied theories of speech production.
Introduction
Three centuries of scientific research on speech production have seen significant progress in understanding the relationship between articulation and acoustics in the human vocal tract. Over this period, there has been a marked shift in approaches to experimentation, driven by the emergence of new technologies and the novel ideas these have stimulated. The greatest advances during the last hundred years have arisen from the use of electronic or computer simulations of vocal tract acoustics for the analysis, synthesis, and recognition of speech. Before this was possible, the focus necessarily lay in detailed observation and direct experimental manipulation of the physical mechanisms underlying speech using mechanical models of the vocal tract, which were the new technology of their time. Understanding the history of the problems encountered and solutions proposed in these largely forgotten attempts to develop speaking machines that mimic the actual physical processes governing voice production can help to highlight fundamental issues that are still outstanding in this field. Many recent embodied theories of speech production and perception directly recapitulate proposals that arose from early talking automata.
The Voice as a Musical Instrument
By the beginning of the seventeenth century, the anatomy of the head and neck was already well understood, as witnessed by the extraordinarily detailed illustrations found in many books of the period (e.g., Casserius, 1600). An example of a midsagittal cross section of the vocal tract from the first anatomy textbook published in color (Gautier d’Agoty and Duverney, 1745), correctly reproducing all of the major anatomical structures, is shown in Figure 1. However, the exact function of the many different structures within the vocal tract and the origin of the human voice were still an active topic of discussion. From the earliest definition of the science of acoustics in the landmark article by Sauveur (1700) and even before, analogies were drawn between speech and music that drove much of the debate.
The first clear understanding that the geometry of the vocal tract directly shapes the timbre of speech was published by Marin Mersenne in his book Harmonie Universelle (Mersenne, 1636). In the sixth volume of that remarkable tome, Proposition XXXVI “explains how to construct a set of organ pipes, to pronounce vowels, consonants, syllables, and utterances,” correctly inferring that appropriately manipulated tube shapes excited by a reed would produce corresponding speech sounds. Later, the focus shifted to the function of the larynx, with much heated argument about how the vocal folds were able to create sound. Dodart (1700) proposed that the glottis acts as a wind instrument, blown by air flowing over the edges of the hole between the vocal folds, whereas Ferrein (1741) claimed instead that the vocal cords vibrate like a string instrument, bowed by the air from the lungs. Reviewing the evidence from
©2019 Acoustical Society of America. All rights reserved. Volume 15, issue 2 | Summer 2019 | Acoustics Today
https://doi.org/10.1121/AT.2019.15.2.11
Reprinted from volume 15, issue 2