Page 13 - Summer 2019
Mechanical Speech Synthesis in
Early Talking Automata
Early attempts at synthesizing speech using mechanical models of the vocal tract
prefigure modern embodied theories of speech production.
Gordon I. Ramsay
Address:
Spoken Communication Laboratory
Marcus Autism Center
1920 Briarcliff Road NE
Atlanta, Georgia 30329
USA
Email:
gordon.ramsay@emory.edu
Introduction
Three centuries of scientific research on speech production have seen significant
progress in understanding the relationship between articulation and acoustics in the
human vocal tract. Over this period, there has been a marked shift in approaches to
experimentation, driven by the emergence of new technologies and the novel ideas
these have stimulated. The greatest advances during the last hundred years have
arisen from the use of electronic or computer simulations of vocal tract acoustics
for the analysis, synthesis, and recognition of speech. Before this was possible, the
focus necessarily lay in detailed observation and direct experimental manipulation
of the physical mechanisms underlying speech using mechanical models of the vocal
tract, which were the new technology of their time. Understanding the history of the
problems encountered and solutions proposed in these largely forgotten attempts
to develop speaking machines that mimic the actual physical processes governing
voice production can help to highlight fundamental issues that are still outstanding
in this field. Many recent embodied theories of speech production and perception
directly recapitulate proposals that arose from early talking automata.
The Voice as a Musical Instrument
By the beginning of the seventeenth century, the anatomy of the head and neck was
already well understood, as witnessed by the extraordinarily detailed illustrations
found in many books of the period (e.g., Casserius, 1600). An example of a midsagittal
cross section of the vocal tract from the first anatomy textbook published in
color (Gautier d'Agoty and Duverney, 1745), correctly reproducing all of the major
anatomical structures, is shown in Figure 1. However, the exact function of the
many different structures within the vocal tract and the origin of the human voice
were still an active topic of discussion. From the earliest definition of the science
of acoustics in the landmark article by Sauveur (1700) and even before, analogies
were drawn between speech and music that drove much of the debate.
The first clear understanding that the geometry of the vocal tract directly shapes the
timbre of speech was published by Marin Mersenne in his book Harmonie Universelle
(Mersenne, 1636). In the sixth volume of that remarkable tome, Proposition XXXVI
“explains how to construct a set of organ pipes, to pronounce vowels, consonants,
syllables, and utterances,” correctly inferring that appropriately manipulated tube
shapes excited by a reed would produce corresponding speech sounds. Later, the
focus shifted to the function of the larynx, with much heated argument about how
the vocal folds were able to create sound. Dodart (1700) proposed that the glottis acts
as a wind instrument, blown by air flowing over the edges of the hole between the
vocal folds, whereas Ferrein (1741) claimed instead that the vocal cords vibrate like
a string instrument, bowed by the air from the lungs. Reviewing the evidence from
©2019 Acoustical Society of America. All rights reserved. volume 15, issue 2 | Summer 2019 | Acoustics Today | 13
https://doi.org/10.1121/AT.2019.15.2.13