
Early Talking Automata
 Figure 1. Midsagittal section of the vocal tract from the first color anatomy book of the head and neck. All of the structures of the respiratory, oral, and nasal tracts are accurately labeled in exquisite detail, including the trachea (x), vocal folds (85), jaw (p), tongue (65), palate (54), velum (45), and lips (L, M). The flow of air from the trachea through the vocal folds into the oral and nasal cavities was clearly understood as was the vibration of the vocal folds; the shaping of the oral cavity by the jaw, tongue, and lips; and the action of the velum in closing off the nasal cavity during speech. Reproduced from Gautier d'Agoty and Duverney (1745).
these apparently contradictory viewpoints, Ferrein himself, and later Vicq d’Azyr (1779), concluded that the vibration of the vocal folds, the shape of the glottis, and the glottal airflow could not be meaningfully separated and were all responsible for sound generation, in many respects predicting the modern myoelastic-aerodynamic theory of vocal fold vibration (van den Berg, 1958). By the middle of the eighteenth century, the analogy between the vocal tract and a very special kind of musical instrument was no longer in doubt. The open issue was how to “play” the vocal instrument to produce speech.
Mechanical Reproduction of the Voice
It took until the late eighteenth century for all of these early ideas by Mersenne, Dodart, and Ferrein to be fully explored and implemented. The basic component mechanisms underlying speech production were by now understood: a pair of lungs to create an aerodynamic flow, a pair of vocal folds vibrating under tension and blown by the glottal airflow to create sound, and a tube shaped like the vocal tract to form sound into speech.
Acoustics Today | Special Issue | Reprinted from volume 15, issue 2
Mechanical analogs were proposed, drawing again on comparisons with musical instruments: a pair of bellows for the lungs, a vibrating reed or membrane for the vocal folds, and organ pipes for the mouth and nose. Only the control mechanism and the confidence that a mechanical speaking machine could actually be built were lacking. These were provided by Vaucanson (1738), who constructed an automaton flute player that played tunes by blowing into a real flute. Drawing on a long history of mechanisms used in musical clocks and chamber organs (cf. Kircher, 1650), dating back to before the Middle Ages, Vaucanson ingeniously employed a revolving cylinder studded with pins to coordinate the timing and activation of a set of levers moving the articulators of his automaton, leaving physics to do the rest. Generalizing the same idea, Engramelle (1775) later published a monograph detailing how individual musical performances could be systematically transcribed onto pinned cylinders, as in a modern music box, and used to drive a mechanical organ for playback. These were the first examples of programmable musical instruments and also the first examples of musical automata designed to reproduce the actions of human musicians. It did not escape the imagination of contemporaries of both Vaucanson and Engramelle that the same mechanism could also be used to synthesize human speech (Doyon and Liaigre, 1966; Séris, 1995).
Kratzenstein’s Vowel Tubes and Kempelen’s Speaking Machine
The first instantiation of Mersenne’s original proposal appeared in 1780, when Christian Gottlieb Kratzenstein, a professor in Copenhagen, won first prize in a competition proposed by Leonhard Euler at the Imperial Academy of St. Petersburg in 1777. Euler asked whether it might be possible to construct a set of organ pipes similar to the traditional vox humana stops, which would perfectly imitate the vowels a, e, i, o, and u. Kratzenstein (1780) responded by making five tubes of metal and wood (Figure 2) that he shaped by trial and error to produce approximations of the different vowel sounds when blown with a free reed. Notably, none of these bore any recognizable resemblance to the shape of an actual vocal tract.
At around the same time, Wolfgang von Kempelen spent 20 years making several attempts to create a mechanical speaking
