Page 25 - Fall 2011

HUMAN VOICE IN EVOLUTIONARY PERSPECTIVE

Michael J. Owren
Department of Psychology, Georgia State University
Atlanta, Georgia 30302

Acoustics Today, October 2011

Introduction

The human voice is a remarkable, multi-faceted instrument that has been studied and discussed by scholars throughout recorded history. Modern scientific study has revealed much about its fundamental properties, such as the physics and physiology of vocal-fold action, the causes and consequences of vocal impairment, and the rich, varied articulatory maneuvers used among the world’s many languages. While inquiry has typically been prompted by issues concerning speech communication or vocal performance, work on vocalization in nonhumans is inspiring new questions and insights about the voice from an evolutionary perspective. A major goal in this approach is to understand how and why the human voice has come to have its current, particular form. The premise is that the basic biological forces shaping vocalization in other species have also been important in humans, creating basic commonalities that arguably transcend the many obvious differences that exist between human and nonhuman communication.

This article is intended as an introduction to some of the issues that arise in understanding the voice in evolutionary terms. The source-filter model of vocalization will be central throughout, explaining vocal production as a combination of laryngeal energy and vocal-tract resonance. While originally developed in speech science, it is now widely applied to nonhuman vocalization as well. Indexical cuing is a second underlying theme, referring to acoustic aspects of the voice and vocal signals that are correlated with important vocalizer characteristics such as sex, identity, age, and emotional state. Both source-filter production and indexical cuing are deeply rooted in the phylogeny of human vocalization, which becomes clear in reviewing our species’ mammalian and primate pasts. Commonalities are especially clear in sex and identity cuing, with sex differences in vocal anatomy and acoustics in particular having inspired a flurry of recent, exciting studies connecting cues from pitch and resonance to vocalizer fitness and reproductive success.

Source-filter theory

Understanding the voice in comparative perspective begins by examining the physical characteristics of the vocal tract, important features of which are illustrated for humans and nonhuman primates in Fig. 1. Two critical components can be distinguished. First, the source energy of vocalization is derived from laryngeal, vocal-fold vibration driven by air flowing from the lungs (phonation), or by creating turbulence in the flow by forcing it through a constriction or onto a surface within the tract. In both cases, this source energy excites cavities located above the larynx, which make up the supralaryngeal vocal tract. Resonances of these cavities are referred to as formants, and shape the spectral characteristics of the source energy in accordance with their input-output relation. The overall effect is often referred to as vocal-tract filtering, and has long been fundamental to understanding human speech production (Chiba and Kajiyama, 1941; Fant, 1960; Stevens, 2000). Over the last two decades, however, this two-component, source-filter approach to vocalization has been applied to an ever-increasing range of nonhuman species as well (Taylor and Reby, 2010).

The process involved in producing a complex, tonal sound is also illustrated in the figure using naturally occurring vocalizations from a human male and a female rhesus monkey (Macaca mulatta). Each sound is produced by putting the vocal folds in regular, or quasi-periodic, vibratory motion.
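
The input-output relation between source and filter can be illustrated numerically. The Python sketch below is not taken from the article, and every number in it is an assumption chosen for illustration: a 100-Hz fundamental, source harmonics whose amplitude falls off as 1/k², and three idealized second-order resonances at 500, 1500, and 2500 Hz with an 80-Hz bandwidth. The function names (resonance_gain, filtered_harmonics, spectral_peaks) are hypothetical. The sketch multiplies each source harmonic by the combined gain of the resonances and reports which harmonics stand out as spectral peaks.

```python
import math


def resonance_gain(freq_hz, formant_hz, bandwidth_hz):
    """Gain of one idealized second-order resonance (1.0 at 0 Hz)."""
    detuning = formant_hz ** 2 - freq_hz ** 2
    damping = bandwidth_hz * freq_hz
    return formant_hz ** 2 / math.hypot(detuning, damping)


def filtered_harmonics(f0_hz, formants_hz, bandwidth_hz=80.0, max_hz=3500.0):
    """Return (frequency, amplitude) for each harmonic after filtering."""
    spectrum = []
    k = 1
    while k * f0_hz <= max_hz:
        freq = k * f0_hz
        # Assumed source spectrum: amplitude falls as 1/k^2 above F0.
        source_amp = 1.0 / k ** 2
        # Filter: product of the gains of all resonances at this frequency.
        filter_gain = math.prod(
            resonance_gain(freq, f, bandwidth_hz) for f in formants_hz
        )
        spectrum.append((freq, source_amp * filter_gain))
        k += 1
    return spectrum


def spectral_peaks(spectrum):
    """Frequencies of harmonics stronger than both of their neighbors."""
    return [
        spectrum[i][0]
        for i in range(1, len(spectrum) - 1)
        if spectrum[i - 1][1] < spectrum[i][1] > spectrum[i + 1][1]
    ]


if __name__ == "__main__":
    spectrum = filtered_harmonics(100.0, [500.0, 1500.0, 2500.0])
    print(spectral_peaks(spectrum))
```

Under these assumptions the source alone would give a steadily falling spectrum, so any peaks in the output come from the filter. Changing the assumed resonance frequencies moves the peaks without altering F0, which is the sense in which the two components are treated as separable.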
As the folds are forced apart and come back together, bursts of air emanate from the glottis, which is the opening between the folds. The frequency spectrum of glottal airflow exhibits most energy at the fundamental frequency (F0), or base rate of vibration, with energy at corresponding higher harmonics declining exponentially with increasing frequency. The cavities and tissues of each species’ supralaryngeal vocal tract can strongly shape glottal waveform components through resonance and anti-resonance effects, which respectively reinforce or damp energy in corresponding frequency regions. The filtering that results mirrors the sizes and shapes of the vocalizer’s supralaryngeal vocal-tract cavities.

In an adult human male, a relaxed, “neutral” vocal tract is modeled as a uniform, straight tube closed at the glottal end. It is composed of approximately equal-length pharyngeal and oral cavities, with an overall vocal-tract length of about 17 cm measured from glottis to lips. The characteristic frequency spectra of resulting phonated sounds are marked by 4 to 5 prominent spectral peaks in the 0- to 5-kHz range, each of which reflects a formant. In a rhesus monkey, smaller vocal folds and a much shorter supralaryngeal vocal tract produce higher F0 values and formant frequencies, respectively.

The pattern formed by these peaks can play a major role in determining the auditory quality of a given vocalization. Corresponding effects are routinely evident in many mammals, taking into account differences in overall vocal-tract length and characteristics of individual supralaryngeal cavities. Due to coincidental resemblance to humans in F0 and vocal-tract length, for example, the chacma baboon (Papio cynocephalus ursinus) “grunt” call bears a remarkable resemblance to an unarticulated, human vowel sound (Owren et
