Fall 2011

become difficult, if not impossible, for adults to produce. Instead, vocal-fold behavior appears to become more stable, centered on regular, synchronized vibration and the associated harmonically structured sounds. In fact, the vocal gymnastics of infants and children would constitute vocal abuse in adults, for whom chronic shouting or screaming can induce vocal-fold nodules and other pathologies (Stemple et al., 2009). Suggestive evidence along these lines is also provided by a recent comparison of tickle-induced laughter in great apes and humans. While all five species produced distinctive-sounding laughter, humans stood out from the others in showing significantly greater regularity in underlying vocal-fold action (Davila Ross et al., 2009). A speculative but logical inference is that human vocal folds show evolutionary modification for more stable response across a range of air pressures and muscle tensions. While arguably losing some flexibility in laryngeal response, adult human voices have become less prone to nonlinear phenomena. That change has created a correspondingly higher proportion of regular, well-synchronized phonation, which in turn may have promoted the effectiveness of source-filter-based indexical cuing.

Indexical cuing in the voice

Source-filter theory, laryngeal nonlinearity, and the similarities as well as differences between humans and other mammals create the foundation for understanding vocal indexical cuing. In a sense, all vocalizations must be considered inherently indexical, for instance in simply showing that a vocalizer is present. However, the more important consideration is how indexical cues are affected by the acoustics of a given vocalization. The indexical potency of harmonically structured sounds, in particular, is clearly evident from everyday experience alone.
Here, the pitch and timbre of phonated speech allow listeners to immediately discern a talker’s sex, identity, approximate age, and other personal characteristics. These capabilities are traceable to inherent differences in vocal-tract characteristics both among age-sex classes (such as adults versus children and males versus females) and among individuals within each group. For example, phonation allows even potentially subtle differences in vocal-fold size, shape, and tissue properties to be revealed in features such as F0, relative noisiness of the glottal signal, and cycle-to-cycle variation in vibration. Thus, humans tested with male versus female voices require fewer than two waveform cycles, each corresponding to a single opening and closing of the glottis, to hear the difference (Owren et al., 2007). Supralaryngeal filtering also contributes strongly to indexical cuing, even as talkers are dynamically altering the pharyngeal and oral cavities for linguistic purposes. Even brief segments of recorded vowel sounds show that details of formant patterning can provide important potential cues to both sex and individual identity (Bachorowski and Owren, 1999).

However, indexical cuing can be strongly affected by the nature of the source energy involved. As shown in Fig. 3 for male and female speech, for example, supralaryngeal cues become less evident as F0 increases. This effect occurs because harmonics occur at integer multiples of F0, and raising this basic rate of vibration spaces them farther apart. The source spectrum thereby becomes more sparsely populated, with less opportunity for supralaryngeal resonances to create a distinct imprint. Another way to understand this outcome is that formants become less well “sampled” by the source signal, giving the listener less to go on in recovering details of frequency, bandwidth, and amplitude. Some formants may not be sampled at all when F0s become very high.
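The “sampling” argument can be made concrete with a little arithmetic. The sketch below is illustrative only: the formant center (1500 Hz), its bandwidth (300 Hz), and the F0 values are hypothetical numbers chosen for the example, not measurements from the studies cited, and treating a formant as a flat-sided frequency band is a deliberate simplification of a real resonance. It simply counts how many harmonics of a given F0 fall inside that band.

```python
def harmonics(f0_hz, max_hz):
    """Return the harmonic frequencies k * f0_hz that do not exceed max_hz."""
    count = int(max_hz // f0_hz)
    return [k * f0_hz for k in range(1, count + 1)]


def harmonics_in_band(f0_hz, center_hz, bandwidth_hz):
    """Return the harmonics of f0_hz lying inside a band around center_hz.

    The band is a crude stand-in for a formant (an assumption made for
    illustration): it spans center_hz +/- bandwidth_hz / 2.
    """
    low = center_hz - bandwidth_hz / 2
    high = center_hz + bandwidth_hz / 2
    return [h for h in harmonics(f0_hz, high) if low <= h <= high]


if __name__ == "__main__":
    # Hypothetical values: a formant near 1500 Hz, 300 Hz wide.
    for f0 in (100, 200, 400, 1000):
        inside = harmonics_in_band(f0, center_hz=1500, bandwidth_hz=300)
        print(f"F0 = {f0:4d} Hz: {len(inside)} harmonic(s) in band {inside}")
```

Under these assumed values, the number of harmonics landing in the band falls from three at an F0 of 100 Hz to none at 1000 Hz, which is the sense in which a formant may go unsampled at very high F0s.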
Adding some noisiness to otherwise stable vocal-fold vibration can improve the situation, for instance by “filling out” the source spectrum. That effect occurs in breathy phonation in human talkers, as well as in the noisy but nonetheless regularly phonated “roars” of red deer (Cervus elaphus) and other mammals (Taylor and Reby, 2010). But too much noisiness becomes a liability. Reducing the source energy of speech to noise alone, as in whispering, makes both phonetic and indexical cuing less effective (Tartter, 1991; Katz and Assmann, 2001).

Deterministic chaos is nonetheless by far the greatest challenge to supralaryngeal cuing. As a general phenomenon, the occurrence of nonlinearity in a voice has been suggested to contribute to individual identity signaling (Fitch et al., 2002). Such events might, for example, occur idiosyncratically in particular vocalizers and thereby become compelling cues to their respective identities. Nonlinear vocal phenomena are by nature unstable, however, and are therefore not likely to provide as consistent a substrate for indexical cuing as vocalizer-specific vocal-fold properties or supralaryngeal filtering (Rendall, 1996; Owren and Rendall, 2001). Furthermore, informal examination of a variety of chaos-based screams suggests that virtually no source- or resonance-related indexical cuing occurs in such sounds, no matter what species they come from (see Fig. 4). Empirically, direct comparisons of identity signaling in rhesus monkey and baboon vocalizations have shown that harmonically structured sounds are a markedly better vehicle.

Fig. 3. Narrowband spectrograms of a human female (top) and male (bottom) saying the words “this is my voice.” The lower pitch and resonance in male voices make formants more distinct and easier to measure than in female voices.

Human Voice in Evolutionary Perspective 27
