Fall 2011

Table 2: Some of the factors affecting listeners’ ability to identify an unfamiliar voice

with the attended dimension (Melara and Marks, 1990; Li and Pastore, 1995). Similarly, studies using unfamiliar voices show that the harmonic and inharmonic (noise) parts of the voice interact perceptually, so that listeners’ sensitivity to either depends on energy levels in both (Kreiman and Gerratt, in press); and sensitivity to tremor rates in voice depends on the magnitude of the tremor, and vice versa (Kreiman et al., 2003). Further, listeners’ relative inability to reliably and consistently isolate single dimensions in a voice pattern is the largest source of error in voice quality ratings (Kreiman et al., 2007). These findings argue against reliance on feature-based models of voice quality of the sort that underlie most clinical voice evaluation protocols (about which more in a moment).

As most studies of voice and voice quality perception use unfamiliar voices as stimuli, the effort to understand the functional and perceptual roles of auditory-acoustic cues or features in the perception of familiar voices has begun only in a preliminary way (Van Lancker et al., 1985). These early attempts have shown that individual familiar voice patterns vary greatly in how (and how much) cues such as F0 or breathiness contribute to the recognition process. While familiar voice recognition engages pattern recognition processes of the right hemisphere, discriminating among unfamiliar voices or “identifying” a voice heard only once or twice before (for example, in a voice lineup) engages auditory temporal receiving areas on both sides of the brain (Van Lancker et al., 1989), and seemingly involves both pattern recognition and featural analysis/matching skills.
Error patterns in long-term memory tasks suggest that unfamiliar voices are encoded in terms of a generalized template or “prototype,” along with a set of deviations from that prototype. The deviations are forgotten over time, so that memory tends to converge on average-sounding voices no matter what voice was heard originally (Papcun et al., 1989). Similarly, memory tests in change deafness studies (testing listeners’ awareness of abrupt voice quality changes during normal interaction) suggest that listeners remember only coarse differences between unfamiliar voices under normal circumstances (a “gist-based” representation, Fenn et al., 2011, p. 1454), and that memory for specific acoustic details of a voice may be weak or entirely absent. In contrast, for familiar voices, a complex, unique perceptual pattern is stored along with an array of personally relevant associations (appearance, biographical and episodic history, affective nuances, and so on); recognition occurs within a second or two; and the “cues” triggering recognition vary widely with vocal pattern (Schweinberger et al., 1997a). These findings have led us to conclude that all voices are fundamentally patterns, and that pattern recognition and featural analysis operate reciprocally, in different degrees, for all voice perception processes, depending on how familiar the voice is to the listener.

A large body of behavioral evidence also supports the notion that voices are best viewed as patterns. In a “repetition priming” protocol, listeners’ accuracy in judging whether or not a voice sample was famous improved when they had previously heard a different sample of the target voice, so that the advantage transferred between tokens of speech and did not depend on the specific acoustic details of an individual sample (Schweinberger et al., 1997b). Adaptation studies provide similar evidence.
In these studies, the experimenter creates a stimulus continuum by “morphing” between two voices (for example, those of a male and a female). When listeners hear tokens taken from one end of the continuum, their judgments of ambiguous stimuli from the middle of the continuum shift, so that hearing a relatively male sample three or four times makes the ambiguous sample sound more female, and hearing tokens from the female end of the continuum makes it sound more male. These effects have been shown for judgments of speaker identity (familiar voices: Zäske et al., 2010; trained to recognize: Latinus and Belin, 2011a), but also for

Fig. 2. A fox and a hedgehog.

Acoustics Today, October 2011
