Page 12 - Fall 2011

perception of emotion (Bestelmeyer et al., 2010), speaker sex (Schweinberger et al., 2008), speaker age (Zäske and Schweinberger, 2011), and ratings of roughness (Gerratt et al., 1993) from voice, and are interpreted as reflecting adaptation of a central representation (a pattern), rather than the effects of specific acoustic characteristics of the stimuli.
Finally, studies of familiar voice recognition (e.g., Van Lancker et al., 1985) have demonstrated that the acoustic cues to personal identity vary from voice to voice, and the importance of a given cue depends on the context of the complete voice pattern in which that cue operates, and not on the value of the cue itself. Thus, unusual pitch contours or a marked foreign accent (for example) may be essential cues to a speaker’s identity, or not, depending on the other cues that are available to listeners. It is thus impossible to devise a set of features that are important for recognition of all voices: The importance of a given cue depends on the pattern in which the cue appears and on the status of the voice as familiar (and stored as a personally relevant auditory object) or unfamiliar (and handled perceptually in terms of stereotypes or generalized templates).
One final difference between familiar voice recognition and unfamiliar voice discrimination is that familiar voice patterns are remarkably robust, so that we can recognize a familiar voice in noise, based on very short samples (often just the word “Hi” on a band-limited telephone line), even when the voice has not been heard for years or even decades and has changed with time (voices appear to change less with age than do faces). In contrast, virtually anything will disrupt efforts to match an unfamiliar voice to a decaying memory trace.
Studies (primarily focusing on forensic situations) have shown that identification scores fluctuate as a function of a wide range of factors characterizing the speaker, the listener, and the circumstances surrounding originally hearing and subsequently identifying the voice (Table 2; see Bricker and Pruzansky, 1976, or Kreiman and Sidtis, 2011, for review). It appears that the greater the reliance on featural extraction, comparison, and analysis, the worse we are at the task.

Features and patterns: A “fox and hedgehog” model for voice recognition

Taking an idea from the essay of Isaiah Berlin (1953) on Archilochus’ fable about a fox and a hedgehog (Fig. 2), we have proposed a model of voice perception that suggests voices can be recognized by varying applications of featural and pattern recognition processes. In the fable, the fox knows many little things while the hedgehog knows one big thing. There are many versions of the bipolarity expressed in this adage: empiricism contrasted with rationalism, Aristotle meets Plato, behaviorism compared with the sweeping ideologies of cognitive science, agility of thought versus persistence (Gould, 2003). In our model of voice perception, the aphorism is meant to represent the interplay between features and patterns in the speaker-listener interface. Some voices and some voice perception tasks draw more heavily on features (many little things), while other voices and other tasks utilize pattern recognition abilities more heavily. This counterpoint helps elucidate the respective roles of unfamiliar and familiar voices, in that featural elements figure importantly in the discrimination of unfamiliar voices (in the sense of matching to generalized templates), while overall pattern recognition predominates for familiar voices (in accessing unique auditory percepts).

Measuring voice quality

We have argued thus far that humans are good at familiar voice recognition because we have inherited this ability through our evolutionary past, and that familiar voices are best treated as integral patterns. Nevertheless, most approaches to voice quality assessment depend on the use of

Table 3: A few examples of terms for voice quality, from a long history of interest in such descriptors.

Voices and Listeners 11
