Fall 2011

Table 4: A partial list of vocal characteristics potentially contributing to voice recognition in humans

perceptual or acoustic features for quality, or both: in other words, on approaches that use processing strategies resembling those we apply to unfamiliar voices, with which we are considerably less adept. For example, many authors have proposed lists of descriptive terms for assessing quality, and listeners typically measure quality by indicating the extent to which a voice possesses each feature (Voiers, 1964; Gelfer, 1988; Isshiki et al., 1969; Kempster et al., 2009). This approach (the only one currently available for quantifying quality) is replete with redundancies and ambiguities, and it arises from 2000 years of tradition rather than from theory. Many of the features in common use today (for example, harsh, breathy, clear, bright, smooth, weak, shrill, deep, dull, and hoarse) can be traced to Roman writings on oratory (Table 3; Laver, 1981; Austin, 1806). Because rating voices on such scales requires listeners to analyze a vocal pattern into component features, we might expect listeners to have great difficulty using such quality measurement protocols, and in fact many studies have shown quite low levels of interrater agreement, as predicted (see Kreiman et al., 1993, for review).

Nevertheless, quantifying voice quality is essential to many endeavors, including studying the efficacy of treatments for voice disorders and the acceptability of speech synthesis efforts. This leaves us with the following problem: how do we quantify an unanalyzable pattern? One solution under investigation (Gerratt and Kreiman, 2001; Kreiman et al., 2007) is an analysis-by-synthesis approach in which voices are copied using a voice synthesizer specialized for replicating variations in voice quality.
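The core idea of analysis-by-synthesis (adjust a synthesizer until its output matches a target voice, then use the settings as the description of that voice) can be illustrated with a deliberately simplified sketch. It assumes a hypothetical toy synthesizer, `synthesize`, whose only parameters are a source spectral slope (dB per octave) and a flat aspiration-noise level, and it replaces the listener's perceptual judgment of the match with a root-mean-square dB distance between harmonic levels. The approach cited above relies on listeners to judge the copy, so this automated criterion is a stand-in, not the authors' method.

```python
import math


def synthesize(slope_db_per_octave, noise_db, n_harmonics=12):
    """Hypothetical toy synthesizer (not a real voice synthesizer).

    Returns the level in dB of each of the first n_harmonics: a harmonic
    source falling off at a constant rate per octave (H1 = 0 dB), summed in
    power with a flat aspiration-noise floor at noise_db.
    """
    levels = []
    for k in range(1, n_harmonics + 1):
        harmonic_db = slope_db_per_octave * math.log2(k)
        power = 10 ** (harmonic_db / 10) + 10 ** (noise_db / 10)
        levels.append(10 * math.log10(power))
    return levels


def spectral_distance(a, b):
    """RMS difference in dB; a stand-in for a listener's judgment of the match."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def analysis_by_synthesis(target, slopes, noise_levels):
    """Search the parameter grid for the settings whose synthetic copy is
    closest to the target; returns (distance, slope, noise)."""
    best = None
    for slope in slopes:
        for noise in noise_levels:
            d = spectral_distance(synthesize(slope, noise), target)
            if best is None or d < best[0]:
                best = (d, slope, noise)
    return best


if __name__ == "__main__":
    # The "target voice" is itself synthetic here, so the true settings are known.
    target = synthesize(-12.0, -30.0)
    slopes = [s / 2 for s in range(-36, 1)]   # -18.0 to 0.0 dB/octave, 0.5 dB steps
    noise_levels = list(range(-60, 1, 2))     # -60 to 0 dB, 2 dB steps
    dist, slope, noise = analysis_by_synthesis(target, slopes, noise_levels)
    print(f"slope={slope} dB/octave, noise={noise} dB, distance={dist:.3f} dB")
```

A grid search keeps the sketch transparent; a synthesizer built for voice quality has many more parameters, and in the perceptual version of the method the listener, not a distance metric, decides when the copy is acceptable.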
Because the complete voice pattern is copied exactly, the synthesizer parameters explicitly link a range of selected features of the acoustic signal to the overall, integral pattern, and can thus be used validly as objective acoustic indices of subjective perceptual responses. Because this method allows us to study how listeners manage the interplay between features and patterns, it applies to both familiar and unfamiliar voices and holds the promise of elucidating their distinctive dynamic processing characteristics.

The larger universe of perceptual judgments

Listeners make judgments about physical, psychological, and social characteristics from voice that go well beyond mere speaker identity, and we are only beginning to understand the range of information conveyed and the manner in which such information is extracted and exploited. For example, the emotional and attitudinal nuances conveyed by voice may well number in the thousands, and many animals (possibly including humans) are adept at extracting information related to reproductive fitness from vocal signals (e.g., Hardouin et al., 2009; Charlton et al., 2007; Apicella and Feinberg, 2008). Thoughtful examination of everyday talk reveals an immense set of possible judgments listeners may make (Table 4). This list is not exhaustive; it is intended to point to the potentially large constellation of characteristics that underlie functional voice perception. It becomes clear that a systematic reductionist approach to the study of voice perception in the face of these many variables is unre-

12 Acoustics Today, October 2011
