
both frequency and timing information. Hearing thresholds, combined with factors related to aging, can certainly explain a substantial amount of the performance observed in the laboratory and clinic (Humes, 2002; Akeroyd, 2008). Toward an understanding of the essential speech components necessary for the transmission of speech across telephone lines, Fletcher and Galt (1950) divided the sound spectrum into frequency bands and measured the contribution of each band to intelligibility. Deconstructing the signal into its most essential components is reminiscent of other sciences and has proven valuable. Similar approaches were taken by Miller and Nicely (1955) and revisited by Allen (1995) to understand how the energy in the signal is recovered and used by the listener in very specific ways. These principles are still used today in audiology clinics, where the audibility of different speech frequencies is used to estimate the performance of a person being fit with hearing aids.
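The band-importance idea behind this clinical practice can be made concrete with a back-of-the-envelope calculation. The Python sketch below sums importance-weighted audibility across frequency bands, in the spirit of Fletcher and Galt's articulation experiments and the modern Speech Intelligibility Index. The band edges, importance weights, speech levels, thresholds, and the 30-dB dynamic range are illustrative placeholders rather than the published standard values, and the helper functions are hypothetical.

# A minimal sketch of a band-importance intelligibility index.
# Band edges, weights, and all dB values below are illustrative, not the
# ANSI S3.5 values used in actual clinical calculations.

BANDS = [
    # (low Hz, high Hz, importance weight); weights sum to 1.0
    (250, 500, 0.10),
    (500, 1000, 0.20),
    (1000, 2000, 0.30),
    (2000, 4000, 0.25),
    (4000, 8000, 0.15),
]

def band_audibility(speech_level_db, threshold_db, dynamic_range_db=30.0):
    """Fraction of the speech dynamic range in one band that sits above the
    listener's threshold, clamped to the range 0..1."""
    audible_db = speech_level_db - threshold_db
    return max(0.0, min(1.0, audible_db / dynamic_range_db))

def intelligibility_index(speech_levels_db, thresholds_db):
    """Importance-weighted audibility summed across bands: an estimate of how
    much of the speech signal is usable, not a direct percent-correct score."""
    total = 0.0
    for (low, high, weight), speech, threshold in zip(
            BANDS, speech_levels_db, thresholds_db):
        total += weight * band_audibility(speech, threshold)
    return total

# Hypothetical example: the same speech levels heard through normal
# thresholds versus a sloping high-frequency hearing loss.
speech_levels = [55, 55, 55, 55, 55]
normal_thresholds = [20, 15, 15, 20, 25]
sloping_loss = [25, 30, 45, 60, 70]
print(round(intelligibility_index(speech_levels, normal_thresholds), 2))  # 1.0
print(round(intelligibility_index(speech_levels, sloping_loss), 2))       # 0.37

With these made-up numbers, the normal-hearing profile yields an index of 1.0, whereas the sloping high-frequency loss yields roughly 0.37, reflecting that the bands carrying the most speech information have become inaudible.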
Despite the intuitiveness and utility of explaining speech communication based solely on acoustic audibility, we get more information than acoustics alone provides. Some reports claim that the variance in the ability to perceive speech is not only influenced but mainly driven by nonauditory abilities (George et al., 2007). Other studies place the contribution of nonauditory information at 30% to 50% (Humes, 2007). In an influential report collecting observations across many previous studies, Massaro and Cohen (1983) suggested that we embrace the integration, rather than force the separation, of auditory and nonauditory streams. As a consequence, it is clear that one must give attention to contextual factors when examining the acoustic properties of the speech signal.
Talker Familiarity
It will come as no surprise that it is easier to understand a talker who speaks your particular dialect (Labov and Ash, 1997), and the benefit scales with how far the talker's dialect is from your own (Wright and Souza, 2012; also see the article by de Jong in this issue of Acoustics Today). Furthermore, there are special benefits when listening to a longtime partner or spouse, whose speech is measurably more intelligible to the partner than to strangers (Souza et al., 2013). Talker familiarity can yield benefits even on very short timescales because sentences are more intelligible when preceded by sentences spoken by the same talker (Nygaard et al., 1995). Notably, the counterbalancing of talkers and listeners in these studies demonstrates that acoustic factors cannot explain the effect. Instead, some other property of the listener or the relationship history, such as knowing a person's dialect, the funny way "water" is said, or the words typically used, can make a substantial difference.
Apart from the intuitive advantage of knowing who's talking, there are some rather unusual and surprising effects of context. Simply changing the expectation of what a talker sounds like can affect how speech is perceived, even if the acoustics have remained unchanged. Intelligibility is poorer when utterances are thought to be produced by a person who is not a native speaker of one's own language, even if the sound has not been changed (Babel and Russell, 2015). We routinely accommodate the acoustic difference between a woman's and a man's voice by, for example, expecting higher or lower frequency sounds, and this accommodation can be induced if the listener simply sees the talker's face (Strand and Johnson, 1996), especially if the listener has a hearing impairment (Winn et al., 2013).
Indeed, even seemingly unrelated objects in the environment, like stuffed animals, can affect speech perception! Hay and Drager (2010) conducted a clever vowel-perception experiment that hinged on a listener's knowledge of the differences in dialect between Australian and New Zealand varieties of English. Imagine a vowel sound that is intermediate between the sounds that Americans would use for the vowels in the words "head" and "hid"; it would be a word that could fall either way. This vowel sound would be heard in the word "hid" in Australia but be perceived as "head" in New Zealand. In an experiment where stuffed toys were placed conspicuously in the room, the listeners were more likely to hear this ambiguous vowel as Australian "hid" when the toy was a kangaroo but were biased toward hearing "head" (a New Zealand interpretation) when the toy was a kiwi bird. The toy primed the listener to match the sound to the appropriate dialect that would be associated with the animal despite no difference in the acoustic stimulus.
Visual Cues
Visual cues are used by all sighted listeners, not just those with hearing loss. The complementarity of the ears and eyes in speech perception is remarkable. Speech sounds that are most acoustically similar (think of “f” and “th” or “m” and “n”) are reliably distinguished visually (see Figure 1). The reverse is true as well. Even though it is nearly impossible to see the difference between “s” and “z” sounds, listeners almost never mistake this phonetic contrast (called voicing, referring to glottal vibration felt in the throat) even when there is significant background noise (Miller and Nicely, 1955). The learned association between the sounds and