Page 50 - Summer 2018

Speech: Not So Acoustic
subtle bits of information as soon as the sound arrives at the ear. For example, when hearing "The man has drunk the ...," an observer will look at an empty glass, but when hearing "The man will drink the ...," the same observer will instead look at a full glass (Altmann and Kamide, 2007). These actions make sense when you stop and think about the meaning of the words, but no such deliberate reckoning is needed: the eye gaze shifts are rapid and occur well before subsequent words like "water," or whatever else would finish the utterance. This is yet another example of how perception of a word ("water") is not just about receiving the acoustic pressure waves but can also be shaped dramatically by factors that have nothing to do with the acoustics of the word itself.
Prediction of upcoming words is not merely a neat trick that emerges in the laboratory; it is the foundational principle of entire frameworks of speech perception theories (see Lupyan and Clark, 2015). It is why we can complete our friends' and spouses' sentences and why we can expect a big play when a sports announcer's voice begins to swell with excitement. Brain-imaging studies have validated the idea of speech perception as a process of continual prediction and error correction rather than a straightforward encoding of the incoming signal. Skipper (2014) shows that there are actually metabolic savings afforded by the use of context, contrary to the idea that computing context is a costly extra processing layer on top of auditory sensation. He goes on to say that the majority of what we "hear" during real-world conversation might come not from our ears but from our brain.
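The flavor of this contextual prediction can be conveyed with a toy sketch: a listener who has tallied which words tend to follow which can concentrate expectation on a few candidates before the final word's acoustics ever arrive. This is only an illustrative bigram counter, not the predictive-coding models discussed in the works cited; the miniature corpus is hypothetical.

```python
from collections import Counter, defaultdict

# A tiny hypothetical corpus standing in for a listener's
# accumulated experience with the language.
corpus = [
    "the man has drunk the water",
    "the man will drink the water",
    "the man has drunk the juice",
    "the man will drink the coffee",
]

# Tally bigrams: how often each word follows each preceding word.
bigrams = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        bigrams[prev][nxt] += 1

def predict(prev_word):
    """Rank candidate next words by how often they followed prev_word."""
    counts = bigrams[prev_word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.most_common()}

# Before any acoustic signal for the upcoming word arrives, context
# already concentrates probability on a small set of candidates.
print(predict("drink"))  # "the" is the only attested continuation
print(predict("the"))    # a distribution over "man", "water", "juice", ...
```

Real listeners, of course, exploit far richer context than adjacent-word statistics, but even this crude tally shows why a degraded or ambiguous final word can still be "heard" correctly: much of the work was done before the sound arrived.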
The study of speech acoustics has demanded creativity and collaboration among a variety of experts spanning multiple fields of study, within acoustics and beyond. There is so much literature on the topic that it is easy to lose track of a reality that is perhaps more obvious to a person who does not study speech communication: speech is not nearly as acoustic as one might think. The study of speech has been, and will continue to be, driven in large part by studies of the sounds of the vocal tract and the auditory-perceptual mechanisms in the ear that encode those sounds. It is undeniable that the quality of the speech signal itself plays a large role in our perception; just ask anyone who has hearing loss. However, by recognizing the nonacoustic factors involved in speech perception, one might better understand why computers don't recognize speech as well as humans do: despite hyperspeed, detailed analysis of the acoustic signal, only part of the information is in the signal, and the rest lies elsewhere, either in the environment, on the face of the talker, in the statistics of the language, or, more likely, in the mind of the listener.
48 | Acoustics Today | Summer 2018
Akeroyd, M. (2008). Are individual differences in speech reception related to individual differences in cognitive ability? A survey of twenty experimental studies with normal and hearing-impaired adults. International Journal of Audiology 47(Suppl. 2), S53-S71.
Allen, J. (2005). Consonant recognition and the articulation index. The Journal of the Acoustical Society of America 117, 2212-2223.
Altmann, G., and Kamide, Y. (2007). The real-time mediation of visual attention by language and world knowledge: Linking anticipatory (and other) eye movements to linguistic processing. Journal of Memory and Language 57, 502-518.
Babel, M., and Russell, J. (2015). Expectations and speech intelligibility. The Journal of the Acoustical Society of America 137, 2823-2833.
Bradlow, A., and Pisoni, D. (1999). Recognition of spoken words by native and non-native listeners: Talker-, listener-, and item-related factors. The Journal of the Acoustical Society of America 106, 2074-2085.
Fletcher, H., and Galt, R. (1950). Perception of speech and its relation to telephony. The Journal of the Acoustical Society of America 22, 89-151.
Fowler, C., and Dekle, D. (1991). Listening with eye and hand: Cross-modal contributions to speech perception. Haskins Laboratory Status Report on Speech Research SR-107/108, 63-80.
Ganong, W. F. (1980). Phonetic categorization in auditory word perception. Journal of Experimental Psychology: Human Perception and Performance 6, 110-125.
George, E., Zekveld, A., Kramer, S., Goverts, T., Festen, J., and Houtgast, T. (2007). Auditory and nonauditory factors affecting speech reception in noise by older listeners. The Journal of the Acoustical Society of America 121, 2362-2375.
Green, K., Kuhl, P., Meltzoff, A., and Stevens, E. (1991). Integrating speech information across talkers, gender, and sensory modality: Female faces and male voices in the McGurk effect. Perception & Psychophysics 50, 524-536.
Grosjean, F. (1980). Spoken word recognition processes and the gating paradigm. Perception & Psychophysics 28, 267-283.
Hay, J., and Drager, K. (2010). Stuffed toys and speech perception. Linguis- tics 48, 865-892.
Herman, R., and Pisoni, D. (2003). Perception of "elliptical speech" following cochlear implantation: Use of broad phonetic categories in speech perception. Volta Review 102, 321-347.
Humes, L. E. (2002). Factors underlying the speech-recognition performance of elderly hearing-aid wearers. The Journal of the Acoustical Society of America 112, 1112-1132.
Humes, L. E. (2007). The contributions of audibility and cognitive factors to the benefit provided by amplified speech to older adults. Journal of the American Academy of Audiology 18, 590-603.
Kuhl, P., and Meltzoff, A. (1982). The bimodal perception of speech in infancy. Science 218, 1138-1141.
Labov, W., and Ash, S. (1997). Understanding Birmingham. In C. Bernstein, T. Nunnally, and R. Sabino (Eds.), Language Variety in the South Revisited. University of Alabama Press, Tuscaloosa, AL, pp. 508-573.
Lupyan, G., and Clark, A. (2015). Words and the world: Predictive coding and the language-perception-cognition interface. Current Directions in Psychological Science 24, 279-284.
Marslen-Wilson, W. (1973). Linguistic structure and speech shadowing at very short latencies. Nature 244, 522-523.
Massaro, D., and Cohen, M. (1983). Phonological context in speech perception. Perception & Psychophysics 34, 338-348.
McGurk, H., and MacDonald, J. (1976). Hearing lips and seeing voices. Nature 264, 746-748.
Miller, G., and Nicely, P. (1955). An analysis of perceptual confusions among some English consonants. The Journal of the Acoustical Society of America 27, 338-352.
