Sensory Modality and Speech Perception
The more we understand about the perceptual brain, the more it seems agnostic about sensory modality. Brain areas once thought dedicated to a single sense are now known to react to multiple senses (for a review, see Rosenblum et al., 2016). Many who study multisensory perception now believe that the perceptual brain is more accurately characterized as being designed around tasks and behavioral function than around individual sensory systems (e.g., Reich et al., 2012). The research supporting this new conception comes from multiple areas of behavioral and neurophysiological perceptual science. However, much of what has come to be known as the Multisensory Revolution (e.g., Rosenblum, 2013) has been motivated by research on speech perception. The aforementioned research on Tadoma and felt speech has been part of this endeavor. But much more of this work has addressed our more usual way of perceiving speech: via audiovisual means.
We All Lipread
Research shows that regardless of our hearing, we use visible speech (lipread) information when it is available. We use visible speech to enhance our perception of auditory speech that is degraded by background noise (e.g., Bernstein et al., 2004b) or a heavy foreign accent (Arnold and Hill, 2001). We use visual speech as we acquire our first language(s) (e.g., Teinonen et al., 2008) and second languages (e.g., Hazan et al., 2005). In fact, not having access to visual speech during language development causes predictable delays for blind children, the remnants of which can be observed in adulthood (e.g., Delvaux et al., 2018).
Perhaps the most compelling demonstration of audiovisual speech is the McGurk effect (McGurk and MacDonald, 1976). There are myriad examples of the effect online (e.g., illusionsindex.org/i/mcgurk-effect and acousticstoday.org/speech-not-acoustic). In one example, a video of a face articulating the syllables “ba,” “ga,” “va,” and “la” is synchronously dubbed with an audio recording of the repeated syllable “ba.” Observers asked what they hear typically report “ba,” “da,” “va,” and “tha” despite their ears receiving a clear “ba” four times. Thus, it seems that what we hear can be strongly affected by what we see.
I have been demonstrating the McGurk effect in this way to my classes for over 30 years. Still, the effect works on me as well as it ever has. Indeed, research shows that the effect works regardless of one’s awareness of the audiovisual discrepancy (e.g., Bertelson and de Gelder, 2004). The effect also works in different languages (e.g., Sams et al., 1998), when there are extreme audio and visual stimulus degradations (e.g., Rosenblum and Saldana, 1996), and across observers of different ages and perceptual experience (e.g., Jerger et al., 2014; but see Proverbio et al., 2016). There are certainly individual differences in the strength of the effect depending on, for example, the involved segments (e.g., related to native language). Still, the vast majority of (neurologically typical) individuals show some form of the effect.
One of the more interesting aspects of the McGurk effect is how visual speech can influence what one experiences hearing. This phenomenology corresponds to the neurophysiology of visual speech perception. Seeing an articulating face can induce activity in auditory brain areas, even for novice lipreaders (e.g., Calvert et al., 1997; see Rosenblum et al., 2016, for a review). In fact, visual speech was the first stimulus to show cross-sensory activation of a primary sensory brain area in humans. Visual speech can also modulate more upstream (earlier) auditory mechanisms (Musacchia et al., 2006; Namasivayam et al., 2015). For audiovisual McGurk stimuli, a visual syllable “va” synchronized with an auditory “ba” induces auditory brain area activity consistent with the activity from hearing an auditory “va” (Callan et al., 2001). Based on this neurophysiology, it is not surprising that observers experience “hearing” what they are seeing.
The McGurk effect is one of the most studied phenomena in modern perceptual psychology. However, some of us have recently questioned its use as a tool to measure the strength of multisensory integration (for reviews, see Alsius et al., 2018; Rosenblum, 2019). There is strong evidence, for example, that when the McGurk effect appears to fail (a perceiver reports just hearing the audio component), dimensions of the channels are still integrated (e.g., Brancazio and Miller, 2005).
Still, the effect is useful for simply establishing that integration has occurred, and the effect can occur in some very surprising ways. Consider the aforementioned speech perception by touch. Research shows that touching a face as it articulates syllables while listening to different syllables can make the heard syllables “sound” like those being felt (Fowler and Dekle, 1991). Relatedly, a brief puff of air applied to an observer’s skin (on the neck or arm) can integrate with synchronized heard syllables to make a “ba” sound more like a “pa” (e.g., Gick and Derrick, 2009). In another touch example, a heard vowel (“ea” in “head”) can