sound different (as “a” in “had”) if it is synchronously timed with the gentle pulling up of the skin at the corner of a listener’s mouth (Ito et al., 2009).
Besides demonstrating that the speech brain readily integrates all relevant articulatory information regardless of modality, these touch examples help make another point. It seems that speech information can be integrated regardless of one’s experience with the modality through which it is conveyed. Very few of us have experience touching faces for speech, extracting speech information from puffs on our skin, or having our mouth pulled as we listen to speech. Still, observers seem to readily integrate that novel information for perception.
In this sense, these examples may pose a challenge to probabilistic accounts of perception that assume that the likelihood of cue integration depends on probabilities derived from associative experience (e.g., Altieri et al., 2011). These accounts may also have difficulty explaining a very recent audiovisual example of the McGurk effect. Watching ultrasound videos of tongue blade movements can influence heard speech and induce brain responses characteristic of typical audiovisual speech integration (e.g., Treille et al., 2018). It seems that the speech brain is primed to integrate all types of information for speech articulation, even without prior associative experience between the information streams.
The Senses Share Their Experience
There is another context in which specific associative experience may be unnecessary for the modalities to help one another: speech learning. As mentioned, new language learners benefit from seeing as well as hearing someone speak. This multisensory training benefit also carries over to improve auditory comprehension when we later just listen to the new language (e.g., Hazan et al., 2005). Multisensory stimuli are also useful for training listeners with mild hearing impairments to better hear degraded speech (Montgomery et al., 1984).
Multisensory training also helps us learn to audibly recognize a talker’s voice (e.g., Schall and von Kriegstein, 2014). Thus, if you are having difficulty distinguishing talkers on your favorite podcast, research suggests that you would greatly benefit from watching them speak for a short period. A small amount of audiovisual experience would then enhance your ability to distinguish the talkers by hearing alone.
The multisensory training benefit also helps one understand what a new talker is saying, but in a particularly interesting way. It has long been known that listeners are able to better understand the speech of familiar versus unfamiliar talkers (for a review, see Nygaard, 2005). Predictably, this familiar talker advantage is even greater if one has audiovisual experience with the talker (e.g., Riedel et al., 2015). More surprising is that we are better able to lipread a familiar talker, even if our silent lipreading is not very good, and that familiarity is gained over just 30 minutes (e.g., Yakel et al., 2000). But even more surprising is that the experience one gains from silently lipreading a talker will then allow one to better hear that talker’s voice (Rosenblum et al., 2007).
This is a particularly interesting instance of the multisen- sory training benefit. In this experiment, participants never experienced the talker bimodally; they never simultaneously saw and heard the talker speak. Instead, their familiarity with the talker through lipreading seemed to transfer across modalities, allowing them to then hear that talker better.
Related research shows that transfer of talker familiarity can also work in the opposite direction, so that initial auditory experience with a talker makes that talker easier to lipread later on (Sanchez et al., 2013). Additionally, experience with recognizing talkers in one modality can transfer to allow better recognition of those talkers in the other modality (Simmons et al., 2015).
How might talker experience transfer across modalities despite perceivers never having bimodal experience with the talker? It may be that perceivers are learning something about the talker’s articulatory style (e.g., idiolect). Because articulatory style can be conveyed audibly and visibly, learning to attend to a talker’s idiosyncrasies in one modality may allow a perceiver to attend to, and take advantage of, those same idiosyncrasies in the other modality. This conjecture is based on a number of other findings.
First, despite our intuitions, talkers do look like they sound. This is apparent from research showing that perceivers can successfully match a talker’s voice to their (silently) articulating face, even if the face and voice are saying different words (e.g., Lachs and Pisoni, 2004). Furthermore, perceivers are able to perform these matches when the voice is reduced to a signal of simple sine waves and the face is reduced to a video of white points moving against a black