can be recovered. The same is true for almost any casually spoken sentence. In “I’ll talk to you later,” the final four syllables usually come out as a blur, with vowels replaced with mumbles (see Multimedia File 3, parts a and b, at acousticstoday.org/speech-not-acoustic), and in “This is as good as it gets,” some of the first few vowels are dropped entirely, leaving a sequence of buzzing “z” sounds for the listener to unpack as whole words (see Multimedia File 4, parts a and b, at acousticstoday.org/speech-not-acoustic).
Filling in the Gaps
The influence of nonacoustic factors like linguistic knowledge is especially noticeable, and helpful, when the speech signal is hard to hear. When individual words are completely masked by noise, a listener can still fill in the gaps to correctly guess what was spoken; this has come to be known as perceptual restoration (Warren, 1970). Not only is perception of the speech smooth and continuous, but the listener also appears to mostly discard the perceptual details of the noise; if a coughing sound is made while a person hears a sentence, the timing of that cough cannot be reliably judged. In fact, people tend to estimate that it occurred at a linguistically relevant landmark (e.g., where you might put a comma or a break between clauses in a sentence), even if its position was truly random. This observation is merely a seed of a larger pattern in which people shape their perception of an utterance to match their framework of language.
“When You Hear Hooves, You Expect Horses, Not Zebras”
Figure 5. The “Ganong effect.” When labeling a consonant that is morphed between /g/ and /k/, there will be more perceptions of /g/ when the consonant is followed by a syllable like “ift” because /g/ plus “ift” forms the real word “gift.” If the same sound is followed by “iss,” then more items in the continuum will be labeled as /k/ because “kiss” is a word. Perception is driven not just by acoustics but also by the lexicon.

Some words are spoken more frequently than others, and it comes as no surprise that they are more easily recognized by both native and nonnative listeners (Bradlow and Pisoni, 1999). Listeners expect to hear words that have meaning. Consider a spoken utterance in which the first consonant is ambiguous; was that first sound a /g/ or a /k/? If the mystery sound is followed by “iss,” then you are likely to hear it as a /k/ because “kiss” is a real word and “giss” is not. However, if the exact same sound is placed before “ift,” then you are more likely to hear it as a /g/, for just the same reason. This pattern is illustrated in Figure 5 and can be heard in Multimedia File 5, parts a through g, at acousticstoday.org/speech-not-acoustic. These scenarios again highlight how nonacoustic factors can play a role when we perceive speech. The acoustics cannot be said to solely drive the perception because the same consonant sound was used in each case; only the surrounding context and intuition about likely word meaning can explain the bias. This particular lexical bias effect, commonly known as the “Ganong effect” after the author of the study (Ganong, 1980) that first described it, is observable mainly when the speech sound is ambiguous and not so much when it is spoken normally. However, as discussed below, the underlying processes related to using context might be active earlier and more frequently than what shows up in basic behavioral tests.
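One way to make this lexical pull concrete is to treat it as Bayesian cue combination: the listener weighs the acoustic evidence for each consonant against a prior favoring interpretations that form real words. The Python sketch below is purely illustrative, not Ganong’s actual model; the voice-onset-time likelihood function, its boundary and slope values, and the 0.75/0.25 priors are all invented for demonstration.

```python
import math

def p_g_acoustic(vot_ms, boundary=40.0, slope=0.25):
    """Acoustic likelihood that a consonant is /g/ given its voice-onset
    time (VOT, ms): short VOT sounds voiced (/g/), long VOT voiceless (/k/).
    The logistic shape and parameter values are invented for illustration."""
    return 1.0 / (1.0 + math.exp(slope * (vot_ms - boundary)))

def p_g_posterior(vot_ms, prior_g):
    """Combine the acoustic likelihood with a lexical prior via Bayes' rule."""
    like_g = p_g_acoustic(vot_ms)
    like_k = 1.0 - like_g
    return like_g * prior_g / (like_g * prior_g + like_k * (1.0 - prior_g))

# Hypothetical lexical priors: /g/ + "ift" forms the real word "gift",
# so /g/ is favored there; /k/ + "iss" forms "kiss", so /k/ is favored.
PRIOR_G = {"ift": 0.75, "iss": 0.25}

# Step along the /g/-/k/ continuum, from clearly /g/ to clearly /k/.
for vot in range(20, 61, 10):
    p_ift = p_g_posterior(vot, PRIOR_G["ift"])
    p_iss = p_g_posterior(vot, PRIOR_G["iss"])
    print(f"VOT {vot:2d} ms: P(/g/ | _ift) = {p_ift:.2f}, P(/g/ | _iss) = {p_iss:.2f}")
```

At the ambiguous middle of the continuum (around 40 ms in this toy setup) the lexical prior flips the decision between the “ift” and “iss” contexts, while at the clear endpoints the acoustics dominate, mirroring the observation that the effect shows up mainly when the sound is ambiguous.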
The pattern of biasing toward hearing things that are frequent or meaningful extends down to the level of individual speech sounds, which can occur in sequences that are more or less common. Listeners can capitalize on the likelihood and structure of words in their language to shape what they think they hear. For example, more words end in the “eed” sound than in the “eeg” sound, so if a listener is unsure whether a final consonant was a /d/ or a /g/ but is certain that the preceding vowel was “ee” (transcribed phonetically as /i/), then the listener is more likely to guess that the final sound was a /d/.
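To see where such a prior could come from, one can simply count endings across the lexicon. The toy tally below uses a half-dozen hand-picked words (an invented sample, not a real corpus), but the same count over a full dictionary would supply the kind of prior that could replace the hand-set 0.75/0.25 values in the sketch above.

```python
# A toy lexicon transcribed as rough phoneme sequences; /i/ is the "ee" vowel.
# These few words are an invented sample, not a real corpus count.
LEXICON = [
    ("s", "i", "d"),   # seed
    ("n", "i", "d"),   # need
    ("f", "i", "d"),   # feed
    ("b", "i", "d"),   # bead
    ("r", "i", "d"),   # read
    ("l", "i", "g"),   # league
]

# Count how many words end in /d/ vs. /g/ after the vowel /i/.
counts = {"d": 0, "g": 0}
for word in LEXICON:
    if word[-2] == "i" and word[-1] in counts:
        counts[word[-1]] += 1

total = counts["d"] + counts["g"]
print(f"P(final /d/ | preceding /i/) = {counts['d'] / total:.2f}")  # 0.83 here
print(f"P(final /g/ | preceding /i/) = {counts['g'] / total:.2f}")  # 0.17 here
```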
It can be tempting to interpret nonacoustic effects like perceptual restoration or lexical bias as a post hoc refinement of laboratory behavior rather than a real perceptual phenomenon. However, in addition to using linguistic knowledge to fill in gaps, people also appear to make predictions about upcoming speech that might or might not even be spoken. When the audio of a sentence is cut short before the end, listeners can still reliably say what will be spoken (Grosjean, 1980) and can shadow a talker’s voice at speeds that seem impossible to explain by hearing alone (Marslen-Wilson, 1973), suggesting that they quickly generate predictions about what will be heard before the sounds arrive at the ear.
One of the most interesting ways to learn about speech and language perception is through the use of eye-tracking studies that follow a listener’s gaze as the person listens. Such experiments demonstrate that listeners rapidly and incrementally perceive the speech signal and act on even the most