Page 43 - Spring2022
make the case to a broader audience in three subsequent publications (Yost, 1992, 1993, 2008). In these publications, Bill identifies three major areas of research making up the bulk of the work on human sound source perception, still active today. What follows is an abbreviated review of the highlights of the work in each area, concentrating on the key contributions made by Bill.
Pitch
Of the three primary qualities we perceive in sound (pitch, loudness, and timbre), pitch is most closely tied to the properties of the sound source. Loudness varies with distance from the source, the driving force for vibration, and any obstacles that might block the sound’s path on the way to our ears. Timbre is affected by room acoustics, how the source is supported, and how it is driven to vibrate. Pitch, however, is much less subject to these extraneous factors and depends more on the properties of the resonating source itself.
The long history of research on pitch shows that it corresponds largely to our perception of periodicity in sound. Many sounds in nature, particularly those having the most significance for us, are periodic, or at least roughly so. Speech and music are the most notable examples. These sounds have a harmonic structure whose periodicity is given by a fundamental frequency (F0) that with few exceptions dominates our perception of pitch. So strong is this tendency that we hear a pitch at F0 when there is little or no energy at F0; and even when the sound is inharmonic, we tend to hear a pitch corresponding to the closest match to F0 (see Yost, 2009, for a review and http://auditoryneuroscience.com/pitch for online demos).
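The missing-fundamental effect described above can be illustrated numerically. The following is a minimal Python sketch, not a model of auditory processing: it builds a complex from harmonics 3 through 6 of 200 Hz, checks that the waveform carries essentially no energy at 200 Hz, and then recovers the waveform's periodicity from its autocorrelation. The sampling rate, duration, choice of harmonics, search range, and all function names are assumptions made for illustration.

```python
import math

FS = 16000                 # sampling rate in Hz (assumed for this sketch)
F0 = 200.0                 # fundamental frequency in Hz
HARMONICS = (3, 4, 5, 6)   # harmonic numbers present; harmonic 1 (F0) is absent
DURATION = 0.05            # signal duration in seconds (10 periods of F0)


def harmonic_complex(f0, harmonics, fs, duration):
    """Sum of equal-amplitude sinusoids at the given harmonic numbers of f0."""
    n = int(round(fs * duration))
    return [
        sum(math.sin(2 * math.pi * h * f0 * i / fs) for h in harmonics)
        for i in range(n)
    ]


def amplitude_at(signal, freq, fs):
    """Amplitude of the component at freq, found by projecting onto sin and cos."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * freq * i / fs) for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq * i / fs) for i, s in enumerate(signal))
    return 2 * math.hypot(re, im) / n


def best_autocorr_lag(signal, fs, min_period, max_period):
    """Lag (in samples) with the largest mean lagged product in the search range."""
    lo = int(round(min_period * fs))
    hi = int(round(max_period * fs))
    best_lag, best_value = lo, float("-inf")
    for lag in range(lo, hi + 1):
        terms = len(signal) - lag
        value = sum(signal[i] * signal[i + lag] for i in range(terms)) / terms
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag


if __name__ == "__main__":
    tone = harmonic_complex(F0, HARMONICS, FS, DURATION)
    print("amplitude at F0:", amplitude_at(tone, F0, FS))
    print("amplitude at 3*F0:", amplitude_at(tone, 3 * F0, FS))
    # Search periods between 1.25 ms and 8 ms, so only one multiple of 5 ms is included.
    lag = best_autocorr_lag(tone, FS, 0.00125, 0.008)
    print("estimated periodicity (Hz):", FS / lag)
```

The search range is deliberately limited to less than two periods of F0, because the autocorrelation of a periodic signal peaks at every integer multiple of its period. A periodicity of 5 ms emerges even though no component sits at 200 Hz, which is the purely acoustic counterpart of the perceptual observation in the text.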
Pitch contributes to sound source perception in a variety of ways. It conveys an animal’s size through its vocalizations, with lower pitched vocalizations generally corresponding to larger size. Larger animals are more attractive to potential mates and are a greater threat to competitors. In humans, pitch affects the meaning of a spoken sentence through prosody and conveys information about the talker’s gender and even their emotional state. It also plays an important role in helping to segregate sound sources perceptually. The individual spectral components of multiple sources sounding simultaneously are conflated in a complex spectrum reaching the ears. But a separate pitch is heard for each source, effectively segregating the sounds by their harmonic structure. A popular demonstration of this is when a single component of an otherwise harmonic complex is slightly mistuned. The pitch of the mistuned component will “stand out” from that of the harmonic complex such that two pitches are heard simultaneously (Moore et al., 1986). The literature includes a variety of examples of segregation based on pitch (reviewed by Carlyon and Gockel, 2008).
Bill’s work on pitch has focused on how it is encoded in the auditory system, the second part of the fundamental goal of research on sound source perception. The question has prompted an ongoing dispute, dating back to Helmholtz (1863, 1954), between two theories: one centering on the features of the time waveform and the other on its spectrum. Because the spectrum is a translation of the time waveform, early tests of the two theories based on acoustics alone proved difficult. Modern theory has since contributed what we have learned about the transformations of the signal taking place in the auditory periphery. We now know that individual fibers in the auditory nerve are selectively responsive to different frequencies, providing a place code for the sound spectrum. Temporal features of sound are also represented in the group synchronous response of nerve fibers to signals. The combination of these processes results in a neural activation pattern (NAP) in frequency and time that preserves much of the spectral and temporal information present in the airborne sound.
Figure 3, left, was derived from the NAP model of Patterson et al. (1995). It shows the simulated neural response to a 200-Hz harmonic complex, which produces a strong perception of pitch at 200 Hz. Looking horizontally, one can readily see the oscillations, shifted in phase vertically, and having a periodicity of 5 ms, the reciprocal of 200 Hz. Looking vertically, one can also make out the representation of the harmonic spectrum as neighboring activation maxima, with a spacing of 200 Hz. Figure 3, right, shows the simulated response to a special signal that Bill popularized and termed iterated rippled noise (IRN). IRN is created by passing random noise through a delay and add-back circuit and applying multiple iterations of the circuit (see Yost, 2009). There were three iterations of a delay of 5 ms for the signal in Figure 3.
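The delay and add-back procedure can be sketched in a few lines of Python. This is an illustrative approximation, not the code used to produce Figure 3: it assumes that each iteration adds a delayed copy of the current signal back to itself with a gain of 1, and that the input is Gaussian noise. The text does not specify the circuit arrangement, gain, noise type, or sampling rate, so those choices, along with the function names, are assumptions.

```python
import random

FS = 8000            # sampling rate in Hz (assumed for this sketch)
DELAY_S = 0.005      # delay of 5 ms, as for the signal in Figure 3
ITERATIONS = 3       # three iterations, as for the signal in Figure 3


def iterated_rippled_noise(noise, delay_samples, iterations, gain=1.0):
    """Apply a delay-and-add stage repeatedly (assumed arrangement, gain of 1)."""
    signal = list(noise)
    for _ in range(iterations):
        # Delayed copy of the current signal, zero-padded at the start.
        delayed = [0.0] * delay_samples + signal[:-delay_samples]
        signal = [s + gain * d for s, d in zip(signal, delayed)]
    return signal


def normalized_autocorr(signal, lag):
    """Lagged product sum divided by the signal energy (equals 1 at lag 0)."""
    energy = sum(s * s for s in signal)
    cross = sum(signal[i] * signal[i + lag] for i in range(len(signal) - lag))
    return cross / energy


if __name__ == "__main__":
    rng = random.Random(0)                      # fixed seed for repeatability
    noise = [rng.gauss(0.0, 1.0) for _ in range(FS)]   # 1 s of Gaussian noise
    delay_samples = int(round(DELAY_S * FS))    # 40 samples at 8 kHz
    irn = iterated_rippled_noise(noise, delay_samples, ITERATIONS)
    print("noise autocorrelation at 5 ms:", normalized_autocorr(noise, delay_samples))
    print("IRN autocorrelation at 5 ms:", normalized_autocorr(irn, delay_samples))
```

Under these assumptions, three unit-gain iterations are equivalent to a filter with taps 1, 3, 3, 1 spaced 5 ms apart, so the expected normalized autocorrelation at the 5-ms lag is 15/20 = 0.75, while the unprocessed noise shows a value near zero. The waveform itself still looks noisy, yet it carries a statistical regularity at the delay.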
IRN poses a challenge for models of pitch because it has no clear spectral or temporal structure but nonetheless
Spring 2022 • Acoustics Today 43