Page 21 - Winter 2011
These binaural recordings illuminate an aspect of hall acoustics that acoustic research has largely ignored. In standard texts, clarity has been loosely defined by the intelligibility of single voices, or by the ability to hear the pitches of instruments. For many seats in modern concert halls and opera houses, a different kind of clarity is lost: the clarity that enables a trained listener to identify and localize the instruments in an orchestra or a string quartet, the clarity that pulls the listener's full attention into the composition, the clarity that nearly every commercial sound recording delivers.
Localization and timbre
The timbre of an instrument, like the difference between spoken vowels, is determined by the strength of its harmonics in the vocal formant frequency range, roughly 700 Hz to 4000 Hz. The basilar membrane filters in the inner ear separate these frequencies into about 15 overlapping bands. The differences in signal strength across these bands allow us to identify the word or the instrument. Likewise, differences in the strength and timing of the signals at the two ears allow us to determine the direction of the sound. But if several instruments are playing at once, each basilar membrane filter typically receives two or more harmonics from each source, and the basilar membrane is not selective enough to separate them. If we look only at the average signal in each filter band, we get a mixture of timbres and have little clue to the source directions.
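The figure of about 15 bands can be checked against a standard auditory-filter model. The sketch below assumes the Glasberg and Moore (1990) equivalent-rectangular-bandwidth (ERB) formulas, which are not part of this article; the two instrument pitches (200 Hz and 210 Hz) are hypothetical values chosen for illustration. It counts how many ERB-wide bands span 700 Hz to 4000 Hz and checks whether the 10th harmonics of the two tones fall inside a single filter.

```python
import math


def erb_number(f_hz):
    """ERB-number (in Cams) of a frequency, Glasberg & Moore (1990) formula (assumed model)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)


def erb_bandwidth(f_hz):
    """Equivalent rectangular bandwidth, in Hz, of the auditory filter centred at f_hz."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


# Number of ERB-wide bands spanning the formant range quoted in the text: about 14.
n_bands = erb_number(4000.0) - erb_number(700.0)

# Two hypothetical instruments at 200 Hz and 210 Hz: their 10th harmonics lie
# 100 Hz apart, inside one auditory filter roughly 250 Hz wide near 2 kHz.
h1, h2 = 10 * 200.0, 10 * 210.0
same_filter = abs(h2 - h1) < erb_bandwidth((h1 + h2) / 2)
```

Under this assumed model the range holds roughly 14 ERBs, in line with "about 15 overlapping bands", and the two harmonics land in the same filter, so the average band level alone cannot tell them apart.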
Separation of sound sources by pitch
A critical issue for music and speech perception is that instruments playing together, or several people talking at once, all produce harmonics in the same vocal formant range. To detect the location and timbre of each instrument, or the vowels of simultaneous speech, we must first separate the harmonics from each source into independent neural streams. It is clear that the brain stem can do this, and the ability is vital to human hearing. It enables us to listen to several conversations at once and to switch our attention between them at will.
The cocktail party effect is known to depend critically on pitch. A person speaking in a monotone can be separated from another if the difference in pitch is only half a semitone, a frequency difference of only three percent. If the pitches are identical, or if the speakers whisper, the two voices cannot be separated. We believe that the need to achieve the cocktail party effect has driven the evolution of our extraordinary sensitivity to pitch, and of our appreciation of musical scales and harmony.
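The three-percent figure follows from equal temperament, in which a semitone is a frequency ratio of 2^(1/12), so half a semitone is 2^(1/24). The short calculation below assumes equal temperament; the 200 Hz voice is a hypothetical example.

```python
# Equal-tempered half semitone, expressed as a frequency ratio (assumed tuning system).
HALF_SEMITONE = 2 ** (1 / 24)

# Relative frequency difference in percent: about 2.9, i.e. roughly three percent.
percent_difference = (HALF_SEMITONE - 1) * 100

# Example: a hypothetical monotone voice at 200 Hz and a second voice half a semitone higher.
voice_a = 200.0
voice_b = voice_a * HALF_SEMITONE  # about 205.9 Hz
```

Two voices only about 6 Hz apart at 200 Hz are therefore enough for separation, provided each voice has a clear pitch.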
The properties of music can be used to understand the physics of this process. A trained musician can tune an instrument to an accuracy of one part in a thousand, and the average music lover can distinguish pitches that differ by 1% or less. The basilar membrane is incapable of such precision. Furthermore, our perception of pitch is circular in octaves: if we double the frequency of a complex tone, its pitch, in the musical sense, remains the same. It is sometimes difficult to decide which octave a complex tone belongs to, particularly in the presence of other pitches.
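Musicians usually express such small ratios in cents, where an octave is 1200 cents and an equal-tempered semitone is 100 cents. The sketch below converts the two accuracies quoted above into cents and illustrates octave circularity by folding frequency onto a position within the octave. The reference frequency of 440 Hz and the function names are illustrative choices, not part of the original text.

```python
import math


def cents(ratio):
    """Size of a frequency ratio in cents (1200 per octave, 100 per semitone)."""
    return 1200 * math.log2(ratio)


def pitch_class(f_hz, ref_hz=440.0):
    """Position within the octave in [0, 1); frequencies an octave apart map to the same value."""
    return math.log2(f_hz / ref_hz) % 1.0


tuning = cents(1.001)   # trained musician, one part in a thousand: about 1.7 cents
listener = cents(1.01)  # average listener, 1%: about 17 cents, roughly a sixth of a semitone

# Doubling or halving the frequency leaves the pitch class unchanged.
same_class = pitch_class(220.0) == pitch_class(440.0) == pitch_class(880.0)
```

Folding onto the octave discards which octave the tone came from, which mirrors the octave ambiguity described above.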
The author has developed a physical model that explains these abilities. Physics tells us that harmonics carry in their phases the memory of the pulse that created them. If several adjacent harmonics of the same tone are present at the output of a filter, once in each fundamental period the harmonics align in phase, adding together to make a strong peak in the filter output. As the harmonics drift out of phase, the peak decays. The result is a strong amplitude modulation of the filter output. When several harmonic tones are present at the same time, each creates modulations specific to its own fundamental frequency, and these modulations sum linearly. In this model the basilar membrane is not only sensitive to the average amplitude in a band; it also detects the amplitude modulations in that band, much like an AM radio.
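The phase-alignment argument can be illustrated numerically. The minimal sketch below assumes an idealized filter that passes exactly four adjacent harmonics (10 to 13) of a 200 Hz tone with equal amplitudes and cosine phase; the fundamental, harmonic range, and sample rate are all hypothetical choices. Rather than modeling a rectifier and low-pass filter, as an AM receiver would, it computes the envelope directly as the magnitude of the analytic signal, then measures the spacing of the envelope peaks.

```python
import cmath
import math

F0 = 200.0                 # fundamental frequency of the tone, Hz (assumed)
HARMONICS = range(10, 14)  # harmonics 10-13 (2000-2600 Hz), assumed to share one filter
FS = 48000                 # sample rate, Hz (assumed)


def band_output(t):
    """Filter output: equal-amplitude, cosine-phase harmonics n*F0 summed at time t."""
    return sum(math.cos(2 * math.pi * n * F0 * t) for n in HARMONICS)


def envelope(t):
    """Amplitude envelope: magnitude of the analytic signal, the sum of exp(j*2*pi*n*F0*t)."""
    return abs(sum(cmath.exp(2j * math.pi * n * F0 * t) for n in HARMONICS))


period = int(FS / F0)  # 240 samples per fundamental period
env = [envelope(k / FS) for k in range(3 * period + 1)]
threshold = 0.5 * max(env)

# Interior local maxima of the envelope that rise above half its peak value.
peaks = [k for k in range(1, len(env) - 1)
         if env[k] >= env[k - 1] and env[k] > env[k + 1] and env[k] > threshold]

# The spacing between peaks gives the modulation rate. It equals F0,
# even though no component at F0 itself passes through the filter.
modulation_hz = FS / (peaks[1] - peaks[0])
```

In this idealized case the envelope reaches 4 (all four harmonics in phase) once every 5 ms and falls to zero half-way between, so the band output is fully modulated at the fundamental frequency.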
In our model the modulations detected in each band pass to a group of neural structures that resemble comb filters: pitch-sensitive filters that are both highly economical in their use of neurons and circular in pitch. A comb filter can be understood as a delay line with a large number of taps, each separated by a constant delay. The output consists of the sum of
Fig. 1. Flow of information through the model.
Clarity, Cocktails, and Concerts 17