all the taps. When the delay between each tap corresponds to the period of a particular frequency, the nerve pulses at the output will sum to a high value, implying that the modulation in firing rate from that filter will be a maximum. When the tap period does not correspond to a multiple of the input frequency, the output is minimal.
There are enough comb filters in each group to sort incoming modulations by their pitch into separate neural paths, one path for each pitch. To achieve the pitch accuracy of a musician, the group requires only about a hundred different comb filters, each with a total delay of about 100 milliseconds. Brief signals produce useful pitches in a fraction of that time. Figure 1 shows the flow of information through the system, and a possible neural implementation of a comb filter based on the speed of pulses traveling through fine-diameter nerve fibers. The diagram in Fig. 1 shows the same number of taps for each pitch, and a variable total delay length. Our computer model uses a constant total delay for all pitches, and varies the number of taps. Which system (if any) is actually used is not predicted by our data—but the average length of the total delay must be about 100 ms to match our abilities to perceive music.
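For readers who prefer to see the mechanism in code, the following is a minimal sketch in the spirit of the constant-total-delay, variable-tap-count variant mentioned above. The sampling rate, the pitch range, and the input signal env (the firing-rate modulation from one basilar-membrane band) are illustrative assumptions, not the actual computer model.

    import numpy as np

    def comb_filter_bank(env, fs, f_lo=80.0, f_hi=800.0, n_pitches=100,
                         total_delay=0.100):
        """Sum delayed copies of the modulation signal for each candidate pitch."""
        candidates = np.geomspace(f_lo, f_hi, n_pitches)      # ~100 pitch channels
        outputs = np.zeros((n_pitches, len(env)))
        for i, f0 in enumerate(candidates):
            period = int(round(fs / f0))                      # tap spacing = one period
            n_taps = max(2, int(total_delay * fs) // period)  # taps that fit in ~100 ms
            acc = np.zeros(len(env))
            for k in range(n_taps):                           # delay-and-sum over the taps
                d = k * period
                acc[d:] += env[:len(env) - d]
            outputs[i] = acc / n_taps
        return candidates, outputs

A candidate whose tap spacing matches the period of the incoming modulation sums coherently and stands out; for example, candidates[np.argmax(outputs.mean(axis=1))] picks the dominant pitch over the analysis window.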
In Fig. 1, sounds entering the ear are separated into frequency bands by a bank of overlapping mechanical filters with relatively low selectivity. At the vocal formant frequencies each filter typically contains three or more harmonics of speech or musical fundamentals. These harmonics interfere with each other to create a strongly amplitude-modulated signal, as can be seen in the figure. The modulations in the signal are detected linearly by the hair cells, but, like an AM radio with automatic gain control, the nerve firing rate for time variations longer than about 20 milliseconds is approximately logarithmically proportional to the sound pressure. The brain stem separates these modulations by pitch using a number of comb filters, each about 100 ms long. Two of these filters (detecting two different pitches) are shown in the figure, but about one hundred are needed for each basilar membrane band. Once separated by pitch, the brain stem compares the amplitude of the modulations for each pitch across the basilar filter bands to determine the timbre of the source, and compares the amplitude and timing of the modulations at each pitch between the two ears to determine sound direction. Using these cues the brain stem assembles events into separate foreground sound streams, one for each source. Sound left over after the foreground is extracted is assigned to a background sound stream. Reflections and reverberation randomize the phases of the harmonics. When the reflections are too strong the modulations in each frequency band become noise-like, and although pitch is still detectable, timbre and direction are not.
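The front end of this chain can also be sketched in code. The band edges, filter orders, and log compression below are illustrative assumptions only; they stand in for the basilar-membrane filtering and hair-cell detection described above, and their output is the kind of modulation signal the comb filters operate on.

    import numpy as np
    from scipy.signal import butter, sosfilt, hilbert

    def modulation_front_end(x, fs, bands=((700, 1400), (1100, 2200), (1800, 3400))):
        """Band-limit the sound, detect its amplitude envelope, and log-compress it."""
        envelopes = []
        for lo, hi in bands:                                   # broad, overlapping bands
            sos = butter(2, [lo, hi], btype="bandpass", fs=fs, output="sos")
            band = sosfilt(sos, x)
            env = np.abs(hilbert(band))                        # AM envelope of the band
            envelopes.append(np.log(np.maximum(env, 1e-6)))    # crude logarithmic compression
        return envelopes                                       # one modulation signal per band

The envelopes returned here would then feed a comb-filter bank like the sketch above, one bank per basilar-membrane band.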
Stream formation
The comb filters separate sound events by pitch relatively easily, and can do it in the presence of high levels of reverberation. But to create separate sound streams for each source the brain stem must determine to which sound source the various pitch events belong. The task is easy if the timbre and azimuth of each pitch event can be identified, and this is possible when the acoustics are sufficiently clear. By comparing the strength of the modulations at a specific pitch across the formant bands the timbre of a particular event can be determined, and by comparing the strength and timing of each pitch event between the two ears the localization can also be determined. Using these cues the brain stem can assemble events into meaningful foreground streams, and present the streams to higher levels of the brain.
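One speculative way to picture this grouping step in code is sketched below. It is not the brain stem's algorithm; it simply assumes each detected pitch event carries a per-band modulation-strength profile (a crude "timbre") and an interaural azimuth estimate, and attaches events with similar cues to the same stream.

    import numpy as np

    def assign_to_streams(events, timbre_tol=0.3, azimuth_tol=10.0):
        """Group pitch events into streams by similarity of timbre and azimuth cues."""
        streams = []                                    # each stream keeps running cues
        for ev in events:                               # ev: {"timbre": array, "azimuth": degrees}
            for st in streams:
                close_timbre = np.linalg.norm(ev["timbre"] - st["timbre"]) < timbre_tol
                close_azimuth = abs(ev["azimuth"] - st["azimuth"]) < azimuth_tol
                if close_timbre and close_azimuth:
                    st["events"].append(ev)             # same source: extend the stream
                    break
            else:
                streams.append({"timbre": ev["timbre"].copy(),
                                "azimuth": ev["azimuth"],
                                "events": [ev]})        # new source: start a stream
        return streams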
In this case the brain is capable of a further separation. Sound elements identified by their pitch, localization, and timbre can be separated from the reverberation they induce. We get a distinct perception of two different types of sonic streams—the foreground streams of notes and syllables, and a single combined background stream that includes all the reverberation. The background stream has interesting properties. When the foreground is strong the notes and syllables mask the reverberation, but we perceive the reverberation as continuing unbroken through the foreground sound events. When the reverberation is stronger than the foreground elements, the foreground elements are perceived with the timbre and azimuth that are detected at their onsets—even if the reverberation soon overwhelms them. In both cases, if the background stream is at least partially coming from all directions it is perceived as surrounding the listener.
When the azimuth and timbre of the direct sound are masked by reflections and reverberation, the brain is forced to consider both the note and its reverberation as one sound event. The combination becomes one sonic object. The reverberation is bound to the note, and is perceived as primarily in front of the listener, regardless of the actual spatial distribution of the reverberation. When the foreground—the direct sound—is clearly perceived, the reverberation can be separated from the note. Then for most people the reverberation is perceived as louder and more enveloping.
But there is another aspect of stream formation. When the brain is able to accurately separate notes or syllables by pitch we perceive the instruments or speakers as being close to us. These sounds demand more attention than sounds perceived as muddy and far away. This kind of clarity is an essential part of drama and cinema. Drama and cinema directors demand that theaters be acoustically dry, with directional loudspeakers for dialog. They want the maximum dramatic effect to be conveyed to the audience. The author firmly believes the same kind of clarity is needed in musical performances. Opera especially needs clarity, whether you understand the language or not. Clear sound draws a listener into the emotional experience of the scene. Well-blended sound encourages a passive kind of listening. The goal in theaters is to make the direct sound stronger than the total reverberation; to make the direct-to-reverberant ratio (D/R) greater than unity. But some concert halls and opera houses demonstrate that dramatic clarity can be achieved at lower values of D/R.
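For concreteness, D/R can be estimated from a measured room impulse response. The sketch below is one common way to do so; the 5 ms window used here to separate the direct sound from everything that follows is an illustrative assumption, not a value prescribed by this article.

    import numpy as np

    def direct_to_reverberant_ratio(ir, fs, direct_window=0.005):
        """Ratio of direct-sound energy to later (reverberant) energy in an impulse response."""
        onset = int(np.argmax(np.abs(ir)))            # arrival of the direct sound
        split = onset + int(direct_window * fs)       # end of the assumed direct window
        direct_energy = np.sum(ir[onset:split] ** 2)
        reverb_energy = np.sum(ir[split:] ** 2)
        return direct_energy / reverb_energy          # > 1 means the direct sound dominates

Expressed in decibels, 10 * np.log10 of this ratio is positive exactly when D/R exceeds unity, the goal described above for theaters.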
Implications for musical acoustics
The physical model and the observations above need not be precisely accurate to be useful for room acoustics. The physics on which they are based predicts reasons why some halls deliver startling clarity over a wide range of seats—and why many of their copies do not. First, the model explains the