
Analyzing the Auditory Scene in the Brain
How does the brain group acoustic energy across frequencies and process it as a unified sound object through time, such as grouping the harmonics of a voice? One theory, called the temporal coherence model (Shamma et al., 2011), posits that the brain solves this scene analysis problem by collating information across neural populations encoding various sound features, such as pitch and spatial location, that are temporally correlated with each other. Intuitively, this theory makes sense: components from a single sound source tend to be coherent with one another. For example, consider a baby crying and being soothed by its father's voice. The harmonics of the father's voice will be coherently modulated because they are all produced by the same vocal tract, and the spatial cues associated with these harmonics will be coherently modulated as well because they all come from the same location. However, these harmonics will be uncorrelated with the crying baby because the two streams of spectral energy come from independent sources.
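To make this intuition concrete, here is a minimal Python sketch of coherence-based grouping. It is not the actual model of Shamma et al. (2011); the correlate-and-threshold grouping rule, the threshold value, and the toy amplitude envelopes are our own illustrative assumptions.

import numpy as np

def coherence_groups(envelopes, threshold=0.7):
    """Greedy grouping of frequency channels by temporal coherence.
    envelopes: (n_channels, n_samples) array of amplitude envelopes.
    Returns lists of channel indices whose envelopes are mutually
    correlated above `threshold`."""
    corr = np.corrcoef(envelopes)          # pairwise temporal correlations
    n = len(envelopes)
    groups, assigned = [], set()
    for seed in range(n):
        if seed in assigned:
            continue
        group = [seed]
        assigned.add(seed)
        for ch in range(seed + 1, n):
            # join only if coherent with every current group member
            if ch not in assigned and all(corr[ch, g] > threshold for g in group):
                group.append(ch)
                assigned.add(ch)
        groups.append(group)
    return groups

# Toy scene: two harmonics sharing the father's 4-Hz vocal modulation
# and two harmonics sharing the baby's independent 7-Hz modulation.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 1000)
voice = 1 + 0.8 * np.sin(2 * np.pi * 4 * t)
cry = 1 + 0.8 * np.sin(2 * np.pi * 7 * t)
env = np.stack([v + 0.1 * rng.standard_normal(t.size)
                for v in (voice, voice, cry, cry)])
print(coherence_groups(env))               # -> [[0, 1], [2, 3]]

In this toy scene, the two "voice" channels correlate strongly with each other but not with the two "cry" channels, so the grouping recovers the two sources.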
In the laboratory, experimenters use a random figure-ground stimulus to study this hypothesis (Teki et al., 2011). As a visual analogy, psychologists refer to how you group the words on this page as the "figure" and the white space as the "ground." In the auditory figure-ground experiment, the number of tones coherently modulated in frequency is systematically varied, and these tones are presented together with other random tones. As the number of coherently modulated tones increases, subjects become more likely to detect this moving figure amid the random ground. A recent study recorded EEGs while subjects were exposed to these figure-ground stimuli (O'Sullivan et al., 2015). When subjects watched a silent movie and were passively exposed to the auditory stimuli, the investigators found that the brain tracks this temporal coherence with an onset as early as about 115 ms. This is remarkably fast and suggests that the brain begins to group sounds early in the auditory processing hierarchy; furthermore, this processing can occur without explicit attention. When subjects were actively engaged in listening to these stimuli (i.e., pressing a button when they heard a specific trained pattern), the neural representation of this temporal coherence was even more pronounced, suggesting that active listening and selective attention can further enhance temporal coherence processing. Source-modeling analyses of these temporal coherence responses have shown that the neural sources originate in the bilateral temporal areas in both passive and active conditions, whereas previous fMRI research has shown activation bilaterally in the parietal region in response to varying coherence levels (Teki et al., 2011).
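For readers who would like to experiment with such stimuli, the following Python sketch generates a simple figure-ground tone sequence. All parameter values (sample rate, chord duration, frequency pool, ramp length, and tone counts) are illustrative assumptions, not the exact parameters of Teki et al. (2011).

import numpy as np

FS = 16000            # sample rate (Hz); illustrative
CHORD_DUR = 0.05      # chord duration (s); illustrative
N_CHORDS = 40         # chords per trial
N_GROUND = 10         # random "ground" tones per chord
N_FIGURE = 4          # coherent "figure" tones per chord

rng = np.random.default_rng(1)
pool = np.geomspace(200.0, 8000.0, 120)    # log-spaced frequency pool

def chord(freqs):
    """One chord: equal-amplitude pure tones with 5-ms onset/offset ramps."""
    t = np.arange(int(FS * CHORD_DUR)) / FS
    y = sum(np.sin(2 * np.pi * f * t) for f in freqs)
    ramp = np.minimum(1.0, np.minimum(t, t[::-1]) / 0.005)
    return y * ramp / len(freqs)

# The figure repeats the same frequencies in every chord (coherent),
# whereas the ground frequencies are redrawn at random for each chord.
figure_freqs = rng.choice(pool, N_FIGURE, replace=False)
stimulus = np.concatenate([
    chord(np.concatenate([figure_freqs,
                          rng.choice(pool, N_GROUND, replace=False)]))
    for _ in range(N_CHORDS)
])

Raising N_FIGURE increases the number of frequencies that repeat coherently from chord to chord, which is the manipulation that makes the figure easier to detect against the random ground.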
Attention Modulation in the Auditory Cortex
Many studies have asked whether auditory-related brain regions show evidence of "tuning in" to a sound of interest during selective attention tasks. Back in the 1970s, a seminal EEG study (Hillyard et al., 1973) tested whether the attentional state of the listener would modulate the N100 response, a negative deflection in the electric potential of auditory sensory areas that occurs around 100 ms after the onset of an auditory stimulus. Listeners were presented with two streams of tone pips, one ear receiving a lower frequency than the other, and they were tasked with detecting when a rare "oddball" tone was introduced into the attended stream. Using only a single electrode placed at the vertex (the top of the head), the researchers found that the strength of the N100 response varied depending on which stream listeners were instructed to attend. Because of the properties of electromagnetism, every EEG component has an MEG counterpart, and the magnetic counterpart of the N100 response, called the M100, was also found to be modulated by attention. Using source localization techniques, this attention-modulated M100 component was localized to the auditory cortex (Woldorff et al., 1993). Moreover, a modulatory effect was found in components preceding the M100, suggesting that the auditory response might be modulated even before the initial stages of sound processing in the cortex, most likely by other cortical attentional control centers, a topic that we discuss later. Subsequent fMRI studies using an array of behavioral paradigms also showed attentional modulation of BOLD activity in the auditory sensory areas and that this attentional modulation is frequency specific. Like a radio, it seems that the human auditory cortex can tune in to a preferred frequency channel on demand (Da Costa et al., 2013).
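As a rough illustration of how an attention effect on the N100 might be quantified from a single-electrode recording, here is a minimal Python sketch. The sampling rate, epoch window, baseline correction, and search window are our own assumptions; this is not the analysis pipeline of Hillyard et al. (1973).

import numpy as np

FS = 500  # EEG sample rate (Hz); illustrative

def erp(eeg, onsets, pre=0.1, post=0.4):
    """Average event-related potential around stimulus onsets.
    eeg: 1-D voltage trace from a single (e.g., vertex) electrode.
    onsets: stimulus onset times in seconds."""
    n_pre, n_post = int(pre * FS), int(post * FS)
    epochs = []
    for t0 in onsets:
        i = int(t0 * FS)
        seg = eeg[i - n_pre:i + n_post]
        epochs.append(seg - seg[:n_pre].mean())   # baseline-correct each epoch
    return np.mean(epochs, axis=0)

def n100_amplitude(erp_wave, pre=0.1):
    """Most negative deflection 80-140 ms after stimulus onset."""
    lo, hi = int((pre + 0.08) * FS), int((pre + 0.14) * FS)
    return erp_wave[lo:hi].min()

# With real data, the attention effect is the difference between, e.g.,
# n100_amplitude(erp(eeg, attended_onsets)) and
# n100_amplitude(erp(eeg, unattended_onsets)).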
The aforementioned studies focused on the simple laboratory situation of selectively attending to one of two streams of tones. In recent years, more advanced signal-processing and neural-modeling techniques have allowed experimenters to examine listening situations that are akin to our original "cocktail party problem." In one MEG experiment (Ding and Simon, 2012), listeners were presented with two lengthy speech streams from different talkers (either the same or different gender) mixed into a single acoustic channel. This