Figure 4. Beginner’s guide to the listening brain. Shown is a lateral view of the left hemisphere of the cerebral cortex, with the medial portion hidden behind; the front of the head is to the left. a: Different lobes in the human cerebral cortex with approximate functional areas and the “what”/“where” pathways. b: Different regions of the cingulo-opercular and frontoparietal systems (dark gray, valleys/sulci; light gray, crests/gyri). FEF, frontal eye field; TPJ, temporoparietal junction; aPFC, anterior prefrontal cortex; aI/fO, anterior insular/frontal operculum; dACC, dorsal anterior cingulate cortex; dlPFC/MFG, dorsolateral prefrontal cortex/middle frontal gyrus; IPS/IPL, intraparietal sulcus/inferior parietal lobule; PCS/IFS, precentral sulcus/inferior frontal sulcus.
acoustic mixture was then presented identically to each ear without any spatial cues available. Listeners were asked to attend to only one of the two speakers.

Similar to previous findings based on tone pips, the neural representation of the attended speech stream was stronger than that of the ignored stream. Specifically, the neural response is more phase-locked to the envelope of the attended speech stream in the presence of a competing speech stream. Although neural sources were also detected 50 ms after stimulus onset, the difference between attended and unattended neural signals primarily arose 100 ms poststimulus. The earlier sources did not differ between the attended and the unattended streams, in agreement with previous EEG findings. Localization analysis suggests that these earlier components originate from Heschl’s gyrus (Figure 4b), the structure containing the human primary auditory cortex, whereas the neural sources for the later component originate from the planum temporale, the cortical area just posterior to Heschl’s gyrus. A possible interpretation is that the entire auditory scene is processed by the primary auditory cortex, which is only weakly sensitive to selective attention, whereas the higher order auditory areas in the planum temporale receive the processed neural signals, with the speech streams already segregated. At this level of processing, perhaps more neural sources are devoted to processing the attended than the unattended speech stream, leading to a stronger difference in neural signal (Simon, 2017).
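To make this envelope phase-locking idea concrete, here is a minimal Python sketch on simulated signals. Everything in it is synthetic: the sampling rate, filter cutoff, ~100-ms latency, and mixing weights are illustrative assumptions, not parameters from the studies described.

```python
# Minimal sketch of envelope phase-locking analysis on simulated data.
# In a real study, "neural" would be an MEG/EEG trace and the envelopes
# would come from the two speech streams; here everything is synthetic.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 100.0                        # sampling rate (Hz), assumed
t = np.arange(0, 60, 1 / fs)      # 60 s of signal
rng = np.random.default_rng(0)

def slow_envelope(x, fs, cutoff=8.0):
    """Low-frequency (<8 Hz) amplitude envelope via the Hilbert transform."""
    b, a = butter(4, cutoff / (fs / 2), btype="low")
    return filtfilt(b, a, np.abs(hilbert(x)))

# Stand-ins for the attended and ignored speech streams
env_att = slow_envelope(rng.standard_normal(t.size), fs)
env_ign = slow_envelope(rng.standard_normal(t.size), fs)

# Simulated neural response: tracks the attended envelope more strongly,
# delayed ~100 ms (the latency at which the attention effect arises)
lag = int(0.1 * fs)
neural = (np.roll(env_att, lag) + 0.3 * np.roll(env_ign, lag)
          + 0.5 * rng.standard_normal(t.size))

# Phase-locking proxy: lagged correlation with each stream's envelope;
# the attended stream should correlate more strongly
r_att = np.corrcoef(neural[lag:], env_att[:-lag])[0, 1]
r_ign = np.corrcoef(neural[lag:], env_ign[:-lag])[0, 1]
print(f"attended r = {r_att:.2f}, ignored r = {r_ign:.2f}")
```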
Speech Reconstruction Based on Neural Signals

ECoG studies have also found stronger representations of the attended speaker in a multitalker environment (Mesgarani and Chang, 2012). Leveraging the high spatial and temporal resolution available in ECoG, experimenters used these fine-grained neural signals to decode a neural spectrogram, reconstructing the attended speech at high temporal and spectral fidelity relative to the original acoustics of the speech signals. Furthermore, the success of these speech reconstructions correlates with the listener’s behavior: reconstruction was successful only in trials in which the subjects correctly reported the target words (and not during error trials), and the reconstructed neural spectrogram better reflects the portion of the speech stream when the subjects attended to only one speaker compared with the earlier period of the task when they had to listen for the target call sign from both speakers.
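As a rough illustration of how such stimulus reconstruction works, the sketch below trains a linear “backward” decoder with ridge regression, mapping time-lagged, multichannel neural data back to a speech envelope. The data, lag window, and regularization strength are all simulated or assumed; this is a toy version of the general approach, not the published pipeline.

```python
# Toy stimulus reconstruction: a ridge-regression "backward model" that
# decodes a speech envelope from time-lagged multichannel neural data.
import numpy as np

rng = np.random.default_rng(1)
fs = 64                                   # sampling rate (Hz), assumed
n, n_ch, n_lag = 60 * fs, 16, 16          # 60 s, 16 channels, 0-250 ms lags

# Synthetic slow "speech envelope" (smoothed noise, zero mean)
env = np.convolve(rng.standard_normal(n), np.ones(fs) / fs, mode="same")

# Synthetic neural data: each channel carries the envelope at its own
# latency (the brain lags the stimulus), buried in noise
neural = np.zeros((n, n_ch))
for ch in range(n_ch):
    d = rng.integers(0, n_lag)
    neural[d:, ch] = env[: n - d]
neural += 0.5 * rng.standard_normal((n, n_ch))

# Design matrix of neural samples 0 to ~250 ms after each stimulus sample
X = np.zeros((n, n_ch * n_lag))
for lag in range(n_lag):
    X[: n - lag, lag * n_ch : (lag + 1) * n_ch] = neural[lag:]

# Ridge-regression decoder: w = (X'X + lambda*I)^-1 X'y
lam = 1e3
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ env)
recon = X @ w
print(f"reconstruction accuracy r = {np.corrcoef(recon, env)[0, 1]:.2f}")
```

Ridge regularization keeps the many correlated lag-channel regressors from overfitting; in practice, the decoder would be trained and tested on separate data segments.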
The stimulus-reconstruction method employed in both M/EEG and ECoG studies differs in the frequency content of the neural signals used to recreate the attended speech signals. In ECoG studies, the low-frequency fluctuation (<8 Hz) of the high-frequency gamma signals (>70 Hz) is often used to analyze the attended and unattended speech streams. In M/EEG studies, both attended and unattended speech representations can be seen in the low-frequency signals (<8 Hz), but the signal-to-noise ratio in the high-frequency gamma range is too low to detect meaningful changes. Besides the difference in usable frequency content, there is another, perhaps more important, distinction between the ECoG and M/EEG approaches: the former is invasive and the latter is noninvasive. Despite recording farther from the neural sources, M/EEG stimulus reconstruction can be effective enough to classify which of two speakers the listener was attending to in single trials (with speech segments of about 60 s; O’Sullivan et al., 2014). This technological development is an important advance toward designing a hearing aid that can follow the user’s attention and selectively amplify the sound of interest, the holy grail of futuristic hearing aid design (Lee et al., 2013a).
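The single-trial classification step reduces to a simple decision rule: correlate the decoded envelope with each candidate speaker’s envelope and choose the stronger match. In the hypothetical sketch below, the decoder output is faked as a noisy copy of speaker A’s envelope; the names and noise level are illustrative assumptions.

```python
# Toy single-trial attention classification: label the trial by whichever
# speaker's envelope best matches the decoded envelope.
import numpy as np

rng = np.random.default_rng(2)
n = 60 * 64                               # one 60-s trial at 64 Hz

def smooth(x):                            # slow-envelope stand-in
    return np.convolve(x, np.ones(64) / 64, mode="same")

env_a = smooth(np.abs(rng.standard_normal(n)))
env_b = smooth(np.abs(rng.standard_normal(n)))

# Stand-in for the envelope reconstructed from M/EEG on this trial
recon = env_a + 0.3 * rng.standard_normal(n)

r_a = np.corrcoef(recon, env_a)[0, 1]
r_b = np.corrcoef(recon, env_b)[0, 1]
print(f"r_A = {r_a:.2f}, r_B = {r_b:.2f} -> attended: speaker "
      + ("A" if r_a > r_b else "B"))
```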
Modulation Beyond the Auditory Cortex
The previous section discussed how the auditory sensory areas seem to faithfully follow the attended speech signal when