Page 39 - Volume 12, Issue 2 - Spring 2012
from chinchillas to demonstrate that recovered envelopes do in fact occur in actual AN responses (Heinz and Swaminathan, 2009). We used correlogram-based neural cross-correlation metrics that we developed to quantify the fidelity of TFS and ENV coding in AN responses to the same types of vocoded speech stimuli that have been used in the perceptual studies. Although these data provided physiological evidence that recovered ENV cues do occur at the output of the cochlea, these data alone did not allow us to evaluate the perceptual relevance of these recovered ENV cues directly.
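The correlogram-based metrics can be sketched in a few lines of Python. This is a simplified, hypothetical sketch, not the published analysis code. It assumes the following:

- Spike trains are lists of spike times (in seconds) from repeated presentations of each stimulus and of its polarity-inverted version.
- difcor (same-polarity minus cross-polarity correlogram) carries TFS, and sumcor (their average) carries ENV.
- Both are read at zero delay rather than at their peaks.
- The low-pass filtering and noise-floor corrections a full analysis would apply are omitted.

The names `shuffled_correlogram` and `neural_rho` are illustrative only.

```python
import math


def shuffled_correlogram(trains_a, trains_b, duration, binwidth, maxlag):
    """Normalized shuffled correlogram between two sets of spike trains.

    Each set is a list of repetitions; each repetition is a list of spike
    times in seconds. With trains_b=None this is a shuffled
    autocorrelogram (SAC): only pairs of *different* repetitions count.
    Dividing by (pairs * rate_a * rate_b * binwidth * duration) makes
    uncorrelated trains come out near 1.
    """
    auto = trains_b is None
    if auto:
        trains_b = trains_a
    half = int(round(maxlag / binwidth))
    counts = [0] * (2 * half + 1)  # index `half` is zero delay
    npairs = 0
    for i, ta in enumerate(trains_a):
        for j, tb in enumerate(trains_b):
            if auto and i == j:
                continue  # the SAC skips self-pairs
            npairs += 1
            for s1 in ta:
                for s2 in tb:
                    k = int(round((s2 - s1) / binwidth)) + half
                    if 0 <= k < len(counts):
                        counts[k] += 1
    rate_a = sum(map(len, trains_a)) / (len(trains_a) * duration)
    rate_b = sum(map(len, trains_b)) / (len(trains_b) * duration)
    norm = npairs * rate_a * rate_b * binwidth * duration
    return [c / norm for c in counts]


def _zero_delay(same, cross):
    """Return (difcor, sumcor) read at zero delay."""
    mid = len(same) // 2
    return same[mid] - cross[mid], (same[mid] + cross[mid]) / 2


def neural_rho(a_pos, a_neg, b_pos, b_neg, duration,
               binwidth=0.001, maxlag=0.02):
    """Return (rho_tfs, rho_env) comparing responses to stimuli A and B.

    *_pos / *_neg are responses to the original and polarity-inverted
    stimulus. Values near 1 mean similar coding; rho_tfs can go negative.
    """
    def sc(x, y=None):
        return shuffled_correlogram(x, y, duration, binwidth, maxlag)

    dif_a, sum_a = _zero_delay(sc(a_pos), sc(a_pos, a_neg))
    dif_b, sum_b = _zero_delay(sc(b_pos), sc(b_pos, b_neg))
    dif_ab, sum_ab = _zero_delay(sc(a_pos, b_pos), sc(a_pos, b_neg))
    rho_tfs = dif_ab / math.sqrt(dif_a * dif_b)
    # Subtract 1 so that uncorrelated envelopes contribute nothing.
    rho_env = (sum_ab - 1) / math.sqrt((sum_a - 1) * (sum_b - 1))
    return rho_tfs, rho_env
```

Consider a toy case: responses phase-locked at 100 Hz, with the inverted-polarity responses shifted by half a cycle. If B is taken to be the polarity-inverted A, this sketch gives rho_tfs of about -1 and rho_env of about 1. That is, the fine structure is opposite while the envelope is unchanged.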
Jayaganesh Swaminathan addressed this issue directly in his PhD dissertation through a coordinated neural-modeling and perceptual study, which allowed him to evaluate quantitatively the relative perceptual salience of neural TFS and ENV cues for speech perception in noise (Swaminathan, 2010). His psycho-physiological approach involved (1) measuring consonant identification in normal-hearing listeners and (2) predicting neural TFS and ENV coding with a physiological auditory-nerve model (Swaminathan and Heinz, 2012). By comparing the effects of signal-to-noise ratio (SNR) on measured perception and on predicted neural coding for the same set of vocoded speech stimuli, he was able to quantify the relative contributions of neural ENV and TFS to the perception of noise-degraded speech. Five different vocoder types were used to process 16 consonants spoken by four speakers in the presence of speech-shaped background noise. Together, these vocoders provided stimulus conditions in which true TFS, true ENV, and recovered ENV cues were present (Fig. 3). The computational AN model used (Fig. 4) has been validated against neurophysiological single-unit responses to stimuli ranging from simple tones to broadband noise to speech (Zilany and Bruce, 2006; 2007), and it has been used in a number of applications related to SNHL (Heinz, 2010).
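Vocoders of this kind typically split each band-limited signal into an envelope (ENV) and a fine-structure (TFS) component using the Hilbert transform. The sketch below shows that decomposition for a single channel that has already been band-pass filtered.

Several points are assumptions or simplifications:

- The function names are hypothetical.
- The direct O(N²) DFT is used only to keep the example dependency-free; a practical version would use an FFT library.
- The filter banks and carriers of the five vocoders in the study are not specified here.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal of a real sequence via a direct (O(N^2)) DFT.

    Positive-frequency bins are doubled and negative-frequency bins zeroed,
    the standard discrete construction of the Hilbert transform.
    """
    n = len(x)
    spec = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]
    weights = [0.0] * n
    weights[0] = 1.0  # DC bin is kept once
    if n % 2 == 0:
        weights[n // 2] = 1.0  # Nyquist bin is kept once
    for k in range(1, (n + 1) // 2):
        weights[k] = 2.0
    return [sum(weights[k] * spec[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def env_tfs(band):
    """Split one band into its Hilbert envelope and unit-amplitude TFS."""
    z = analytic_signal(band)
    env = [abs(v) for v in z]
    tfs = [math.cos(cmath.phase(v)) for v in z]
    return env, tfs
```

In such a scheme, a TFS-speech channel would keep `tfs`, which has a flat envelope. An ENV-speech channel would instead impose `env` on a tone or noise carrier. Summing the channels across bands gives the vocoded signal.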
Regression models were used to predict consonant identification from the neural coding of TFS and ENV (quantified using our neural cross-correlation coefficients). Separate models were fit for positive and negative SNRs to evaluate the commonly held hypothesis that the relative salience of ENV and TFS differs between quiet and noisy conditions. The significance of the individual model terms, along with the TFS × ENV interaction term, allowed us to evaluate the perceptual salience of the different types of neural coding. Overall, our psycho-physiological analyses suggested that TFS cues play a less important role than acoustical and psychoacoustical studies have suggested. Psychoacoustic analyses alone (present and previous) suggest that speech perception in noise is primarily supported by TFS cues. In contrast, relating neural coding to measured speech identification demonstrated two things. First, neural ENV is a primary cue for speech perception, even in degraded listening conditions. Second, neural TFS does contribute in degraded listening conditions (less by itself and more through an interaction with ENV), but rarely as the primary cue. The differences in conclusions between psychoacoustical and psycho-physiological analyses are likely due to cochlear signal processing that transforms TFS and ENV coding (e.g., recovered envelopes, Fig. 3) in normal-hearing ears. Interestingly, our computational modeling has also predicted that recovered envelope coding is degraded with OHC damage (Heinz and Swaminathan, 2009). These cochlear transformations are therefore likely to be different in impaired ears.
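The regression step can be illustrated with an ordinary-least-squares (OLS) fit that includes the ENV × TFS interaction. This is a hedged sketch with several assumptions:

- The exact regression form, any transform applied to the identification scores, and how 0-dB conditions were assigned are not stated here. Plain OLS on raw scores is therefore assumed, and 0-dB conditions are simply dropped.
- All function names are hypothetical.
- The significance tests on each term are not shown.

```python
def solve(a, b):
    """Solve a small dense linear system by Gaussian elimination."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))  # partial pivoting
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_env_tfs_model(rho_env, rho_tfs, scores):
    """OLS fit of score = b0 + b1*ENV + b2*TFS + b3*ENV*TFS; returns [b0..b3]."""
    rows = [[1.0, e, t, e * t] for e, t in zip(rho_env, rho_tfs)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    xty = [sum(r[i] * y for r, y in zip(rows, scores)) for i in range(4)]
    return solve(xtx, xty)  # normal equations (X^T X) b = X^T y


def fit_by_snr_sign(conditions):
    """Fit separate models for positive- and negative-SNR conditions.

    conditions: iterable of (snr_db, rho_env, rho_tfs, score) tuples.
    Conditions at exactly 0 dB fall in neither group (an arbitrary choice).
    """
    conditions = list(conditions)
    fits = {}
    for label, keep in (("positive", lambda s: s > 0), ("negative", lambda s: s < 0)):
        subset = [c for c in conditions if keep(c[0])]
        _, env, tfs, score = zip(*subset)
        fits[label] = fit_env_tfs_model(env, tfs, score)
    return fits
```

In this layout, index 3 of each coefficient list is the ENV × TFS interaction term. A full analysis would also test each coefficient for significance.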
Translational significance for hearing aids and cochlear implants
In summary, although it is always difficult to relate physiological and perceptual effects quantitatively, this article illustrates several approaches that we have been taking to address the currently active debate about the role of TFS cues in perception, particularly for listeners with SNHL. By evaluating the physiological bases for perceptual effects, we can better understand the translational implications of the observed perceptual TFS deficits. Recent perceptual studies suggesting a reduced ability to use TFS cues following SNHL could be taken to imply that AN-fiber phase locking is reduced following SNHL. However, our results suggest that the fundamental ability of AN fibers to encode TFS is not degraded following SNHL for either simple or complex sounds. Thus, a straightforward interpretation of the perceptual studies, namely that hearing aids simply need to overcome degraded TFS coding strength in AN fibers, would appear to be misguided. However, other “TFS coding” deficits were observed in impaired AN responses. Examples include degraded TFS quantity in noise and degraded TFS quality, in the form of loss of tonotopicity, degraded spatio-temporal (across-CF) coding, and reduction of recovered ENV cues. Each of these effects provides an alternative interpretation of the perceptual TFS deficits. Together they provide insights into the limitations of current hearing aids, as well as potential strategies for improving them.
Our findings on the relative perceptual importance of ENV and TFS cues for speech perception in noise also have important potential implications for developing improved signal-processing strategies for cochlear implants (CIs). These implications can be understood by considering each of the regression-model terms (ENV, TFS, and ENV × TFS) with respect to current CI technology. The finding that neural ENV was a primary contributor to speech perception in quiet and in noise is actually quite promising, because CIs are currently able to provide ENV directly to the AN. An important distinction, however, is that CIs currently provide acoustic ENV rather than neural ENV. The finding that neural TFS alone was rarely the primary cue for speech perception, even in noise, is also encouraging, since CIs are currently unable to provide neural TFS. In fact, our results suggest that if neural TFS alone could be provided, speech perception in steady noise would likely be no better than with a CI providing neural ENV alone. The last finding was that the interaction term (ENV × TFS) was also significant for speech perception in noise. This implies that when TFS does contribute, it does so primarily in the presence of neural ENV cues. It follows that if future technology is able to provide TFS, an important design constraint will be to provide it in a way that does not disrupt neural ENV coding.
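Why the interaction term matters for CI design can be made explicit by writing the model out. This assumes a linear regression with an interaction term; the β symbols are placeholders, not fitted values from the study. The marginal effect of neural TFS on the predicted score is:

```latex
\begin{aligned}
\hat{S} &= \beta_0 + \beta_E\,\rho_{\mathrm{ENV}} + \beta_T\,\rho_{\mathrm{TFS}} + \beta_{ET}\,\rho_{\mathrm{ENV}}\,\rho_{\mathrm{TFS}} \\
\frac{\partial \hat{S}}{\partial \rho_{\mathrm{TFS}}} &= \beta_T + \beta_{ET}\,\rho_{\mathrm{ENV}}
\end{aligned}
```

Suppose β_T is small and β_ET is positive. Then the benefit of added TFS grows with ρ_ENV, and it shrinks toward β_T when neural ENV coding is disrupted. This is the design constraint on future TFS-providing CIs stated in the paragraph above.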
Our hope is that this work as a whole will contribute to the long-term goal of improving the daily lives of people with hearing loss through the application of physiological
38 Acoustics Today, April 2012