Page 36 - 2017Spring

Speech Intelligibility Predictors
Figure 8. Spectrogram (top) and neurograms (middle and bottom) for the sentence “How do we define it?” presented to a normal-hearing AN model at 65 dB SPL in the presence of background white Gaussian noise, also at 65 dB SPL. Plotting conventions are the same as in Figure 4.
Figure 10. MR and FT Neurogram SIMilarity (NSIM) metric predictions for the sentence “How do we define it?” as a function of the signal-to-noise ratio (SNR) for background white Gaussian noise (left) and as a function of AN fiber survival (right).
Lessons Learned (and To Be Learned) from Physiologically Based Predictors
In general, physiologically based metrics have performed as well as or better than the traditional acoustic-based metrics in quantitative predictions of perceptual data (e.g., Bondy et al., 2004; Jørgensen and Dau, 2011; Christiansen et al., 2010; Bruce et al., 2013). This indicates that adding physiological detail is not problematic for cases where the acoustic-based metrics perform reasonably well and that the neural-based metrics can indeed overcome some of the shortcomings of the acoustic predictors. However, although a range of different metrics has been developed, as reviewed in this article, only limited head-to-head comparisons have been performed between the physiological predictors (e.g., Bruce et al., 2013; Chabot-Leclerc et al., 2014). An important future area of research is to conduct more rigorous comparisons of the different predictors on multiple sets of speech intelligibility data to determine which approach is best in general.
In addition, studies using these predictors have given insight into the neural coding of speech features. In general, the MR representation of speech envelope cues (i.e., slower temporal modulations) appears to be the dominant neural representation in most situations, but spike-timing cues may also contribute additional information in adverse conditions such as low SNRs (e.g., Swaminathan and Heinz, 2012; Bruce et al., 2013). These conclusions have important consequences for how hearing aids and cochlear implants encode the acoustic features of speech, and physiologically based intelligibility predictors should be important tools in the improvement of such devices for the hearing impaired (Sachs et al., 2002).
 Figure 9. Spectrogram (top) and neurograms (middle and bottom) for the sentence “How do we define it?” at 65 dB SPL presented to an AN model with only 30% neural survival. Plotting conventions are the same as in Figure 4.
when the MR cues have been totally lost. Similarly, the FT NSIM does not drop as rapidly with decreasing neural survival as does the MR NSIM, suggesting that the spike-timing representation may be more robust to loss of AN fibers.
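To make the idea of an NSIM score falling as a neurogram is degraded more concrete, the following is a minimal Python sketch and not the implementation used for Figure 10. It assumes the commonly described two-term form of the metric (an intensity term multiplied by a structure term, averaged over 3 x 3 patches of the neurogram). The uniform patch weighting, the constants `c1` and `c2`, the sinusoidal toy "neurogram," and the helper names `nsim`, `toy_neurogram`, and `add_noise` are all illustrative assumptions. Real MR and FT neurograms come from an AN model and differ in their time resolution, which this sketch does not attempt to reproduce.

```python
import math
import random


def nsim(ref, deg, c1=0.01, c2=0.0009):
    """Simplified NSIM-style score between two equally sized 2-D grids.

    Assumption: uniform 3x3 windows and illustrative constants c1, c2.
    Returns the mean of (intensity term * structure term) over all windows.
    """
    rows, cols = len(ref), len(ref[0])
    scores = []
    for i in range(rows - 2):
        for j in range(cols - 2):
            # Flatten the 3x3 patch at (i, j) from each grid.
            r = [ref[i + a][j + b] for a in range(3) for b in range(3)]
            d = [deg[i + a][j + b] for a in range(3) for b in range(3)]
            mu_r = sum(r) / 9
            mu_d = sum(d) / 9
            var_r = sum((x - mu_r) ** 2 for x in r) / 9
            var_d = sum((y - mu_d) ** 2 for y in d) / 9
            cov = sum((x - mu_r) * (y - mu_d) for x, y in zip(r, d)) / 9
            # Intensity term compares local means; structure term compares
            # local patterns via the normalized covariance.
            intensity = (2 * mu_r * mu_d + c1) / (mu_r ** 2 + mu_d ** 2 + c1)
            structure = (cov + c2) / (math.sqrt(var_r * var_d) + c2)
            scores.append(intensity * structure)
    return sum(scores) / len(scores)


def toy_neurogram(rows=16, cols=40):
    """Hypothetical stand-in for a neurogram: a smooth pattern in [0, 1]."""
    return [[0.5 + 0.5 * math.sin(0.3 * j + 0.5 * i) for j in range(cols)]
            for i in range(rows)]


def add_noise(gram, sigma, seed=0):
    """Return a copy of the grid with zero-mean Gaussian noise added."""
    rng = random.Random(seed)
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in gram]


clean = toy_neurogram()
score_clean = nsim(clean, clean)                          # identical grids
score_mild = nsim(clean, add_noise(clean, sigma=0.05))    # light degradation
score_severe = nsim(clean, add_noise(clean, sigma=0.5))   # heavy degradation

print(f"clean vs clean:  {score_clean:.3f}")
print(f"clean vs mild:   {score_mild:.3f}")
print(f"clean vs severe: {score_severe:.3f}")
```

In this toy setting, the score is 1 for identical inputs and decreases as the added noise grows, which is the qualitative behavior described for NSIM as conditions worsen. Comparing how fast the score drops for two different representations would require neurograms generated by an actual AN model.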
 34 | Acoustics Today | Spring 2017
