2017Spring


Figure 4. Framework for reference-based prediction of speech intelligibility. A “reference” AN response “neurogram” (r; middle top) is obtained by presenting a clean speech signal to a normal-hearing AN model (see Figure 5) at a conversational speech level. A degraded AN neurogram (d; middle bottom) is generated for the same speech signal that may have undergone some form of processing to degrade the acoustic signal and/or for an AN model with some type of pathology. Neurogram-based metrics predict the intelligibility of the degraded speech by comparison of the r and d neurograms. Modulation-based metrics first pass the neurograms through a bank of modulation filters (see Figure 6) to generate corresponding R and D modulation representations for the reference and degraded speech, respectively, before computing an intelligibility prediction.
Figure 5. Example of an AN fiber model used in physiologically based intelligibility predictors, providing a computational implementation of the transduction process described in Figure 3. The input to the model is an instantaneous pressure waveform of the acoustic stimulus impinging on the tympanic membrane, and the output is the set of spike times for a model AN fiber with a particular characteristic frequency (CF) in response to that input. The variables C_OHC and C_IHC, shown respectively within the OHC and IHC blocks of the model, are scaling coefficients with values between 0 and 1 to indicate OHC and IHC health, respectively, at that CF in the cochlea. The variables τ_C1 and τ_cp control the time-varying, nonlinear filtering in the signal and control paths, respectively. LP, low-pass filter; NL, static nonlinearity; INV, inverting nonlinearity; Σ, summation. Reprinted from Zilany and Bruce (2006) with permission from the Acoustical Society of America © 2006.
the importance of speech modulations for intelligibility) and incorporate modulation filter banks (Elhilali et al., 2003; Zilany and Bruce, 2007; Jørgensen and Dau, 2011). As seen in Figure 4, right, the reference (r) and degraded (d) AN neurograms can be passed through such a modulation filter bank to produce corresponding reference (R) and degraded (D) modulation spectrum representations. Figure 6 illustrates the spectrotemporal receptive fields (STRFs) of a widely used filter bank that considers joint time-frequency modulations (Elhilali et al., 2003; Zilany and Bruce, 2007).
Other predictors consider one or more banks of filters that analyze temporal modulations within each frequency band (e.g., Jørgensen and Dau, 2011).
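The two comparison routes in Figure 4 can be illustrated with a minimal Python sketch. It is not the metric of any of the cited studies. A neurogram is assumed to be a list of rate-versus-time rows, one per CF. A Pearson correlation stands in for the intelligibility metric, and projections onto a few temporal modulation frequencies stand in for a modulation filter bank. All function names, the toy signals, and the chosen modulation frequencies are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined for constant input
    return sxy / math.sqrt(sxx * syy)

def flatten(rows):
    """Concatenate the rows of a 2-D list into one flat list."""
    return [v for row in rows for v in row]

def modulation_magnitude(rate, fs, f_mod):
    """Magnitude of the mean-removed rate at one modulation frequency (Hz)."""
    n = len(rate)
    mean = sum(rate) / n
    w = 2 * math.pi * f_mod / fs
    re = sum((v - mean) * math.cos(w * t) for t, v in enumerate(rate))
    im = sum((v - mean) * math.sin(w * t) for t, v in enumerate(rate))
    return math.hypot(re, im) / n

def modulation_representation(neurogram, fs, mod_freqs):
    """One row per CF, one column per temporal modulation frequency."""
    return [[modulation_magnitude(row, fs, f) for f in mod_freqs]
            for row in neurogram]

def neurogram_score(r, d):
    """Neurogram-based route: compare r and d directly."""
    return pearson(flatten(r), flatten(d))

def modulation_score(r, d, fs, mod_freqs):
    """Modulation-based route: compare R and D after modulation analysis."""
    R = modulation_representation(r, fs, mod_freqs)
    D = modulation_representation(d, fs, mod_freqs)
    return pearson(flatten(R), flatten(D))

# Toy data (assumed): 2 CF channels, 1 s of rate sampled at 100 bins/s.
FS, N = 100, 100
MOD_FREQS = [2, 4, 8, 16, 32]

def toy_channel(f_mod, depth, disturb=0.0):
    """Rate with one modulation plus an optional 16-Hz disturbance."""
    return [50 + depth * math.sin(2 * math.pi * f_mod * t / FS)
            + disturb * math.sin(2 * math.pi * 16 * t / FS)
            for t in range(N)]

r = [toy_channel(4, 40), toy_channel(8, 40)]          # reference
d = [toy_channel(4, 10, 30), toy_channel(8, 10, 30)]  # degraded

print(neurogram_score(r, d))
print(modulation_score(r, d, FS, MOD_FREQS))
```

Both scores equal 1 when d is identical to r and fall as the degraded response departs from the reference. Mapping such a score to percent-correct intelligibility would require calibration against listener data, which this sketch does not attempt.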
Note that in contrast to the widely used reference-based approaches, it is also possible to obtain “reference-free” predictions of speech intelligibility based on how the output of an auditory model responding to a test speech signal varies in certain statistics from the average neural response to speech in general rather than a reference version of the specific speech signal (e.g., Hossain et al., 2016).
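The reference-free idea can be sketched in the same spirit. The statistics below (per-channel mean and standard deviation of the rate) and the z-score distance are assumptions chosen for simplicity. They are not the statistics used by Hossain et al. (2016), and all names and toy signals are hypothetical.

```python
import math

def channel_stats(neurogram):
    """Per-channel mean and standard deviation of the rate, as a flat list."""
    stats = []
    for row in neurogram:
        n = len(row)
        mean = sum(row) / n
        var = sum((v - mean) ** 2 for v in row) / n
        stats.extend([mean, math.sqrt(var)])
    return stats

def fit_norms(clean_neurograms):
    """Mean and spread of each statistic across clean-speech responses."""
    vectors = [channel_stats(ng) for ng in clean_neurograms]
    k = len(vectors)
    columns = list(zip(*vectors))
    means = [sum(col) / k for col in columns]
    spreads = [math.sqrt(sum((v - m) ** 2 for v in col) / k)
               for col, m in zip(columns, means)]
    return means, spreads

def deviation_score(test_neurogram, norms, floor=1e-6):
    """RMS z-score of the test statistics relative to the speech norms."""
    means, spreads = norms
    stats = channel_stats(test_neurogram)
    z = [(s - m) / max(sd, floor)  # floor avoids division by zero
         for s, m, sd in zip(stats, means, spreads)]
    return math.sqrt(sum(v * v for v in z) / len(z))

# Toy data (assumed): 2 CF channels, 1 s of rate sampled at 100 bins/s.
FS, N = 100, 100

def toy_neurogram(base, depth):
    """Two channels modulated at 4 Hz and 8 Hz around a baseline rate."""
    return [[base + depth * math.sin(2 * math.pi * f * t / FS)
             for t in range(N)] for f in (4, 8)]

# "Speech in general": a small set of clean responses, no matched reference.
norms = fit_norms([toy_neurogram(48, 35), toy_neurogram(50, 40),
                   toy_neurogram(52, 45)])

typical = deviation_score(toy_neurogram(50, 40), norms)
flattened = deviation_score(toy_neurogram(50, 10), norms)
print(typical, flattened)
```

Here a response with typical modulation depth sits close to the norms, whereas a response with strongly reduced modulation depth deviates from them. No clean version of the test signal itself is needed, only statistics gathered from other speech.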
Spring 2017 | Acoustics Today | 31
