Ian C. Bruce
Postal:
Department of Electrical and Computer Engineering, McMaster University, 1280 Main Street West, Room ITB-A213, Hamilton, Ontario L8S 4K1, Canada
Email:
ibruce@ieee.org
Physiologically Based Predictors of Speech Intelligibility
Speech intelligibility predictors are powerful tools for evaluating how a listening environment, signal processing, and hearing impairment affect speech communication.
Introduction
Just a century ago, researchers at AT&T’s Western Electric Research Labs (later renamed Bell Labs) began a comprehensive research program to develop an objective predictor of speech intelligibility that would provide a tool for efficiently assessing different speech telecommunications systems (Allen, 1996). This work was largely directed by the first president of the Acoustical Society of America (ASA), Harvey Fletcher, who was later made an Honorary Fellow of the ASA and then awarded the Gold Medal from the Society. The metric they developed was termed the articulation index (AI). Fundamentally, it measured the level of speech received above any background noise in a set of independent frequency bands, that is, a signal-to-noise ratio (SNR), or the level above the threshold of audibility if there is no noise in a band (French and Steinberg, 1947; Fletcher and Galt, 1950). Some nonlinear properties of the human auditory periphery were included in an ad hoc fashion. The AI was developed based on an extensive set of perceptual experiments using nonsense syllables as well as the fundamental knowledge of human psychoacoustics and the physiology of the ear available at the time.
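The band-by-band idea behind the AI can be illustrated with a minimal sketch. This is not the original Fletcher and Galt calculation: the function name `band_audibility_index`, the equal default band weights, and the linear mapping of a -15 to +15 dB SNR range onto 0 to 1 are all assumptions, borrowed from common simplified formulations, and the levels in the usage example are made-up illustrative values.

```python
def band_audibility_index(speech_db, noise_db, threshold_db, weights=None):
    """Toy AI-style index in [0, 1] (illustrative assumption, not a standard).

    speech_db, noise_db, threshold_db: per-band levels in dB, same length.
    weights: hypothetical band-importance weights (defaults to equal weights).
    """
    n = len(speech_db)
    if len(noise_db) != n or len(threshold_db) != n:
        raise ValueError("all per-band inputs must have the same length")
    if weights is None:
        weights = [1.0] * n  # assumption: every band is equally important
    if len(weights) != n:
        raise ValueError("weights must have one entry per band")

    total_weight = sum(weights)
    index = 0.0
    for s, nz, thr, w in zip(speech_db, noise_db, threshold_db, weights):
        # The speech must exceed whichever is higher: the noise in the band
        # or the threshold of audibility (which applies when noise is low).
        floor_db = max(nz, thr)
        snr_db = s - floor_db
        # Assumed mapping: -15 dB SNR or below contributes 0, +15 dB or above
        # contributes 1, with a linear ramp in between.
        audibility = min(1.0, max(0.0, (snr_db + 15.0) / 30.0))
        index += w * audibility
    return index / total_weight


# Usage with four hypothetical bands (levels in dB are made up).
speech = [60.0, 55.0, 50.0, 45.0]
noise = [45.0, 55.0, 20.0, 60.0]
threshold = [10.0, 10.0, 30.0, 10.0]
print(band_audibility_index(speech, noise, threshold))  # prints 0.625
```

In this example the third band is limited by the audibility threshold rather than by the noise, and the fourth band contributes nothing because the speech sits 15 dB below the noise.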
Although the AI was developed to evaluate speech communication systems such as the analog telephone network, it was soon seen to be a valuable tool for the fields of speech and hearing research and audiology. However, a proliferation of different simplifications of the original AI was developed for these diverse purposes, including a 1969 American National Standards Institute (ANSI) standard (Hornsby, 2004), such that it was difficult to compare results across different studies. This prompted the development of the speech intelligibility index (SII) as a standard that captured the main principles of the AI but allowed for certain variations in the calculation method and application (ANSI, 1997), including allowing for degradations due to hearing loss, higher than normal sound presentation levels, and the upward spread of masking. The SII has also been extended to deal with cases of fluctuating background noise (Rhebergen et al., 2006).
A limitation of the AI and SII is that there are a number of distortions to speech, such as peak clipping or reverberation, that are known to affect speech intelligibility but that cannot be formulated simply in terms of an SNR, and the original AI only considered certain distortions relevant to telephony. This led to the development of the speech transmission index (STI), which uses the AI framework but substitutes a measure of acoustic modulation transfer for the SNR (Steeneken and Houtgast, 1980). This is based on the premise that speech information is primarily conveyed by temporal modulations in the speech envelope in independent frequency bands, and any form of distortion that degrades those modulations will lead to a reduction in intelligibility. For example, both reverberation of a speech signal and background noise will tend to fill in the temporal dips of the directly
28 | Acoustics Today | Spring 2017 | volume 13, issue 1 ©2017 Acoustical Society of America. All rights reserved.