Page 20 - Volume 12, Issue 2 - Spring 2012

  Fig. 2. Schematic diagram illustrating stimulus configurations used to study spatial release from masking (SRM). The listener is facing front, with the target speech (shaded speaker) in front. Panel A: the masker (white speaker) is also in front, hence the co-located condition. Panel B: the masker is on the side; because of the monaural head shadow, the ear on the opposite side of the head is partially protected from the masker. Panel C: two maskers are present, one on each side, reducing or eliminating the head shadow benefit.
Many studies to date have shown that the configuration in Fig. 2B can result in robust improvement in the percent correct for word identification compared with Fig. 2A (Plomp and Mimpen, 1981; Hawley et al., 1999, 2004; Arbogast et al., 2002; Drullman and Bronkhorst, 2002; Litovsky, 2005). This phenomenon is often referred to as spatial release from masking (SRM), because the interference, or masking, that occurs when the masker is co-located with the target (Fig. 2A) is reduced (released) when spatial cues are available. Studies on SRM in humans typically use speech materials such as words or sentences that are equalized across conditions for difficulty and frequency within the language, but across the various existing test materials, these variables can differ. SRM is typically quantified in one of two ways. In one paradigm we measure the percent correct [P(C)] with the signal-to-noise ratio (SNR) of the target relative to the maskers set to various fixed levels; P(C) is obtained for each condition at each SNR, and SRM is computed as [P(C)_side - P(C)_front]; positive values indicate improved performance. In a second paradigm we vary the SNR adaptively and find the speech reception threshold (SRT), defined as the SNR at which listeners reach a predefined criterion, such as 50% or 75% correct. SRM is then computed as [SRT_front - SRT_side]; positive values indicate improved performance.
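The two ways of quantifying SRM described above can be illustrated with a short Python sketch. The function names, the linear-interpolation estimate of the SRT, and all data values are assumptions made for illustration only; they are not taken from any of the cited studies, and real experiments typically estimate the SRT with an adaptive track or a fitted psychometric function.

```python
from typing import Sequence


def srm_percent_correct(pc_side: float, pc_front: float) -> float:
    """SRM in percentage points at one fixed SNR: P(C)_side - P(C)_front."""
    return pc_side - pc_front


def srt_by_interpolation(
    snrs_db: Sequence[float],
    percent_correct: Sequence[float],
    criterion: float = 50.0,
) -> float:
    """Estimate the SRT as the SNR where percent correct crosses the criterion.

    Assumes SNRs are listed in ascending order and uses simple linear
    interpolation between neighbouring points (an illustrative shortcut).
    """
    points = list(zip(snrs_db, percent_correct))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= criterion <= y1 and y1 != y0:
            return x0 + (criterion - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("criterion is not bracketed by the measured scores")


def srm_threshold(srt_front_db: float, srt_side_db: float) -> float:
    """SRM in dB: SRT_front - SRT_side (positive = benefit from separation)."""
    return srt_front_db - srt_side_db


# Hypothetical scores (percent correct) at five SNRs, invented for this example.
snrs = [-12.0, -8.0, -4.0, 0.0, 4.0]
pc_front = [5.0, 20.0, 40.0, 60.0, 90.0]    # masker co-located (Fig. 2A)
pc_side = [20.0, 40.0, 60.0, 90.0, 100.0]   # masker on the side (Fig. 2B)

# Paradigm 1: difference in percent correct at a fixed SNR of -4 dB.
srm_pc = srm_percent_correct(pc_side[2], pc_front[2])

# Paradigm 2: difference in SRT at the 50%-correct criterion.
srt_front = srt_by_interpolation(snrs, pc_front)
srt_side = srt_by_interpolation(snrs, pc_side)
srm_db = srm_threshold(srt_front, srt_side)

print(f"SRM at -4 dB SNR: {srm_pc:.1f} percentage points")
print(f"SRT front: {srt_front:.1f} dB, SRT side: {srt_side:.1f} dB, SRM: {srm_db:.1f} dB")
```

Note the opposite order of subtraction in the two measures: a higher percent correct and a lower SRT both mean better performance, so in both cases a positive SRM reflects a benefit from spatial separation.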
SRM tends to be largest when the speech and masker can be easily confused, and when listeners are unsure as to what aspects of the masker to ignore. Confusability can arise when the target/masker voices are similar; for example, consider a case in which the target speech and masker are both male voices with similar fundamental frequency (f0), vs. a case in which the target has an f0 of 125 Hz and the masker is a woman’s voice with f0 of 250 Hz. Confusability can also arise when the target/masker have similar content, such as speech materials that are interchangeable or that carry similar meaning. These examples elicit what has become known as “informational masking,” which is the default term used to describe masking that goes beyond “energetic masking,” that is, masking that is accounted for by processes in the peripheral auditory system (Durlach et al., 2003). Spatial separation of maskers from the target is an effective way to counteract informational masking (Kidd et al., 1998; Freyman et al., 1999, 2001). As a result, the magnitude of SRM with informational maskers can be quite large relative to that with energetic maskers (Durlach et al., 2003; Jones and Litovsky, 2008, 2011).
As mentioned above, acoustic cues can also affect the magnitude of SRM, and the effects can be divided into binaural and monaural components (Hawley et al., 1999, 2004; Jones and Litovsky, 2011; Bronkhorst, 2000; Loizou et al., 2009; Garadat et al., 2010). When target speech and masker are spatially separated, half of the binaural advantage comes from the “better ear effect” (also known as the “monaural head shadow effect”), where the SNR is increased in one ear due to attenuation of the noise by the listener’s head (Zurek, 2003). Another advantage, the binaural squelch effect, depends on the ability of the auditory system to utilize binaural aspects of the signal, including differences in the ITDs and ILDs of the target speech and the masker (Bronkhorst, 2000; Culling et al., 2004; Litovsky et al., 2012). A third effect is that of “binaural summation,” whereby the activation of both ears renders a sound presented from a location in front easier to hear due to summation of the signals at the two ears. Finally, for amplitude-modulated signals such as speech, ITD cues are also available from differences