
in the timing of the envelopes (slowly varying amplitude) of the stimuli.
The binaural cues that are thought to be important for segregation of speech and noise can be studied selectively over headphones by imposing either similar binaural cues on the speech and masker, the co-located condition, or by varying the binaural cues such that the target and masker are perceived to be at different intracranial (inside the head) locations, the separated condition. For speech separation, the binaural intelligibility level difference (BILD), the difference in speech intelligibility threshold between the co-located and separated conditions, can be as large as 12 dB in adults, depending on the condition (Blauert, 1997; Hawley et al., 2004; Litovsky et al., 2012). A simpler version of the BILD is the binaural masking level difference (BMLD), where a target signal such as a tone or narrow-band noise is detected in the presence of a masking noise. The BMLD can be measured, for example, by comparing the threshold for tone detection when both the noise and tone are in phase at the two ears (the N0S0 condition) with the threshold when the noise is in phase at the two ears but the tone signal is out of phase at the two ears (the N0Sπ condition). Presumably, the tone and noise are perceived as co-located intracranially in the N0S0 condition, while they are perceived as spatially separated in the N0Sπ condition. The difference in threshold between N0Sπ and N0S0 ranges from 8 to 30 dB, depending on the specific condition.
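To make the N0S0/N0Sπ contrast concrete, the sketch below (my illustration, not code from any of the cited studies) synthesizes the two headphone conditions; the sample rate, duration, tone frequency, and level convention are assumptions chosen for illustration.

import numpy as np

FS = 44100       # sample rate in Hz (assumed)
DUR = 0.5        # stimulus duration in seconds (assumed)
TONE_HZ = 500.0  # low-frequency tone, where BMLDs tend to be largest

def make_trial(tone_level_db, antiphasic=False, rng=None):
    """Return a (left, right) pair: diotic noise plus a tone that is either
    in phase at the two ears (S0) or phase-inverted at one ear (Spi)."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(int(FS * DUR)) / FS
    noise = rng.standard_normal(t.size)                     # N0: identical noise in both ears
    tone = 10 ** (tone_level_db / 20.0) * np.sin(2 * np.pi * TONE_HZ * t)
    left = noise + tone
    right = noise + (-tone if antiphasic else tone)         # Spi flips the tone's phase
    return left, right

# The BMLD is then the difference between the tone-detection thresholds
# measured (e.g., adaptively) in the two conditions:
#   BMLD = threshold(N0S0) - threshold(N0Spi)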
The BMLD headphone-based paradigm, in which ITD is manipulated to produce source segregation, has been instructive in thinking about the benefit that listeners get in spatially separated conditions in the free field. Unmasking occurs in these paradigms because, in the separated condition, the acoustic characteristics of the signals at the two ears are highly dissimilar (Gabriel and Colburn, 1981; Bernstein and Trahiotis, 1992; Culling and Summerfield, 1995). Thus, the task becomes one in which listeners detect "incoherence" between the separated and co-located conditions. Note that these conditions, in which one cue is varied (e.g., ITD), do not provide listeners with all the cues available in a realistic listening situation. Because of the interest in understanding SRM under conditions that mimic the real world, many studies have implemented the testing paradigm illustrated in Fig. 2, where monaural and binaural cues are mixed and both contribute to SRM (Hawley et al., 1999, 2004; Bronkhorst, 2000; Culling et al., 2004; Litovsky et al., 2012).
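As a rough illustration of this kind of headphone manipulation (a sketch under assumed parameters, not the procedure of any particular study), the code below keeps the masker diotic while imposing an ITD on the target, so the two are heard at different intracranial locations; the 500-microsecond ITD and the sample rate are assumptions.

import numpy as np

FS = 44100  # sample rate in Hz (assumed)

def apply_itd(signal, itd_s, fs=FS):
    """Return (left, right) copies of signal with the right ear delayed by itd_s seconds."""
    shift = int(round(itd_s * fs))
    right = np.concatenate([np.zeros(shift), signal])[: signal.size]
    return signal.copy(), right

def colocated_and_separated(target, masker, itd_s=500e-6):
    # Co-located: target and masker are both diotic (identical at the two ears).
    colocated = (target + masker, target + masker)
    # Separated: the masker stays diotic while the target carries the ITD,
    # so target and masker are perceived at different intracranial positions.
    left_t, right_t = apply_itd(target, itd_s)
    separated = (left_t + masker, right_t + masker)
    return colocated, separated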
Another area of growing interest regarding unmasking of speech is that of non-sensory processes involved in source segregation. These could potentially include cognition, attention, memory, emotion, and other similarly "top-down" processes. One model for considering these processes is that of "object formation" (Shinn-Cunningham, 2008), whereby there is an attempt to explain how attention influences perceptual abilities. It has been suggested that attentional mechanisms, which are invoked in a "cocktail party" situation to segregate speech from maskers, share aspects of the neural mechanisms controlling attention in the visual field. In addition, the role of visual cues in directing auditory attention turns out to be important in segregating speech from maskers and enhancing SRM (Best et al., 2007; Varghese et al., 2012).
Spatial release from masking in children
Thus far, the discussion has focused on mechanisms by which the auditory system of adult listeners teases apart co-occurring sounds and facilitates speech understanding in noisy environments. A number of studies by Litovsky and colleagues have simulated aspects of the auditory environment that might be encountered in a "classroom party effect." Litovsky (2005) first demonstrated SRM in children aged 4-7 years and compared the results with those found in adults. The testing paradigm is different from that which is typically implemented with adults, since young children have a more limited vocabulary and ability to provide a reliable response on the task. Thus, a novel method for testing children was devised. Children engaged in a listening "game" with a four-alternative forced-choice (4AFC) task, whereby they pointed to pictures matching the heard words. Prior to testing, children were familiarized with target spondaic words, selected such that each could be represented with a visual icon (e.g., ice-cream, cow-boy, bird-nest) and was within the vocabulary of 4-year-old children. Maskers consisted of sentences strung together that did not overlap in content with the target speech. In this study, SRM was computed from SRTs measured in the co-located and separated conditions, and averaged 5.2 dB and 7.4 dB in conditions with one or two maskers, respectively. Thus, children were able to benefit from differences in spatial cues between target speech and masker, with larger effects if two-talker maskers were used. It is worth noting that in this study the target-masker configurations resulted in SRM due to a combination of binaural and monaural cues. More recently, we (Misurelli and Litovsky, 2012) found that children aged 4-7 years also demonstrate SRM when head shadow cues are minimal (see Fig. 2C). In the right-left condition, the maskers are displaced towards both sides of the head, resulting in a condition with a minimal or absent "better ear." SRM is computed either by comparing percent correct [P(C) in the right-left condition minus P(C) in the front condition] or thresholds [SRT in the front condition minus SRT in the right-left condition]. It is noteworthy that right-left SRM was smaller than SRM in the side condition, in which both head shadow and binaural cues were present. Similar findings have been reported in adults as well (Marrone et al., 2008; Jones and Litovsky, 2011).
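As a concrete restatement of those two computations (a minimal sketch with placeholder numbers, not data from the studies above):

def srm_from_thresholds(srt_front_db, srt_separated_db):
    """SRM in dB: SRT in the co-located (front) condition minus SRT when the
    maskers are spatially separated (e.g., the right-left condition)."""
    return srt_front_db - srt_separated_db

def srm_from_percent_correct(pc_separated, pc_front):
    """SRM in percentage points: P(C) in the separated (right-left) condition
    minus P(C) in the co-located (front) condition."""
    return pc_separated - pc_front

# Placeholder numbers for illustration only:
print(srm_from_thresholds(srt_front_db=-1.0, srt_separated_db=-6.0))   # 5.0 dB of SRM
print(srm_from_percent_correct(pc_separated=82.0, pc_front=70.0))      # 12.0 points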
SRM can be found in children as young as 3 years of age (Garadat and Litovsky, 2007), again using age-appropriate speech and computerized listening games. In this case the speech corpus was chosen to be within the receptive language and vocabulary of children at ages 2.5 to 3.0 years. As in Litovsky (2005), the child selected a visual icon to match the heard word on each trial. By age 3 years children had SRM values that were similar to those of 4-5 year olds, suggesting that the ability to benefit from spatial separation between target and maskers developed at this young age. Furthermore, children who demonstrated the greatest SRM were those with high speech reception thresholds (SRTs) in the front condition where the target and maskers were co-located. In the Litovsky (2005) study SRM was shown to be larger when