
Pisoni, 2004). According to this account, impoverished experience with the faces of speakers diminishes the ability of children with ASD to detect structure in visible speech information, leading to a reduction in AV integration. Consistent with this hypothesis, in the context of AV mismatch, perceivers with ASD have been shown to use visual speech information less than auditory information (Massaro and Bosseler, 2003; Williams et al., 2004). Regardless of the theory, each makes the same general prediction: reduced AV integration in children with ASD.
Recent research on audiovisual speech processing in ASD
My current program of research, conducted at Haskins Laboratories, is designed to refine these existing theories and provide data that will allow for more fine-grained accounts of audiovisual speech integration in this population. The principal goal of this program of research is to examine sensitivity to visual speech information in children with ASD when they are fixated on the face of a speaker, something that has not been done in previous studies of audiovisual speech perception in individuals with autism spectrum disorders. By employing visual tracking methodology, we can evaluate the degree to which children with ASD integrate audiovisual speech when fixated on the face of a speaker, as compared to typically developing controls. This application of visual tracking methodology allows us to adjudicate between two possible underlying causes of atypical audiovisual integration of speech in ASD: that affected children show reduced audiovisual integration because of gaze aversion to the face of a speaker, or that children with ASD have an underlying weakness in the integration of AV speech.
Converging evidence about integration of AV speech is currently being obtained by examining ASD and control perceivers' sensitivity to three types of AV speech processing: audiovisual integration, detection of audiovisual asynchrony (which, in typical perceivers, is related to audiovisual integration), and perception of audiovisual speech in the context of auditory noise (which, in typical perceivers, leads to increased reliance on the visual speech information). Typical perceivers are influenced by visual speech information even when the auditory signal is unambiguous (the McGurk effect). Furthermore, for typical perceivers, the speaking face assists in recognition of auditory speech in noise (Sumby and Pollack, 1954). By examining the influence of visual speech information when the auditory signal is degraded, such as in the context of auditory noise, we can assess whether, when pressed, the affected children's perceptual processing of AV speech can parallel typical processing.
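To make the speech-in-noise measure concrete, one simple way to quantify the benefit of the speaking face is to compare identification accuracy with and without the visual signal. The sketch below is a hypothetical illustration of that computation only; the condition labels, trial structure, and summary score are assumptions for the example, not the scoring actually used in this project.

# Hypothetical sketch: quantifying the visual gain for speech in noise.
# Assumes each trial record notes the condition ("audio_only" or "audiovisual")
# and whether the syllable was identified correctly; these field names are
# illustrative, not those used in the actual study.

def accuracy(trials, condition):
    """Proportion of correct identifications in one condition."""
    relevant = [t for t in trials if t["condition"] == condition]
    if not relevant:
        return 0.0
    return sum(t["correct"] for t in relevant) / len(relevant)

def visual_gain(trials):
    """Benefit of seeing the speaker: AV accuracy minus audio-only accuracy
    (cf. Sumby and Pollack, 1954)."""
    return accuracy(trials, "audiovisual") - accuracy(trials, "audio_only")

# Example: a listener correct on 4 of 10 audio-only trials in noise but on
# 8 of 10 audiovisual trials shows a visual gain of roughly 0.4.
example_trials = (
    [{"condition": "audio_only", "correct": i < 4} for i in range(10)]
    + [{"condition": "audiovisual", "correct": i < 8} for i in range(10)]
)
print(visual_gain(example_trials))  # prints roughly 0.4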
At this point, preliminary data are available from this project for mismatched audiovisual (McGurk) speech stimuli. These pilot data support the hypothesis that children with ASD show less AV integration than typically developing (TD) children even when fixated on the face of a speaker. Two verbally fluent boys with autism (mean age 9.25 years, age range 9-9.5 years) were compared to three TD children (2 girls, 1 boy; mean age 9.3 years, range 7.5-10.5 years) on the degree of influence of seen speech on heard speech. All of the children were native speakers of American English, and were reported by their parents to have normal hearing and normal or corrected-to-normal vision (one TD child wore corrective lenses during the testing procedure). The children with autism had received a clinical diagnosis of autism, and met criteria for autism on the Autism Diagnostic Observation Schedule (Lord et al., 1996), an instrument for directly assessing an individual's behaviors associated with autism, and on the Autism Diagnostic Interview-Revised (Lord et al., 1994), a semi-structured interview for caregivers of children and adults for whom autism or a pervasive developmental disorder is a possible diagnosis.
Eye gaze data were collected by superimposing a cursor on an image from a remote-mounted scene camera that shows the participant's field of view; in this way, the system is able to measure point of gaze (Applied Science Laboratories, 2004). To optimize the accuracy of the pupil coordinates obtained by the optical camera, a magnetic head tracking unit in the form of a small sensor was attached to the head of the participant (an 8 millimeter sensor was attached to a slim wire placed on the head with a headband; see Figure 1).
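For readers who want a sense of how gaze data of this kind can be scored, the sketch below shows one plausible way to decide whether a child was looking at the speaker's face during a trial: count the fraction of gaze samples that fall inside a face region of interest. The coordinates, sample format, and 80% threshold are assumptions for the illustration, not the output format of the ASL system or the criteria used in this study.

# Hypothetical sketch: deciding whether a child was fixated on the speaker's
# face while a syllable played. Gaze samples (x, y, in screen pixels) and the
# face region of interest below are illustrative values only.

FACE_ROI = {"left": 400, "right": 624, "top": 120, "bottom": 420}  # pixels

def in_face_roi(x, y, roi=FACE_ROI):
    """True if a single gaze sample lands inside the face region of interest."""
    return roi["left"] <= x <= roi["right"] and roi["top"] <= y <= roi["bottom"]

def proportion_on_face(gaze_samples):
    """Fraction of a trial's gaze samples falling on the face ROI."""
    if not gaze_samples:
        return 0.0
    hits = sum(in_face_roi(x, y) for (x, y) in gaze_samples)
    return hits / len(gaze_samples)

# A trial might be scored as "fixated on the face" only if, say, 80% or more
# of its samples land inside the ROI; that threshold is an assumption here.
trial = [(510, 260), (515, 262), (512, 700), (508, 258), (511, 259)]
print(proportion_on_face(trial))  # 0.8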
The visual stimuli were presented on a computer monitor in front of the participant. The auditory speech stimuli were presented from a centrally located computer speaker placed directly below the monitor. A videotaped record of the participant was taken to allow for coding of verbal responses. A male native speaker of English was videotaped producing the consonant-vowel (CV) syllables /ma/, /na/, and /ga/. These videotaped syllables were digitally edited with Adobe Premiere™ software to create the audio and video stimuli. Audio tokens were either /ma/ or /na/. Video tokens were either matching, cross-spliced tokens of /ma/ or /na/ (i.e., a different token of auditory /ma/ + visual /ma/) or mismatched AV tokens consisting of auditory /ma/ + visual /ga/, which leads to a percept of /na/ when visual influence occurs.
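Given that stimulus design, the degree of visual influence can be summarized as the proportion of mismatched (auditory /ma/ + visual /ga/) trials on which a child reports hearing /na/. The sketch below is a hypothetical illustration of that scoring; the response coding and field names are assumptions, not the analysis actually applied to the pilot data.

# Hypothetical sketch: scoring the degree of visual influence from responses
# to the two trial types described above. On mismatched trials (auditory /ma/
# + visual /ga/), a /na/ report indicates that the visual signal influenced
# the heard syllable. Field names are assumptions for illustration.

MISMATCHED = ("ma", "ga")   # (auditory token, visual token)

def visual_influence(responses):
    """Proportion of mismatched trials on which the child reported /na/."""
    mcgurk_trials = [r for r in responses if (r["audio"], r["video"]) == MISMATCHED]
    if not mcgurk_trials:
        return 0.0
    influenced = sum(r["report"] == "na" for r in mcgurk_trials)
    return influenced / len(mcgurk_trials)

# Example: a child who reports /na/ on 6 of 8 mismatched trials; the matched
# trials are ignored by the scoring function.
responses = (
    [{"audio": "ma", "video": "ga", "report": "na"}] * 6
    + [{"audio": "ma", "video": "ga", "report": "ma"}] * 2
    + [{"audio": "ma", "video": "ma", "report": "ma"}] * 8  # matched trials
)
print(visual_influence(responses))  # 0.75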
Each subject was placed in a chair 25 inches from a computer monitor. The headband with the magnetic head tracking sensor was placed on each participant's head (see Fig. 1). The participant's pupil coordinates were calibrated with the eye-tracker system by asking the participant to look at col-
Fig. 1. A participant views a speaker using an eye tracker.