
TONAL LANGUAGE PROCESSING
Fan-Gang Zeng
Departments of Anatomy and Neurobiology, Biomedical Engineering, and Cognitive Sciences, University of California, Irvine
Irvine, California 92697
A tonal language uses changes in the tone or pitch of a voiced sound to differentiate words. A classic example is the consonant-vowel combination /ma/ in Mandarin Chinese. The same /ma/, depending upon the tonal pattern of the vowel /a/, can mean mother (妈, flat pattern), numb (麻, rising), horse (马, falling-rising), or curse (骂, falling). Growing up in the United States, my 9-year-old boy still confuses mother with horse, “cursing” his weekly 2-hour Chinese School as a form of “child abuse.” Who should we blame for inventing tonal languages? What’s good in them? Why are they hard for our brains? Or are they really?
According to the late linguist Yuen-Ren Chao, tones have been used to differentiate words in Chinese for at least 3,000 years. Recently, researchers from the University of Edinburgh found that people who speak tonal languages also carry the least disturbed form of a 37,000-year-old gene, Microcephalin, suggesting that the first language was tonal (Dediu and Ladd, 2007). Indeed, ancient Greek (9th-6th centuries BC) used tonal accents, but its tonality was lost, perhaps as a result of variations in that gene. Today, about 70% of the world’s languages are tonal, spoken by over 2 billion people, mostly in sub-Saharan Africa and Southeast Asia (Haviland et al., 2007). So, our ancestors out of Africa invented tonal languages, but why?
One answer may lie in the acoustics and perception of tones. Dr. Zhi-An Liang at the Shanghai Institute of Physiology published a classic paper (Liang, 1963) showing that, compared with consonant and vowel perception, tone perception is the most redundant in terms of resilience to acoustic distortions. Although tones are defined by variations in fundamental frequency, they can still be accurately perceived after the fundamental frequency is removed, whether by high-pass filtering or in whispered speech. One can literally abuse the acoustic signal by filtering, infinite clipping, or adding noise, but still achieve a high level of tone perception. The reason for this high resistance to distortions and noise is that the acoustic cues for tone perception are multi-dimensional and widely distributed in both the time and frequency domains. Tonal information is correlated with duration and the temporal envelope in the time domain (Whalen and Xu, 1992; Fu et al., 1998), but the more salient cues for tone perception lie in the temporal fine structure: the fundamental frequency and its harmonics (Xu and Pfingst, 2003; Kong and Zeng, 2006). Possibly for their acoustical redundancy and perceptual resiliency, tones were invented to enable long-distance communication in noisy backgrounds. Well, they are still used today by Spanish-speaking villagers
who can whistle Silbo in the Canary Islands (Meyer, 2008), as well as by tonal-language-speaking customers in a noisy Chinese restaurant (Lee, 2007; Luo et al., 2009).
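Readers who want a feel for this resilience can try a short simulation. The sketch below, in Python with NumPy and SciPy (the sampling rate, contour, filter cutoff, and frame sizes are illustrative assumptions, not values from Liang's study), synthesizes a harmonic complex whose fundamental rises from 120 to 240 Hz, roughly like a Mandarin rising tone, removes the fundamental itself by high-pass filtering, infinitely clips the waveform, and still recovers the rising pitch contour with frame-wise autocorrelation.

import numpy as np
from scipy.signal import butter, sosfilt

fs = 16000                            # sampling rate in Hz (illustrative)
t = np.arange(0, 0.5, 1 / fs)         # a 500-ms voiced token
f0 = 120 + 120 * t / t[-1]            # rising contour, 120 -> 240 Hz
phase = 2 * np.pi * np.cumsum(f0) / fs

# Harmonic complex: fundamental plus harmonics 2-10
x = sum(np.sin(h * phase) for h in range(1, 11))

# "Abuse" the signal: high-pass at 300 Hz (removes the fundamental),
# then infinitely clip so only the zero crossings survive
sos = butter(4, 300, btype="highpass", fs=fs, output="sos")
x_clip = np.sign(sosfilt(sos, x))

def estimate_f0(frame, fs, fmin=80, fmax=400):
    # Frame-wise autocorrelation pitch estimate
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    return fs / (lo + np.argmax(ac[lo:hi]))

# The printed contour still rises from about 120 to about 240 Hz
for start in range(0, len(x_clip) - 480, 2400):
    frame = x_clip[start:start + 480]   # 30-ms analysis frames
    print(f"{start / fs:0.2f} s: ~{estimate_f0(frame, fs):.0f} Hz")

That a pitch contour survives such gross distortion is, in miniature, Liang's point about the redundancy of tonal cues.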
How do our ears and brain work together to process tonal information? Our ears are essentially filter banks that decompose sounds into different frequency regions. The filter bandwidth is narrow and relatively constant for center frequencies below 2,000 Hz, but increases linearly for center frequencies above 2,000 Hz. In the case of a voiced sound, the fundamental frequency and its lower harmonics are likely separated into different filters, whereas the higher harmonics are likely combined into one filter. Tonal information is extracted from the output of these auditory filters.
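A rough feel for this resolved-versus-unresolved division can be had from the standard Glasberg and Moore equivalent rectangular bandwidth (ERB) approximation of the auditory filter. The sketch below is an illustration under that approximation, not the article's own model, and the 200-Hz fundamental is an arbitrary choice; it marks a harmonic as resolved when the filter centered on it is narrower than the harmonic spacing.

def erb_bandwidth(f_hz):
    # Glasberg & Moore (1990) equivalent rectangular bandwidth in Hz
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

f0 = 200.0  # fundamental of a voiced sound in Hz (illustrative)
for h in range(1, 21):
    fc = h * f0
    bw = erb_bandwidth(fc)
    # Neighboring harmonics are f0 apart; a filter wider than that
    # spacing collects more than one harmonic
    status = "resolved" if bw < f0 else "unresolved"
    print(f"harmonic {h:2d} at {fc:5.0f} Hz: ERB = {bw:4.0f} Hz -> {status}")

With these numbers, roughly the first eight harmonics land in separate filters while higher ones crowd together, the pattern the paragraph above describes.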
There are at least three types of cues for pitch extraction. First, the fundamental frequency itself conveys a salient pitch percept by producing a strong timing cue in the right place, the apical part of the cochlea. Second, the lower harmonics can also produce a salient pitch percept by generating a distinctive temporal and spatial pattern along the cochlea, a well-known phenomenon called the missing fundamental. Third, the unresolved high harmonics can produce a strong timing cue that is phase-locked to the fundamental frequency, but in the wrong place, the basal part of the cochlea. Functionally, this envelope-based timing cue cannot provide a salient pitch percept (Zeng, 2002; Oxenham et al., 2004).
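The third, envelope-based cue can also be sketched. When only high, unresolved harmonics pass through a single broad "basal" channel, they beat against one another, so the channel's temporal envelope is modulated at the fundamental even though no energy exists at the fundamental itself. In the hedged sketch below, the harmonic numbers and band edges are illustrative assumptions.

import numpy as np
from scipy.signal import butter, sosfilt, hilbert

fs = 16000
t = np.arange(0, 0.1, 1 / fs)
f0 = 150.0
# Only high, unresolved harmonics (20-26) of a 150-Hz fundamental
x = sum(np.cos(2 * np.pi * h * f0 * t) for h in range(20, 27))

# One broad "basal" filter around 3.5 kHz
sos = butter(4, [3000, 4000], btype="bandpass", fs=fs, output="sos")
band = sosfilt(sos, x)

env = np.abs(hilbert(band))[int(0.02 * fs):]   # envelope, transient trimmed
spec = np.abs(np.fft.rfft(env - env.mean()))
freqs = np.fft.rfftfreq(len(env), 1 / fs)
mask = freqs >= 50                              # ignore residual drift
peak = freqs[mask][np.argmax(spec[mask])]
print(f"strongest envelope modulation: ~{peak:.0f} Hz (the 150-Hz fundamental)")

The fine structure in this channel is phase-locked near 3.5 kHz, the "wrong" place, which is why this envelope cue, although measurable, yields only a weak pitch percept.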
Recent physiological studies have shed light on the brain's representation of pitch and its use in tonal language processing. In marmoset monkeys, researchers found that neurons in a restricted low-frequency cortical region respond to both pure tones and their missing-fundamental harmonic counterparts (Bendor and Wang, 2005). This cortical region has been mapped to Heschl's gyrus in humans. Interestingly, in a study teaching English-speaking subjects to learn Mandarin tones, Wong and colleagues (2008) found that subjects who were less successful in learning showed a smaller Heschl's gyrus volume in the left, but not the right, hemisphere, relative to successful learners. This finding leads to a general question on hemisphere specialization of tone perception: Which hemisphere do we use to process lexical tonal information?
Hemisphere specialization has long been known: the left hemisphere handles speech whereas the right hemisphere handles music processing. Tones are represented by changes in pitch, a salient musical quality, but they also carry lexical meaning, a salient speech feature.










































































