Page 30 - Summer 2021

LANGUAGE ENDANGERMENT
African countries. Imagine developing a simple voice-recognition app that allows the speaker of an endangered language to call another person by using voice commands such as “Call X” in the native tongue. Imagine also writing a basic software program for a talking dictionary based on the first 1,000 words in that language. Language revitalization lessons can be designed following the model in the Duolingo app. The app gives the speaker the chance to learn new words and use them in sentences of graded complexity. Technologizing endangered languages can create enthusiasm among their speakers and bring about revitalization along the lines envisioned in Postulate 6.
Methodological Challenges
Technologizing endangered languages is easier said than done. It involves multilayered expertise, including familiarity with speech synthesis. Kent and Read (2000) describe several speech synthesis models. The ones currently in vogue rely mostly on the diphone concatenation method. Diphone extraction consists of splicing every sound in two and measuring each half. This is done for every sound that occurs in a language. Unfortunately, this method is better suited to languages that have been well studied phonetically and phonologically. For the hundreds of indigenous languages that have not yet been graphicized, using the diphone method is extremely time consuming. For example, if we accept Clements’ (2000, pp. 125, 134) typology that a prototypical African language has 21 consonants and 9 vowels, one would need to extract 900 diphones (30 × 30) for the databank. If 7 correlates are extracted for each diphone, one would need to extract and digitize 6,300 tokens (900 × 7). If this approach were to be used, ǃXóõ would go extinct before a successful speech synthesis is achieved.
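The arithmetic behind this estimate can be checked with a short script. This is only a sketch of the count, assuming the figures quoted from Clements (2000) and 7 correlates per diphone; the function and variable names are illustrative and do not come from any speech synthesis toolkit.

```python
# Back-of-the-envelope size of a diphone databank.
# All figures are the ones quoted in the text; the names are hypothetical.
CONSONANTS = 21            # prototypical African language (Clements, 2000)
VOWELS = 9
CORRELATES_PER_DIPHONE = 7  # acoustic correlates measured for each diphone


def diphone_inventory(consonants: int, vowels: int) -> int:
    """Count every ordered pair of sounds as a potential diphone."""
    sounds = consonants + vowels
    return sounds * sounds


def measurement_tokens(units: int, correlates: int) -> int:
    """Total measurements needed when each unit has a fixed number of correlates."""
    return units * correlates


diphones = diphone_inventory(CONSONANTS, VOWELS)
tokens = measurement_tokens(diphones, CORRELATES_PER_DIPHONE)
print(f"{diphones} diphones, {tokens} tokens")
```

The count treats every ordered pair of sounds as a possible diphone, which is the 30 × 30 assumption made above; a real language would rule out some pairs, so the figure is an upper bound.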
A Simpler Speech Synthesis Model
The diphone concatenation method is too onerous. For this reason, the Occam's razor principle of scientific inquiry compels us to look for a simpler model that can achieve the same result or better with less effort. One alternative method consists of extracting formant data from syllables.
A syllable-based speech synthesis is appealing for at least six compelling reasons. First, the syllable has a long history in human linguistic experiences. Second, both literate and preliterate societies take syllables into account in their songs and in various language games. Third, the syllable is a key building block in learning to read. Fourth, astounding insights have accumulated over the past 40 years that make syllable-based speech synthesis theoretically sound. Fifth, the vast majority of world languages have relatively simple syllable structures. Last but not least, syllable-based synthesis is less time-consuming because the number of possible syllables is far smaller than the number of diphones. According to Clements (2000), the preferred syllable structure of African languages is CV. This means that there are 189 possible CV syllables (21 consonants × 9 vowels). This number grows to 533 when canonical syllables such as VC, NV, CV1, and V2 are taken into account. With 7 correlates per syllable, a syllable-based synthesis requires 3,731 tokens (533 × 7) instead of the 6,300 needed for speech synthesis based on diphone concatenation.
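The comparison between the two inventories can be reproduced in a few lines. The sketch below assumes 7 correlates per syllable, which is what the 3,731 figure implies (533 × 7); the total of 533 canonical syllables is taken directly from the text rather than derived, and all names are illustrative.

```python
# Compare the syllable-based inventory with the diphone-based one.
# Figures come from the text; the names are hypothetical.
CONSONANTS = 21
VOWELS = 9
CORRELATES_PER_UNIT = 7     # assumed equal for syllables and diphones
CANONICAL_SYLLABLES = 533   # quoted total; not derived here
DIPHONE_TOKENS = 6300       # 900 diphones x 7 correlates

cv_syllables = CONSONANTS * VOWELS                      # CV syllables only
syllable_tokens = CANONICAL_SYLLABLES * CORRELATES_PER_UNIT
saving = DIPHONE_TOKENS - syllable_tokens               # tokens not needed
saving_pct = 100 * saving / DIPHONE_TOKENS

print(f"CV syllables: {cv_syllables}")
print(f"Syllable tokens: {syllable_tokens}")
print(f"Tokens saved: {saving} ({saving_pct:.1f}% fewer than diphones)")
```

Under these assumptions, the syllable-based databank needs 2,569 fewer tokens, a reduction of roughly 41% relative to diphone concatenation.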
Conclusion
It follows from the analytical sketches outlined above that documenting endangered languages in accordance with Postulate 6 is beyond the expertise of a single linguist. Consequently, cross-disciplinary collaboration with people in other fields should be the hallmark of language documentation during IDIL 22-32. Naturally, because linguists have expertise in graphicization, phonetic transcription, Arpabet transcription, fieldwork, and acoustic phonetic feature extraction, they should lead documentation efforts. However, their efforts should be augmented by expertise in engineering (signal processing), coding, computer science, and intelligent systems design. The addition of speech synthesis to the tools that documentary linguists are already using will prove to be extremely beneficial for endangered languages during IDIL 22-32 and beyond. If speakers can use their native tongues to access various technological applications, we believe, as Crystal (2000) does, that Postulate 6 will increase the chances of survival of critically endangered languages.
Acknowledgments
I am grateful to Gary Simons, one of the editors of Ethnologue, for making the endangered language map available to me and for giving me the Expanded Graded Intergenerational Disruption Scale (EGIDS) data from the Americas, the Pacific, and Asia. Many thanks to Arthur Popper for insightful critiques that have improved the readability of this paper.
References
Baken, R. J., and Orlikoff, R. F. (2000). Clinical Measurement of Speech and Voice, 2nd ed. Singular Publishing Group, San Diego, CA.
Batibo, H. (2009). Poverty as a crucial factor in language maintenance and language death: Case studies from Africa. In W. Harbert, S. McConnell-Ginet, A. Miller, and J. Whitman (Eds.), Language and Poverty. Multilingual Matters, Buffalo, NY, pp. 23-36.
30 Acoustics Today • Summer 2021


















































































