Page 59 - Summer 2020
Figure 5. Jalapa de Díaz Mazatec words with waveforms (top) and spectrograms (bottom) illustrating different voice qualities. A: modal voicing [thæ], “itch” (word 21). B: creaky voicing [thæ̰], “sorcery” (word 20). F1, F2, and F3, first three vocal tract resonances or formants. Available at bit.ly/2TxrBdK from files bit.ly/2IjYG7H and bit.ly/32NE4OC produced by Speaker 4. Word numbers and speaker number reference the items in the original recordings.
sound that is filtered by the resonant characteristics of the entire vocal tract above it. Because the source and filter are assumed to be independent, it could also be referred to as an independent source and filter model. In Figure 4A, the major speech articulators, which can be divided into static and dynamic, are illustrated using a standard midsagittal view of the head. The tongue is a dynamic articulator, and speech is produced as a result of the interaction of the tongue with the static articulators, creating different tube configurations. Figure 4B illustrates the major static articulators at the approximate locations where they would fall in a tube model of the vocal tract. For a neutral tube, the resonant frequencies can be estimated by assuming that the vocal tract is a closed-open tube and treating it as a one-quarter-wavelength standing wave resonator. The tube model can be used to make predictions about the effects of different types of articulation on the acoustic characteristics of speech.
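The closed-open tube estimate can be made concrete with a short calculation. The sketch below is illustrative only: it assumes a uniform, lossless tube whose nth resonance is (2n - 1)c/(4L), and the tube length (0.175 m) and speed of sound (350 m/s) are round values chosen for the example, not figures taken from this article. The function name is likewise hypothetical.

```python
def tube_resonances(length_m=0.175, speed_of_sound=350.0, count=3):
    """Estimate the resonances (Hz) of a uniform tube closed at one end
    and open at the other, using the quarter-wavelength relation
    F_n = (2n - 1) * c / (4 * L).

    The defaults (0.175 m, 350 m/s) are assumed illustrative values.
    """
    if length_m <= 0 or speed_of_sound <= 0:
        raise ValueError("length and speed of sound must be positive")
    return [
        (2 * n - 1) * speed_of_sound / (4.0 * length_m)
        for n in range(1, count + 1)
    ]


# Neutral tube with the assumed defaults.
formants = tube_resonances()

# A shorter tube (assumed 0.14 m) shifts every resonance upward.
shorter = tube_resonances(length_m=0.14)

for label, values in (("0.175 m", formants), ("0.140 m", shorter)):
    print(label, [round(f) for f in values])
```

Under these assumptions, the neutral tube yields resonances near 500, 1,500, and 2,500 Hz, and shortening the tube raises all of them. Real vocal tracts are neither uniform nor lossless, so these values are only a first approximation of measured formants.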
The Sounds of Language
As detailed by Ladefoged and Maddieson (1996), individual speech sounds, which are often referred to as segments or less accurately as phonemes, can be broken down into classes based on their production mechanisms and acoustic characteristics. The first main division is between consonants and vowels. Vowels generally have a voicing source at the larynx (the structure that houses the vocal folds; Figure 4A), where egressive airflow (air flowing out) from the lungs sets the vocal folds in motion, and their spectral characteristics and resulting resonant, or formant, characteristics are determined by different vocal tract shapes above the larynx. The first three vocal tract resonances or formants (Figure 5A, yellow bands), together with the overall spectral shape of the signal, are the foundation for human perception of vowel quality (Hillenbrand et al., 2006). Vowels can also contrast in other ways. One is duration, with long vowels contrasting with short vowels. Another is whether the velopharyngeal port (the place where the velum and the pharyngeal wall meet; Figure 4A) is closed (with only oral resonances) or open (with additional nasal resonances due to airflow into the nasal cavity). Yet another is whether vowels have a single main vowel quality (monophthongs) or movement between two (diphthongs) or three (triphthongs) vowel qualities.
Voice Quality
An important dimension of voiced segments is the way in which speakers can manipulate vocal-fold vibrations, creating distinct voice characteristics referred to as voice quality or phonation type. Voice quality is typically classified into three types: modal, breathy, and creaky. Modal voicing is characterized by regular cycles and by a fairly linear drop in energy of about 6 dB/octave. Breathy voicing, in which the vocal folds are slack and very loosely held together, has a lower amplitude than modal voicing, and it is typified by an additional aperiodic component and a steep falloff in spectral energy. Creaky voicing, in which the vocal folds are slightly stiffer and tightly closed at the anterior end while allowing the posterior end to vibrate, has a lower amplitude than modal voicing, and it
Summer 2020 • Acoustics Today 59