
How Machine Learning Contributes to Solve Acoustical Problems¹
Marie A. Roch, Peter Gerstoft, Bozena Kostek, and Zoi-Heleni Michalopoulou
What Is Machine Learning?
Machine learning is the process of learning functional relationships between measured signals (called percepts in the artificial intelligence literature) and some output of interest. In some cases, we wish to learn very specific relationships from acoustics. Examples with direct commercial applications include selecting or recognizing music (Schedl et al., 2014) and identifying the language of a speaker (e.g., Zissman, 1996) for call center routing.
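One way to make "learning a functional relationship" concrete is to fit a linear function from measured features to an output by least squares. The sketch below is a minimal illustration under assumed conditions. The data are synthetic, the helper names `fit_linear` and `predict` are hypothetical, and practical learners are usually nonlinear.

```python
import numpy as np

def fit_linear(X, y):
    """Learn weights w and bias b so that X @ w + b approximates y (least squares)."""
    # Append a column of ones so the bias is learned along with the weights.
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef[:-1], coef[-1]

def predict(X, w, b):
    """Apply the learned function to new inputs."""
    return X @ w + b

# Synthetic example (assumed): the output depends linearly on two measured features.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(50, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.5
w, b = fit_linear(X, y)
print(np.round(w, 3), round(float(b), 3))
```

The data here are noise free, so the learned weights recover the generating function. With real measurements, the fit would only approximate it.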
Alternatively, we may be interested in an exploratory analysis, such as discovering relationships between animal-produced sounds and potential call categories that may carry signaling information (e.g., Sainburg et al., 2020). Machine learning can also be used to discover information about the physical world, such as determining the distance to a source from pressure levels measured on a vertical line array (Niu et al., 2017) or solving inversion problems to find the geoacoustic parameters of a seabed (Benson et al., 2000).
This article provides a high-level introduction to machine learning, explaining a limited number of techniques conceptually. Most of our examples will use the vowel data of Peterson and Barney (1952), who showed that vowels could be identified relatively well by their formant frequencies, the harmonics of voiced speech that are amplified by resonances in the vocal tract. These data were selected because they provide an example of a real acoustics problem that can be solved in a low-dimensional space suitable for two-dimensional figures.
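A little arithmetic illustrates how harmonics and resonances interact. Voiced speech has energy at integer multiples (harmonics) of the fundamental frequency f0. The harmonic nearest a vocal-tract resonance tends to receive the most gain from it, ignoring the slope of the source spectrum and any asymmetry in the resonance peak. The helper `nearest_harmonic` below is hypothetical. The 120-Hz fundamental and 730-Hz resonance are assumed values chosen only for illustration.

```python
def nearest_harmonic(f0_hz, resonance_hz):
    """Return (k, k * f0_hz) for the harmonic closest to a resonance frequency."""
    # Harmonics lie at k * f0 for k = 1, 2, 3, ...; round to the nearest k (at least 1).
    k = max(1, round(resonance_hz / f0_hz))
    return k, k * f0_hz

print(nearest_harmonic(120.0, 730.0))
```

A higher f0 spaces the harmonics farther apart, so the strongest harmonic can lie farther from the resonance itself.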
For readers desiring a more quantitative introduction to machine learning, we recommend the review by Bianco et al. (2019), which focuses on machine learning and acoustics, or one of the many excellent book-length treatments of machine learning (e.g., Bishop, 2006; Hastie et al., 2009; Goodfellow et al., 2016).
¹ For additional information on machine learning in acoustics, see the special issue of The Journal of the Acoustical Society of America at
Types of Machine Learning
Machine learning can be broadly separated into two major categories: supervised and unsupervised learning (Russell and Norvig, 2021). Other forms of machine learning, such as reinforcement learning (e.g., Shah et al., 2021; Wang et al., 2018), exist but have not been used as extensively in acoustics and so are beyond the scope of this article.
In supervised learning, the machine learning algorithm, or learner, is presented with examples of what is to be learned together with labels, the values or categories associated with each example. An example is the work of Godino-Llorente and Gomez-Vilda (2004), where the goal was to learn to detect specific pathologies of the vocal folds from recordings of vowels.
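Supervised learning can be sketched with a nearest-centroid learner. This is one of many possible choices and is not the method of the cited work. The learner is given (F1, F2) formant pairs together with vowel labels. It stores the mean of each class and labels a new token by the closest mean. The class `NearestCentroid` is written here from scratch for illustration. The formant values are made-up numbers, not the Peterson and Barney (1952) measurements.

```python
import numpy as np

class NearestCentroid:
    """Supervised classifier: one mean feature vector per labeled class."""

    def fit(self, X, labels):
        X = np.asarray(X, dtype=float)
        labels = np.asarray(labels)
        self.classes_ = np.unique(labels)
        # The "learning" step: average the training examples of each class.
        self.centroids_ = np.array([X[labels == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        X = np.atleast_2d(np.asarray(X, dtype=float))
        # Euclidean distance from every input to every class centroid.
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[np.argmin(d, axis=1)]

# Illustrative (F1, F2) pairs in Hz with vowel labels (hypothetical values).
X_train = [[280, 2250], [260, 2300], [720, 1100], [750, 1080], [310, 880], [290, 860]]
y_train = ["i", "i", "a", "a", "u", "u"]
model = NearestCentroid().fit(X_train, y_train)
print(model.predict([[300, 2200], [700, 1150]]))
```

F2 spans a wider range than F1, so raw Euclidean distance weights F2 more heavily. In practice, features are often rescaled before distances are computed.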
In contrast, unsupervised learning attempts to learn from examples that do not have labels. Xi et al. (2004) trained probability models for individual musical recordings. The similarity between a pair of songs was measured by how well each song's model scored the other song. Clustering these scores separated the songs by genre without the algorithm ever knowing the type of music.
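The cross-scoring idea can be sketched in a few lines. The choices below are assumptions, not the exact method of Xi et al. (2004):

- Each recording is represented by synthetic feature vectors.
- Each recording's probability model is a single diagonal Gaussian.
- Similarity is the average log-likelihood of one recording's features under another recording's model.
- A crude two-way split stands in for a real clustering algorithm such as k-means.

No labels are used at any point.

```python
import numpy as np

def fit_gaussian(F):
    """Fit a diagonal Gaussian (mean, variance) to the rows of feature matrix F."""
    return F.mean(axis=0), F.var(axis=0) + 1e-6  # small floor avoids zero variance

def avg_loglik(F, model):
    """Average log-likelihood of the rows of F under a diagonal Gaussian model."""
    mu, var = model
    ll = -0.5 * (np.log(2 * np.pi * var) + (F - mu) ** 2 / var)
    return ll.sum(axis=1).mean()

def cross_score_matrix(recordings):
    """S[i, j]: how well recording j's model scores recording i, symmetrized."""
    models = [fit_gaussian(F) for F in recordings]
    n = len(recordings)
    S = np.array([[avg_loglik(recordings[i], models[j]) for j in range(n)]
                  for i in range(n)])
    return 0.5 * (S + S.T)

def two_groups(S):
    """Split items into two clusters seeded by the least similar pair."""
    a, b = np.unravel_index(np.argmin(S), S.shape)
    # Assign each item to whichever seed it is more similar to.
    return (S[:, b] > S[:, a]).astype(int)

# Six synthetic "recordings": three drawn around one feature mean, three around another.
rng = np.random.default_rng(1)
recordings = ([rng.normal(0.0, 1.0, size=(200, 4)) for _ in range(3)]
              + [rng.normal(3.0, 1.0, size=(200, 4)) for _ in range(3)])
S = cross_score_matrix(recordings)
labels = two_groups(S)
print(labels)
```

Each model is scored only on how well it explains other recordings. Any grouping therefore emerges from the data alone, which is the essence of the unsupervised setting.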
Regardless of the type of machine learning, all algorithms require transformation of the input data into features, a representation of the input signal that is conducive to solving the machine learning problem. Traditionally, these features are selected by experts using knowledge about the problem domain. For example, Peterson and Barney (1952) recognized that
©2021 Acoustical Society of America. All rights reserved.
  48 Acoustics Today • Winter 2021 | Volume 17, issue 4
