The process of constructing such virtual models has traditionally been performed using one of two main approaches. One approach is to construct a physics-based model (often through computer simulation) of the processes that take place during sound generation by a musical instrument. For example, a simulation may seek to predict the vibrations of the parts of an instrument, including air fluctuations, as well as all of the physical processes leading to the radiation of the sound from the instrument. Some physics-based modeling has taken the approach of finding electrical circuit analogs to the mechanical system under study.
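As a standard textbook illustration of such an analogy (not one drawn from any particular model discussed in this article), a damped mass-spring oscillator and a series RLC circuit obey equations of identical form, so the electrical network can stand in for the mechanical system:

```latex
% Mechanical oscillator: mass m, damping R_m, stiffness k, force F(t)
% Electrical analog: inductance L, resistance R, capacitance C, voltage V(t)
\begin{align}
  m\ddot{x} + R_m\dot{x} + kx &= F(t), \\
  L\ddot{q} + R\dot{q} + \frac{1}{C}\,q &= V(t).
\end{align}
```

Under this correspondence, mass maps to inductance, mechanical damping to electrical resistance, stiffness to inverse capacitance, and force to voltage, so familiar circuit-analysis tools can be brought to bear on the instrument.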
The other main sound synthesis approach has been to emulate the sonic features of the instrument via signal-processing techniques. The synthesis of instrument sounds in this approach is done with the intent to capture the salient aspects of sounds produced by the instrument(s) under consideration, without regard for how they may have originated. A long-standing example of this is wavetable-based synthesis, in which the frequency features of an instrument are modulated in time by transients such as attack, decay, sustain, and release (ADSR).
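As a rough sketch of this idea (all parameter values, names, and the waveform below are hypothetical, not taken from any particular synthesizer), a single stored cycle of a waveform is read repeatedly at the desired pitch and then shaped by an ADSR amplitude envelope:

```python
import numpy as np

SR = 44100  # sample rate in Hz

def adsr(n, attack=0.05, decay=0.1, sustain=0.7, release=0.2):
    """Piecewise-linear attack-decay-sustain-release envelope, n samples long."""
    a, d, r = int(attack * SR), int(decay * SR), int(release * SR)
    s = max(n - a - d - r, 0)
    return np.concatenate([
        np.linspace(0.0, 1.0, a),      # attack: ramp up to full level
        np.linspace(1.0, sustain, d),  # decay: fall to the sustain level
        np.full(s, sustain),           # sustain: hold
        np.linspace(sustain, 0.0, r),  # release: fade out
    ])[:n]

def wavetable_note(table, freq, dur):
    """Read one stored cycle ('table') at frequency 'freq' for 'dur' seconds."""
    n = int(dur * SR)
    # Fractional table position, advanced by 'freq' cycles per second
    phase = np.cumsum(np.full(n, freq * len(table) / SR))
    samples = table[phase.astype(int) % len(table)]
    return samples * adsr(n)

# One stored cycle: a fundamental plus a weaker third harmonic
cycle = (np.sin(2 * np.pi * np.arange(2048) / 2048)
         + 0.3 * np.sin(6 * np.pi * np.arange(2048) / 2048))
note = wavetable_note(cycle, freq=440.0, dur=1.0)  # one second of A4
```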
In recent years, a significant new tool to accomplish the aforementioned synthesis of acoustic sounds is that of machine learning (ML). ML generally refers to the use of data analysis to discover patterns in large datasets such that a predictive model can be iteratively “trained” to produce outputs that increasingly approximate the desired results. Although many forms of ML exist, some of the most powerful ML methods have employed artificial neural networks, which can be regarded as a set of curve-fitting approximation methods that use a series of “layers” of matrix multiplications with nonlinear functions applied between each matrix operation. When there are many layers, the model is called a “deep” neural network and its training is called “deep learning” (which is regarded as a subset of ML). Deep-learning methods have become increasingly used in a wide variety of research fields, including astrophysics, genetics, engineering, and acoustics, for tasks such as image labeling, automated data acquisition, and speech recognition, often rivaling or at times besting the state-of-the-art methods previously crafted carefully by human experts. The use of deep learning for audio synthesis has been termed neural audio synthesis (NAS) and is discussed in Neural Audio Synthesis.
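To make the “layers of matrix multiplications” picture concrete, the following toy forward pass (illustrative only; the layer sizes and random weights are arbitrary) runs an input through a small three-layer network:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)  # a common nonlinear "activation" function

# Random weights for a small "deep" network: 64 -> 32 -> 16 -> 1
weights = [rng.standard_normal((64, 32)) * 0.1,
           rng.standard_normal((32, 16)) * 0.1,
           rng.standard_normal((16, 1)) * 0.1]

def forward(x):
    """One pass through the network: matrix multiply, nonlinearity, repeat."""
    for w in weights[:-1]:
        x = relu(x @ w)
    return x @ weights[-1]  # final layer left linear for a real-valued output

y = forward(rng.standard_normal(64))
```

Training consists of iteratively adjusting the entries of the weight matrices so that the network’s outputs move ever closer to the desired results for the training data.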
If one attends an ASA meeting, sessions in nearly every subdiscipline (e.g., bioacoustics, underwater acoustics) will feature talks and posters applying either physics-based modeling or deep learning to the specific acoustics domain featured in the session. Thus, although in this paper we focus on providing an update on these two approaches in the specific context of the synthesis of musical instrument sounds, such methods are generally applicable to other acoustics domains as well.
Physics-Based Modeling
The function of musical instruments has been studied by physicists, engineers, and scientists in general, not only due to the dominant role of the instruments in music production but also because of the compelling physical effects that govern the process of sound generation. An early attempt at physics-based modeling was presented by Kelly and Lochbaum (1962), focusing on speech synthesis by modeling the human vocal tract. Until the 1980s, various methods were developed that were suitable for musical instrument simulation, such as finite-difference models, mass-spring networks, and wave digital filters (reviewed in Välimäki et al., 2006).
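As a minimal sketch of the finite-difference approach (with normalized, purely illustrative parameters rather than values from any cited study), the ideal vibrating string can be simulated by discretizing the one-dimensional wave equation:

```python
import numpy as np

N = 100                  # spatial grid points along the string
c, dx = 1.0, 1.0 / N     # wave speed and grid spacing (normalized units)
dt = dx / c              # time step at the stability limit (Courant number = 1)
r2 = (c * dt / dx) ** 2

# Pluck: triangular initial displacement, string initially at rest
y = np.interp(np.arange(N), [0, N // 4, N - 1], [0.0, 1.0, 0.0])
y_prev = y.copy()

for _ in range(1000):    # leapfrog update of the interior points
    y_next = np.zeros(N)
    y_next[1:-1] = (2 * y[1:-1] - y_prev[1:-1]
                    + r2 * (y[2:] - 2 * y[1:-1] + y[:-2]))
    y_prev, y = y, y_next  # endpoints stay at 0: fixed terminations

midpoint = y[N // 2]     # e.g., read the displacement at the string's midpoint
```

Practical instrument models build on such a core by adding losses, stiffness, and models of the excitation and of the radiation of sound into the air.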
Early digital sound synthesis methods were developed in the 1950s to generate audio/music in the absence of a musical instrument. These methods, such as FM synthesis, attempt to replicate sound spectra without being based on underlying physical laws. They are still popular in the field of electroacoustic composition, but they fail to offer realistic control over a digital musical instrument.
On the other hand, physics-based modeling aims to simulate the sound generation mechanism of musical instruments. Thus, physics-based modeling offers a physics-based reproduction of waveforms under both static and dynamic conditions, the possibility to model transient and nonlinear phenomena, and intuitive control over the involved physically meaningful parameters. Sounds generated by physics-based modeling can contain all the subtle audio information of sounds produced by a real instrument. Furthermore, physics-based modeling presents the possibility to estimate model parameters from naturally performed sounds. These parameters can be associated with certain playing techniques and may therefore be used to reveal differences not only across instruments but also across instrumentalists.
Detailed studies are now available on the function of musical instruments (e.g., Fletcher and Rossing, 1998). These studies also include analysis of complex nonlinear phenomena that attract a great deal of attention in musical acoustics, such as