Page 48 - Summer 2006

 Echoes from Providence
Continued from page 45
typed (repeated and recognizable) pulsed calls, which are thought to be learned within the pod (living group). Repertoires of these stereotyped calls are pod-specific, and the pitch contours of shared stereotyped calls are also group-specific at every level, from matrilineal lines (groups sharing the same mother) to larger pods (consisting of several matrilineal lines) to clans (even larger groups sharing calls). One of the remarkable features of killer whale pulsed calls is that they contain two overlapping but independently modulated contours or “voices.” These are shown superposed on the spectrum in Fig. 2. Bi-phonation, as this is called, is common in birds but has been described for few other marine mammal sounds. One of the challenges of analyzing these complex sounds is to “pitch-track” these two components from the same sound, as shown in the example.
For the most part, the sounds produced by killer whales have been classified into groups called “call types” by humans listening to the calls and observing their spectra. This human classification by eye and ear is quite consistent, and has been useful in revealing group-specific acoustic repertoires and matching vocal exchanges. It would, nonetheless, be useful to replace human classification with an automatic technique, both because of the large amounts of data to be classified and because automatic methods can be fully replicated in subsequent studies.
In our studies we examined two sets of sounds previously classified into call types by human listeners. The first set was recorded from captive killer whales in Marineland in the French Antilles, and the second set from northern resident whales recorded on the open sea.
Dynamic time warping (DTW) and dissimilarity of pitch contours
The sounds that were classified into each call type have a similar shape or contour within that group, although the lengths of the calls will differ. For the automatic classification, a technique for quantitatively comparing curves of similar shape but different length is required. Dynamic time warping was widely used in the early days of speech recognition and more recently in musical information retrieval, and it is ideal for this task. The basic idea of DTW can be explained with an example using two sounds from the same group, “n32.”
Fig. 3. Pitch contours of two examples from call type n32. The shorter contour is from the sound with spectrum in Fig. 2.
Fig. 4. Cost matrix with minimum cost path in bold red through the center. The shorter sound is called the query and the longer sound the target.
A difference matrix is constructed by subtracting each value of the first sound's contour from each value of the second sound's contour, so that entries are low wherever the two curves have similar values. From these numbers a cost matrix is constructed, which can be loosely thought of as a running sum of the differences between the two curves for all possible paths. The minimum-cost path follows the low numbers and measures the overall difference for the best match of the two curves; this path can be traced, and the final distance or dissimilarity is the last number attained along it. This can be visualized in Fig. 4 as the path of minimum effort through a mountainous terrain.
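The construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the difference is taken as an absolute value, that each step of the path may move one cell right, down, or diagonally, and that no normalization by path length is applied. The function name `dtw_dissimilarity` and the sample contours are hypothetical.

```python
import numpy as np

def dtw_dissimilarity(query, target):
    """Return (dissimilarity, path) for two 1-D pitch contours.

    Assumptions (not stated in the article): absolute differences,
    steps of right / down / diagonal, no path-length normalization.
    """
    q = np.asarray(query, dtype=float)
    t = np.asarray(target, dtype=float)

    # Difference matrix: |q[i] - t[j]| for every pair of samples.
    diff = np.abs(q[:, None] - t[None, :])
    n, m = diff.shape

    # Cost matrix: running sum of differences along the cheapest
    # path from the top-left corner to each cell.
    cost = np.full((n, m), np.inf)
    cost[0, 0] = diff[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best_prev = min(
                cost[i - 1, j] if i > 0 else np.inf,
                cost[i, j - 1] if j > 0 else np.inf,
                cost[i - 1, j - 1] if i > 0 and j > 0 else np.inf,
            )
            cost[i, j] = diff[i, j] + best_prev

    # Trace the minimum-cost path back from the bottom-right corner.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((cost[i - 1, j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((cost[i - 1, j], i - 1, j))
        if j > 0:
            candidates.append((cost[i, j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()

    # The dissimilarity is the last value reached on the minimum path.
    return cost[-1, -1], path

# Made-up contours (Hz) of different lengths, for illustration only.
short_contour = [900.0, 950.0, 1000.0]
long_contour = [900.0, 900.0, 950.0, 950.0, 1000.0, 1000.0]
distance, path = dtw_dissimilarity(short_contour, long_contour)
```

In this toy case the longer contour is simply a stretched copy of the shorter one, so the warping absorbs the length difference and the dissimilarity comes out as zero.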
The “dissimilarity” or distance thus obtained is an excellent measure of contour differences. Identical signals will have a diagonal best path and a cost of zero (zero dissimilarity), while larger contour differences will have a correspondingly larger cost/dissimilarity. For classification, these costs provide a means of clustering together the calls that have the smallest dissimilarities.
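One simple way to turn pairwise dissimilarities into a grouping is to give each call the label of its closest neighbor and then count how often that agrees with the human label. The article does not specify the clustering procedure, so the nearest-neighbor rule below is an assumption chosen for brevity; the function names and the small dissimilarity matrix are invented for illustration.

```python
import numpy as np

def nearest_neighbour_labels(dissim, labels):
    """Assign each call the human label of its least-dissimilar other call."""
    d = np.array(dissim, dtype=float)
    np.fill_diagonal(d, np.inf)  # a call must not match itself
    nearest = d.argmin(axis=1)
    return [labels[k] for k in nearest]

def agreement(predicted, labels):
    """Fraction of calls whose predicted label equals the human label."""
    return sum(p == h for p, h in zip(predicted, labels)) / len(labels)

# Invented pairwise dissimilarities for four calls (symmetric, zero diagonal).
toy_dissim = [
    [0.0, 1.0, 5.0, 6.0],
    [1.0, 0.0, 4.0, 5.0],
    [5.0, 4.0, 0.0, 2.0],
    [6.0, 5.0, 2.0, 0.0],
]
human_labels = ["A", "A", "B", "B"]  # hypothetical call types

predicted = nearest_neighbour_labels(toy_dissim, human_labels)
score = agreement(predicted, human_labels)
```

With real data the matrix would hold the DTW cost for every pair of pitch contours, and the agreement score would be compared against the human call-type assignments.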
Classification
The computer classification based on minimum dissimilarity within groups was compared to the human classification into call types for each of the two sets of whale sounds. For the Marineland calls, using these distances and then running through the calculation a second time using the derivative of each of the contours (measuring the shape rather than the absolute value), an outstanding 99% agreement with the human grouping was obtained. For the
  46 Acoustics Today, July 2006