Machine learning approaches for sound separation and vocal segmentation

template — Illustration of segmentation challenges in multi-bird recordings. In this example, the spectrograms of the male's (top) and the female's (middle) accelerometer signals reveal frequent simultaneous vocalizations that would be challenging to separate from the microphone signal alone (bottom). In addition, type 1 (red arrow) and type 2 (blue arrow) radio noise constitute additional challenges for identification and segmentation of the vocalizations of both birds.

Our goal is to advance the understanding of cultural learning in songbirds. Over the past years, we have gathered extensive datasets from pair-housed birds engaged in courtship and mating behaviors and from young birds learning their songs by interaction with their parents. To make sense of these longitudinal recordings, we need to solve increasingly complex sound separation and vocal segmentation problems. Our goal is to identify for each animal the times at which it is vocalizing and to reliably distinguish these events from other types of sounds including the vocalizations from other animals.

Our birds are housed together in acoustically isolated chambers, their sounds are recorded with a single wall-mounted microphones and their movements are observed with high-resolution cameras from three orthogonal viewpoints. Each bird also wears an accelerometer that provides distinct signals of its body movements including low-frequency vibrations associated with vocalizations. These latter animal-borne signals are useful for separating the vocal behaviors of an individual in a social setting.

Accelerometer signals have properties that render them more difficult to analyze than microphone signals: they only monitor low-frequency vibrations, are affected by radio noise, and wing flaps mask vocal signals.

We have started to develop a semi-supervised machine learning system that can annotate a single channel using minimal expert-labelled data. To avoid annotation errors stemming from various sources of noise in a single channel, we want to take advantage of the information contained in the other channels. Our current approach is to extent the input and output space of our machine learning system and to learn all channel annotations simultaneously.

Student Project

If you are interested in this project for an MSc Thesis or Semester Project, please get in touch with Kanghwi Lee

Reference

Ter Maat, A., et al. (2014). Zebra finch mates use their forebrain song system in unlearned call communication. PloS one 9.10: e109334

Quicklinks

Main navigation

Machine learning approaches for sound separation and vocal segmentation

Student Project

Reference