Our neuroethological research aims to reveal clues about the meaning of vocalizations in their broader behavioral context and to infer the strategies animals use to modify their immature vocalizations toward a sensory target provided by their parents, akin to infants learning to speak. In the past, we have learned much from analyzing animal vocalizations in controlled environments, such as single birds housed in isolation (1-3). However, the isolated-animal setting is overly impoverished compared to the natural setting, in which animals grow up amid social partners. The scientific impact of such research is therefore intrinsically limited, because data from this setting cannot account for the social influences on vocal learning that arise in the natural setting of a colony.
The information contained in animal vocalizations can only be understood when relevant multimodal information is considered. In particular, the environment and general behavior of the vocalizing animal and its social partners are of central importance, both while vocalizations are produced and immediately before and after. For example, the directed songs a male produces toward a female differ from, and are subserved by different brain mechanisms than, the undirected songs he produces alone (4). Moreover, song learning success in juveniles seems to depend on ongoing social interactions with non-singing adult females (5). For these reasons, we acquire extensive high-quality audio and video data of vocalizing, freely behaving animals in complex social settings (cf. Computational Ethology), and we want to segment and categorize the behavioral actions of individuals and their social interactions in these recordings.
Annotating such massive datasets is, however, particularly challenging: manual annotation is often too labor-intensive to be feasible. We therefore plan to adopt and develop machine learning methods for automated action recognition from video and audio recordings, as well as from wireless sensor nodes that we mount on birds, mainly to selectively record the vocalizations of the sensor-wearing bird.
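As a first automated annotation step, candidate vocalization intervals can be located in a sensor-node audio track before any classification. The sketch below is a minimal illustration of this idea, assuming a simple windowed-RMS energy threshold; the function name, window size, and threshold are illustrative assumptions, not the lab's actual pipeline.

```python
# Hypothetical sketch: energy-based segmentation of a mono audio signal
# into candidate vocalization intervals. Window size and threshold are
# illustrative assumptions and would need tuning on real recordings.

def segment_vocalizations(signal, sr, win_s=0.01, thresh=0.1):
    """Return (start_s, end_s) intervals where windowed RMS energy >= thresh."""
    win = max(1, int(win_s * sr))
    intervals, start = [], None
    for i in range(0, len(signal) - win + 1, win):
        frame = signal[i:i + win]
        rms = (sum(x * x for x in frame) / win) ** 0.5
        if rms >= thresh and start is None:
            start = i / sr          # onset of a loud region
        elif rms < thresh and start is not None:
            intervals.append((start, i / sr))  # offset reached
            start = None
    if start is not None:           # signal ends while still loud
        intervals.append((start, (len(signal) // win) * win / sr))
    return intervals
```

In practice one would compute RMS on band-pass-filtered audio and merge intervals separated by short gaps, but the thresholding logic stays the same.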
There is a rapidly growing number of methods for action recognition from video and audio recordings, and from recordings made with animal-borne sensors (6-11). Many methods rely on posture tracking (6,7,11) to avoid training classifiers directly on high-dimensional video data. Action recognition has also been facilitated by the recent introduction of deep-learning-based animal posture tracking tools (12-16).
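The appeal of posture-based action recognition is dimensionality reduction: instead of raw frames, each time step is a handful of tracked keypoints, and actions become patterns in short keypoint trajectories. The following toy sketch shows that reduction, assuming keypoints are already tracked; the feature (mean keypoint speed), the threshold, and the two labels are illustrative assumptions rather than any published method.

```python
# Hypothetical sketch: classifying windows of tracked posture keypoints.
# Each frame is a list of (x, y) keypoints (e.g. beak, body center);
# a fixed-length window of frames is reduced to one feature and labeled.

def mean_speed(window):
    """Mean per-frame keypoint displacement, averaged over all keypoints."""
    total, steps = 0.0, 0
    for prev, curr in zip(window, window[1:]):
        for (x0, y0), (x1, y1) in zip(prev, curr):
            total += ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
            steps += 1
    return total / steps if steps else 0.0

def classify_window(window, speed_thresh=1.0):
    """Toy two-class decision; real pipelines would use a trained classifier
    on many such trajectory features (angles, distances, spectra)."""
    return "moving" if mean_speed(window) > speed_thresh else "resting"
```

A real system would feed such trajectory features into a trained classifier with many more action classes, but the structure (tracking, then windowed features, then classification) is the same.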
Action recognition from video recordings requires good visibility of the animal in the camera image. When an individual is poorly visible in the camera image, a successful approach may benefit from multimodal action recognition, in which the video data are combined with microphone or accelerometer recordings from the wireless sensor nodes.
We are especially interested in courtship behaviors, both because of their importance for sexual selection and because of the extensive attention that one particular vocal courtship signal, male birdsong, has received from researchers. In songbirds, selecting a partner for copulation and subsequent offspring rearing involves complex courtship displays that include varied vocalizations and coordinated body movements.
If you are interested in this project for an MSc Thesis or Semester Project, please get in touch with Linus Rüttimann.