Tuomas Virtanen Archives

Enriched Music Representations With Multiple Cross-Modal Contrastive Learning

Post published: November 19, 2021
Post category: Journal / Magazine Publications

Modeling the various aspects that make a piece of music unique is a challenging task that requires combining multiple sources of information. Deep learning is commonly used to obtain representations from such sources, for example the audio itself, interactions between users and songs, or associated genre metadata.

WaveTransformer: An Architecture for Audio Captioning Based on Learning Temporal and Time-Frequency Information

Post published: November 18, 2021
Post category: Publications

Automated audio captioning (AAC) is a novel task in which a method takes an audio sample as input and outputs a textual description (i.e., a caption) of its contents.