WaveTransformer: An Architecture for Audio Captioning Based on Learning Temporal and Time-Frequency Information
Automated audio captioning (AAC) is a novel task in which a method takes an audio sample as input and outputs a textual description (i.e., a caption) of its contents.