Audio and Speech Technologies Workshop

Organised by the MARVEL and AI Hub Tampere projects.

MARVEL and AI Hub Tampere are organising the "Audio and Speech Technology Workshop" on June 16, 2022, at the Paidia workspace in Nokia Arena, Tampere, Finland. In this full-day workshop, attendees will be able to learn more about applications of audio and speech technology and the AI methods related to these fields, discuss data-related questions, explore company solutions, see demos, and try out hands-on tutorials.

Info Pack

Audio and Speech Technology Workshop


Thursday, June 16, 2022, at 09:00 EEST


Online and on site at Nokia Arena, Tampere, Finland.


For more information, please visit the event website.

The Workshop is supported by the EU H2020 project Multimodal Extreme Scale Data Analytics for Smart Cities Environments (MARVEL) under GA No 957337.

More Information about this Workshop

By whom?

This workshop is a joint effort by the AI Hub Tampere and MARVEL projects.

For whom?

The workshop is open to representatives of companies interested in applications of audio and speech technology and in the AI methods related to these fields.


The development of machine learning, combined with signal processing techniques, has opened new possibilities for applications that process and analyse audio or speech. This workshop aims to provide an opportunity to exchange information about the latest developments in these fields. It will consist of a set of talks by globally leading experts on these topics at Tampere University, as well as selected case study presentations from companies. The morning session will focus on giving an overview of the different applications that audio and speech technologies can address, followed by a discussion of the data and machine learning techniques that are needed for, or can be used in, developing such methods. The afternoon session will include case study presentations from companies and selected projects, as well as a hands-on tutorial that explains how to get started working on these topics using available resources.


Audio and Speech Technology Workshop

Morning Session
09:00 Welcome (Tuomas Virtanen)
09:10 Overview of applications of audio and speech content analysis and processing (Annamaria Mesaros, Okko Räsänen, Tuomas Virtanen, Archontis Politis)
10:25 Coffee break
10:40 Audio and speech data for AI applications (Tuomas Virtanen, Okko Räsänen)
11:10 Machine learning methods for audio and speech AI (Annamaria Mesaros)
12:00-13:00 Lunch break
Afternoon Session
13:00 Case studies from companies and projects: speech processing, industrial applications, music processing, smart cities, speech with robots
- WordDive
- Meluta
- Yousician
- Speech for human-robot collaboration, Tampere University
- MARVEL, audio for smart cities
15:00 Coffee break + demonstrations
15:30 Hands-on programming tutorial (Toni Heittola)
16:15 Closing (Tuomas Virtanen)

Workshop Speakers

Prof. Tuomas Virtanen

Professor, Tampere University

Tuomas Virtanen is a Professor at Tampere University. He received his doctoral degree from Tampere University of Technology in 2006. He is known for his pioneering work on single-channel sound source separation using nonnegative matrix factorization and on computational analysis of sounds in everyday environments. He has authored over 200 scientific publications on these topics and received the IEEE Signal Processing Society 2012 Best Paper Award. He is an IEEE Fellow and a recipient of the 2014 ERC Starting Grant "Computational Analysis of Everyday Soundscapes".

Dr. Annamaria Mesaros

Assistant Professor, Tampere University

Annamaria Mesaros is an Assistant Professor at Tampere University. She received her PhD in Signal Processing from Tampere University of Technology in 2012. Her research focuses on sound event detection in real-world multisource environments, including semantic aspects of human-generated sound annotation; her work includes over 35 scientific publications and many open datasets. She is the coordinator of the Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge. She is currently an Academy of Finland Research Fellow for "Teaching Machines to Listen" and a member of the Audio and Acoustic Signal Processing Technical Committee of the IEEE Signal Processing Society.

Okko Räsänen

Associate Professor, Tampere University

Okko Räsänen is an Associate Professor at Tampere University, Finland. He received his M.Sc. degree in language technology from Helsinki University of Technology (2007) and his D.Sc. (Tech.) degree in language technology from Aalto University (2013), and he worked as a visiting researcher at Stanford University (2015). He is also a Docent and visiting researcher at the Department of Signal Processing and Acoustics at Aalto University. His research focuses mainly on the study of child language development using computational models, but he also works on many other topics in speech technology, machine learning, and cognitive science.

Archontis Politis

Post-doctoral researcher, Tampere University

Archontis Politis is a post-doctoral researcher at Tampere University, Finland. He obtained his M.Eng. degree in civil engineering from Aristotle University, Thessaloniki, Greece, in 2006, and his M.Sc. degree in Sound & Vibration Studies from the Institute of Sound and Vibration Research (ISVR), Southampton University, UK, in 2008. In 2016, he obtained a Doctor of Science degree in parametric spatial sound recording and reproduction from Aalto University, Finland. His research interests include spatial audio technologies, virtual acoustics, array signal processing, and acoustic scene analysis.

Toni Heittola

Post-doctoral researcher, Tampere University

Toni Heittola is a post-doctoral researcher at Tampere University, Finland. He received his M.Sc. degree in Information Technology from Tampere University of Technology (TUT), Finland, in 2004. In 2021, he obtained a Doctor of Science degree in computational audio content analysis in everyday environments from Tampere University, Finland. His main research interests are sound event detection in real-life environments, sound scene classification, and audio content analysis.



Key Facts

  • Project Coordinator: Dr. Sotiris Ioannidis
  • Institution: Foundation for Research and Technology Hellas (FORTH)
  • E-mail: marvel-info{at} 
  • Start: 01.01.2021
  • Duration: 36 months
  • Participating Organisations: 17
  • Number of countries: 12




This project has received funding from the European Union’s Horizon 2020 Research and Innovation program under grant agreement No 957337. The website reflects only the view of the author(s) and the Commission is not responsible for any use that may be made of the information it contains.