Data sets

MARVEL edge devices will be deployed in predefined areas of the municipalities of Trento and Malta, where large volumes of streaming audio-visual data will be recorded and processed according to MARVEL’s specifications. MARVEL’s tools and algorithms will ensure that all recordings are made in accordance with European legislation and in compliance with all relevant privacy and ethical regulations.

The MARVEL project will continuously augment the dataset with processed data by adopting a concrete incremental scheme. To provide added value to the European scientific community and industry, the MARVEL dataset will be public, available free of charge, and released as a service (Data Corpus-as-a-Service).

The list below provides an overview of indicative publicly available datasets from the scientific and industrial community that will be used in addition to those provided by the Municipality of Trento and the Maltese authorities.

| Name and link | Content | Structured / Annotated | Size | Right of use |
|---|---|---|---|---|
| Freesound | Huge collaborative database of audio snippets, recordings, bleeps. | Yes / Yes | 360k+ sounds / 213 days / 2.8 TB | Creative Commons licences (CC0, CC-BY, CC-BY-NC) |
| SHPD: Surveillance Human Pose Dataset | Collected for human pose research on surveillance tasks. Includes about 25000 images, all taken from in-use surveillance cameras. | Yes / Yes | Approx. 27 GB | No commercial reproduction, distribution, display or performance rights in this work are provided. |
| SONYC-UST | A dataset for the development and evaluation of machine listening systems for realistic urban noise monitoring. | Yes / Yes | 1.9 GB | Creative Commons Attribution 4.0 International |
| SPID: Surveillance Pedestrian Image Dataset | Consists of 14550 training images and 15439 test images, comprising a total of 29989 original images and 110069 labeled pedestrians. | Yes / Yes | Approx. 10 GB | No commercial reproduction, distribution, display or performance rights in this work are provided. |
| UCF-Crime Dataset | A new large-scale, first-of-its-kind dataset of 128 hours of video. It consists of 1900 long, untrimmed real-world surveillance videos with 13 realistic anomalies. | Yes / Yes | 95.9 GB | Free for research |
| SoundNet | A dataset to train large-scale sound recognition models. | No / No | 2.1M videos with sound / 600 MB | Free for research purposes |
| ESC-50 Dataset | A labelled collection of 2000 environmental audio recordings. | Yes / Yes | 2000 recordings / 600 MB | Creative Commons (CC-BY) |
| DCASE2019 | The synthetic set is composed of 10-second audio clips generated with Scaper. | Yes / Yes | 1.2 GB | Creative Commons Attribution Non Commercial 4.0 International |
| TAU Urban Acoustic Scenes 2018 Mobile, Development dataset | 10-second audio segments from 10 acoustic scenes. The dataset contains 28 hours of audio in total. | Yes / Yes | 11.3 GB | Other (Non-Commercial) |
| SHAD: Surveillance Human Action Dataset | Collected for human action research using surveillance videos (300 video clips, taken from surveillance cameras). | Yes / Yes | 7.8 GB | No commercial reproduction, distribution, display or performance rights in this work are provided. |

Key Facts

  • Project Coordinator: Dr. Sotiris Ioannidis
  • Institution: Foundation for Research and Technology Hellas (FORTH)
  • E-mail: marvel-info{at} 
  • Start: 01.01.2021
  • Duration: 36 months
  • Participating Organisations: 17
  • Number of countries: 12


This project has received funding from the European Union’s Horizon 2020 Research and Innovation program under grant agreement No 957337. The website reflects only the view of the author(s) and the Commission is not responsible for any use that may be made of the information it contains.