MARVEL Data Corpus

The Internet of Things (IoT) and 5G technology are rapidly driving the digital transformation of our daily activities. As cities become smarter, these technologies are also being used to support the relevant authorities and enhance public safety. Smart surveillance systems collect information in real time and automatically inform the user about ongoing incidents (e.g., a car crash on a monitored road or drug dealing in a public square). Artificial Intelligence (AI) underpins these modern services, which makes gathering and maintaining Big Data essential.

One of the key outcomes of the MARVEL project is the so-called Data Corpus, which aims to enable small and medium-sized enterprises (SMEs), start-ups, and the scientific and research communities to build on top of open datasets that will become available from MARVEL pilots (or other contributors). It will also increase the opportunities for creating new business by exploring extreme-scale multimodal analytics, evolving existing algorithms, and more.

Today there is a lack of open datasets that can be used to improve the performance of AI components. MARVEL and its Data Corpus aim to resolve this issue. Our pilots are deploying smart sensing equipment to identify and respond to ongoing safety incidents. A subset of the data they collect is then annotated and anonymized. The results are ingested into the Data Corpus and provided to the community as open datasets. External entities can support and contribute to this effort as well.

To achieve that, we believe that the user interface (UI) and the user experience (UX) are of utmost importance in the design process. In addition, because the underlying technology that the UI serves handles audio and video files, the ultimate goal is to enhance the UX through an engaging, user-friendly environment backed by sophisticated data technology.

To let a user handle data in the Data Corpus, a flexible and user-friendly interface has been deployed. To that end, the MARVEL Data Corpus is powered by state-of-the-art technologies for storing and handling audio-video data. During the MARVEL Data Corpus UI prototyping process, we envisioned several different stories a user would follow when interacting with it, such as viewing, adding, and deleting data. However, the user journey is one: to have the best UX and to be able to process the data of the Corpus in a few clicks.

The user journey starts at the main front page of the UI, where the MARVEL Data Corpus user can get an overview of the current status of the uploaded datasets. This page also works as a starting point if the user wants to add a new dataset, or view or alter existing ones. The standout feature of this journey is that the user has, on a single page, a complete synopsis of the data uploaded to the Corpus and the actions available on them.

From this point, the user can add, edit, view, or delete the selected dataset with a simple click, since the main page of the UI redirects them to the corresponding page of the interface. When it comes to adding data to the Corpus, the interface guides the user through a single page where a series of related fields must be filled in.

Editing a dataset is just as simple and is also done through a single page. Since the user needs to fill in a considerable amount of related information, the UI of the Corpus presents it in a uniform view that gives the user overall control of their entries.

Last but not least, the MARVEL Data Corpus user can view and delete a specific dataset by just selecting it and performing the relevant action. Upon successful deletion, the dataset list presented on the front page of the UI will be automatically refreshed.

Another significant feature of the Data Corpus, which will be covered in a future blog entry, is the incorporation of augmentation techniques. It is common in the Machine Learning (ML) field to augment the original datasets in order to enhance the classification (or other) capabilities of the trained models. For example, if you have collected a video dataset from a public square during the morning, you can create an augmented version of this dataset by applying brightness filters, simulating the same use cases in the afternoon. A model trained on both versions should then perform better on a real video stream during the afternoon hours than one that has been trained solely on the initial raw data. The Corpus will support a series of augmentation techniques, for both video and audio files, that try to simulate different times of day or different weather conditions (e.g., adding a video filter to simulate rain).
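To make the brightness example more concrete, here is a minimal sketch in Python. It is only an illustration of the general idea and not the augmentation code of the Data Corpus: the function names (adjust_brightness, augment_clip), the representation of a frame as a grid of grayscale values between 0 and 255, and the factor of 0.6 are all assumptions made for this example.

```python
def adjust_brightness(frame, factor):
    """Return a copy of a grayscale frame with every pixel scaled by `factor`.

    `frame` is assumed to be a list of rows, each row a list of integers
    in the range 0..255. Results are rounded and clamped to that range.
    """
    return [
        [min(255, max(0, round(pixel * factor))) for pixel in row]
        for row in frame
    ]


def augment_clip(frames, factor):
    """Apply the same brightness change to every frame of a clip."""
    return [adjust_brightness(frame, factor) for frame in frames]


# A tiny, made-up "morning" clip: two frames of 2x2 pixels.
morning_clip = [
    [[200, 180], [160, 255]],
    [[100, 90], [80, 0]],
]

# Hypothetical factor: values below 1 darken the clip, values above 1 brighten it.
darker_clip = augment_clip(morning_clip, 0.6)
```

The original clip is left untouched, so both the raw and the augmented versions can be used for training. A real pipeline would normally work on colour frames with an image or video library, and a plain brightness change is only a rough stand-in for different lighting conditions.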

What is your first impression regarding the Data Corpus? Are there any features in particular that you would like to see? Feel free to reach out using the MARVEL contact form or to find and talk to us on Twitter and LinkedIn and share your thoughts with us!

Blog signed by: the Sphynx team

Key Facts

  • Project Coordinator: Dr. Sotiris Ioannidis
  • Institution: Foundation for Research and Technology Hellas (FORTH)
  • Start: 01.01.2021
  • Duration: 36 months
  • Participating Organisations: 17
  • Number of countries: 12

This project has received funding from the European Union’s Horizon 2020 Research and Innovation program under grant agreement No 957337. The website reflects only the view of the author(s) and the Commission is not responsible for any use that may be made of the information it contains.