MARVEL proposes a comprehensive framework for extreme-scale, multimodal, AI-based analytics in smart city environments, providing multimodal perception and intelligence for audio-visual scene recognition, event detection and situational awareness. The framework applies Big Data technologies along the complete data processing chain, including:

  • privacy-aware multimodal AI tools and methods
  • multimodal audio-visual data capture and processing
  • co-design of edge and fog ML/DL models through federated learning
  • extreme-scale multimodal analytics for real-time decision making at all infrastructure levels (Edge, Fog, Cloud)
  • continuously optimized, resource-adapted edge and fog ML/DL deployment
  • advanced visualization techniques, including text-annotated audio-visual attention maps, enabled by multimodal analytics and content-oriented processing.

The MARVEL framework recognizes the need to go beyond traditional Big Data, cloud-only or edge-only architectures and adopts the edge-fog-cloud Computing Continuum paradigm to support dynamic and data-driven application workflows. All data-collecting devices in the smart city environment, such as cameras, audio sensors and other sensors (Edge), as well as the devices in the communication infrastructure between them and the cloud, including controllers, switches, routers and locally deployed servers (Fog), participate in the AI-based computing process jointly with the cloud resources. This increases resilience and reliability and reduces time-to-insight/decision and bandwidth usage.
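As a rough illustration of the continuum idea, the sketch below picks the tier closest to the point of data capture that can still host a given model. It is not MARVEL's actual placement logic: the `Tier` and `ModelSpec` types, the `place` function and all capacity and latency figures are hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Tier:
    name: str
    memory_mb: int        # memory available for one model (hypothetical figure)
    round_trip_ms: float  # typical latency from the sensor to this tier (hypothetical)


@dataclass(frozen=True)
class ModelSpec:
    name: str
    memory_mb: int         # memory the model needs for inference
    max_latency_ms: float  # latency budget of the application


# Tiers ordered by increasing distance from the point of data capture.
TIERS = (
    Tier("edge", memory_mb=512, round_trip_ms=5.0),
    Tier("fog", memory_mb=8_192, round_trip_ms=25.0),
    Tier("cloud", memory_mb=262_144, round_trip_ms=120.0),
)


def place(model: ModelSpec, tiers: Sequence[Tier] = TIERS) -> Optional[str]:
    """Return the closest tier that fits the model and its latency budget."""
    for tier in tiers:
        fits = model.memory_mb <= tier.memory_mb
        fast_enough = tier.round_trip_ms <= model.max_latency_ms
        if fits and fast_enough:
            return tier.name
    return None  # no tier satisfies both constraints


small_detector = ModelSpec("audio_event_detector", memory_mb=100, max_latency_ms=50.0)
large_classifier = ModelSpec("scene_classifier", memory_mb=4_000, max_latency_ms=50.0)
print(place(small_detector), place(large_classifier))
```

A real deployment would also weigh compute capacity, bandwidth and privacy constraints; the point here is only that placement is decided per model rather than fixed to one tier.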


With MARVEL’s Edge-to-Fog-to-Cloud (E2F2C) ubiquitous computing distributed architecture, widely distributed resources and services, located at different distances from the point of data capture, are programmatically aggregated on demand to support the necessary data-driven application workflows. Machine learning models have widely differing computational and memory requirements during the creation/configuration (‘training’) and operational (‘inference’) phases of their lifecycle. The MARVEL framework addresses this with a personalized federated learning scheme and an intelligent distribution and deployment of deep neural networks across the edge, fog and cloud. Strong privacy assurance mechanisms are in place to preserve privacy at critical data exchange points along the data path. In more detail, the MARVEL framework is based upon the following pillars:

  • Pillar I: Real heterogeneous distributed Big Data in smart city environments: Availability, generation, security, privacy, ethics, and Data Corpus as a Service. From the start of the project, MARVEL will make available real Big Data from smart city environments across various heterogeneous data sources, including audio-visual data, drone-generated data and other data sources. MARVEL implements privacy preservation techniques for all data modalities and at all levels of its architecture, and is GDPR-compliant. Finally, MARVEL delivers an extreme-scale corpus of processed multimodal audio-visual public data to foster the vision of the European Data Economy.

  • Pillar II: AI-based intelligence for multimodal perception and situational awareness. MARVEL aims to achieve multimodal perception and intelligence for audio-visual scene recognition, event detection and situational awareness, with minimal or no human intervention, via an AI-based method that attempts to mimic human perception to localize sound sources in visual scenes. This will be achieved by utilizing:
    • Automatic cross-modal systems that aim to carry out audio-visual processing and computational tasks in a way analogous to the human auditory system.
    • Time-sensitive and extreme-scale audio-visual analytic capabilities to extract timely, useful and actionable information from the real world.
    • A coordinated action based on a wider research theme that links fields such as audio and visual processing, machine learning, edge and fog computing and artificial intelligence.

  • Pillar III: Edge-to-fog-to-cloud (E2F2C) distributed ubiquitous computing architecture. MARVEL goes beyond the more traditional edge-only and cloud-only architecture designs via the edge-fog-cloud computing continuum paradigm for data collection, data and resource management, distribution, training, inference, visualization and decision making. In addition, MARVEL will improve upon the standard federated learning paradigm. While standard federated learning improves data privacy when learning a global model, it can also lose accuracy because the model is averaged across all involved entities. MARVEL adopts a novel approach that combines model averaging with personalized, entity-specific models, with the aim of increasing overall accuracy. Furthermore, the MARVEL architecture introduces a novel module for deployment optimization and management, responsible for the continuous maintenance and improvement of data capture and ML/DL deployment at all levels of MARVEL’s E2F2C architecture. Finally, it includes MARVEL’s decision-making toolkit to support decision making and offer advanced visualizations.

  • Pillar IV: Quantitative assessment of E2F2C and multimodal AI tools and methods via societal, academic and industry-validated benchmarks. MARVEL will systematically assess, both qualitatively and quantitatively, the approaches proposed in Pillars I-III through a thorough exploration and adoption of benchmarks. To this end, MARVEL will carry out a systematic study of existing and ongoing benchmarking solutions and initiatives, including BigDataBench, the H2020 projects DataBench and Hobbit, the BDVA reference frameworks and the DCASE Challenge.
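The combination of model averaging and entity-specific models described under Pillar III can be illustrated with a small sketch. The text does not specify MARVEL's personalization scheme, so the code below assumes one simple, commonly used option: the server computes a sample-weighted average (the aggregation step of standard federated averaging), and each entity then blends that global model with its own local weights. The function names, the client names, the blending factor `alpha` and the toy weight vectors are all hypothetical.

```python
from typing import Dict, List, Tuple

Weights = List[float]


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Sample-count-weighted mean of the client weight vectors."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]


def personalize(local: Weights, global_: Weights, alpha: float) -> Weights:
    """Blend local and global weights.

    alpha = 1.0 keeps the purely local model, alpha = 0.0 adopts the
    global average unchanged (assumed interpolation scheme).
    """
    return [alpha * l + (1.0 - alpha) * g for l, g in zip(local, global_)]


# Toy example: two entities with locally trained weights and sample counts.
clients: Dict[str, Tuple[Weights, int]] = {
    "camera_a": ([1.0, 0.0], 100),
    "microphone_b": ([3.0, 2.0], 300),
}

global_model = federated_average(list(clients.values()))
personal_models = {
    name: personalize(weights, global_model, alpha=0.5)
    for name, (weights, _) in clients.items()
}
print(global_model, personal_models)
```

Only weight vectors and sample counts leave each entity in this sketch, which is the privacy argument for federated learning; the blending step is what lets each entity retain behaviour tuned to its own data.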

The figure above presents an overview of the MARVEL framework. First, the framework integrates streaming data that are processed at the edge and fog layers with predictive audio-visual models in the cloud (data-at-rest). Heterogeneous sources create a rich dataset that is ingested into the solution, where advanced data management and distribution are applied. The dataset is subsequently ingested into the AI-based federated learning framework and several models are trained (e.g., feature extraction, multimodal representations, audio-visual scene classification, audio-visual event detection). The resulting learned patterns empower MARVEL’s streaming analytics modules, the GPU-accelerated engine, incremental/sequential federated learning and content-based processing.

The analytics on the ingested streaming data (data-in-motion) are integrated, where necessary, with historic information (data-at-rest) to identify business-logic patterns that have happened or are about to happen. The results of processing historical and streaming data are fed into MARVEL’s advanced visualization tools, which support long-term and short-term business decisions.
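One simple way to picture the integration of data-in-motion with data-at-rest is to compare each incoming value against a baseline derived from history. The sketch below is an assumption made for illustration, not MARVEL's analytics: the `flag_unusual` function, the threshold rule (mean plus `k` standard deviations) and the sample event counts are all hypothetical.

```python
from statistics import mean, pstdev
from typing import Iterable, Iterator, List, Tuple


def flag_unusual(
    history: List[float], stream: Iterable[float], k: float = 3.0
) -> Iterator[Tuple[float, bool]]:
    """Yield (value, is_unusual) for each streaming value.

    A value is flagged when it exceeds the historical mean by more than
    k population standard deviations (assumed rule for this example).
    """
    threshold = mean(history) + k * pstdev(history)
    for value in stream:
        yield value, value > threshold


# Historical event counts per time window (data-at-rest, made-up numbers).
historic_counts = [10.0, 12.0, 11.0, 9.0, 8.0]
# Counts arriving from the edge and fog layers (data-in-motion, made-up numbers).
incoming_counts = [11.0, 15.0, 30.0]

for count, unusual in flag_unusual(historic_counts, incoming_counts):
    print(count, "unusual" if unusual else "normal")
```

The flagged windows are the kind of result that could then be passed on to a visualization layer for a human decision maker.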

MARVEL’s architecture is compatible with the Big Data and edge-fog computing reference models proposed by BDVA and NIST and with the ISO JTC1 WG Reference Architecture Model. It implements privacy protection mechanisms for all data modalities and at all levels to ensure GDPR compliance. The description of the MARVEL framework’s main elements is organised according to the four MARVEL pillars described earlier.


Key Facts

  • Project Coordinator: Dr. Sotiris Ioannidis
  • Institution: Foundation for Research and Technology Hellas (FORTH)
  • E-mail: marvel-info{at} 
  • Start: 01.01.2021
  • Duration: 36 months
  • Participating Organisations: 17
  • Number of countries: 12

This project has received funding from the European Union’s Horizon 2020 Research and Innovation program under grant agreement No 957337. The website reflects only the view of the author(s) and the Commission is not responsible for any use that may be made of the information it contains.