Detailed Info

Efficient topic partitioning of Apache Kafka for high-reliability real-time data streaming applications

Authors: Theofanis P. Raptis, Claudio Cicconetti, Andrea Passarella

Abstract:

Apache Kafka is a widely-used event streaming platform for reliable high-volume real-time data exchange following a producer–consumer pattern. Despite its popularity, Apache Kafka requires expertise and attention to detail, and there are no default guidelines that can be applied to all use cases without careful consideration. In this paper, we propose a novel approach to optimise the number of partitions and brokers in Apache Kafka, which are two key configuration parameters, under the given characteristics and constraints of the target applications. In particular, we consider the distribution of data-intensive real-time flows exchanged between a set of producers and consumers, which is representative of fog computing environments for ML/AI analytics. We introduce a methodology for modelling the topic partitioning process in Apache Kafka and formulate an optimisation problem to determine the optimal number of partitions to satisfy the application requirements and constraints. We propose two efficient heuristics to solve the optimisation problem, considering the trade-off between resource utilisation and application performance. We evaluate the performance of our approach through numerical simulations, and we demonstrate its practicality by implementing a prototype on an Apache Kafka cluster and conducting experiments in three different scenarios focused on mass consumption vs. production and real-time data streaming. To carry out repeatable experiments in controlled conditions, we developed a reusable framework that fully automatises cluster setup and performance assessment, and we make it available to the community as open-source software.

Publication type:

Journal
Title of the journal:

Future Generation Computer Systems

Year of Publication:

2024

Pages: 173-188
Number, date or frequency of the Journal: Volume 154, May 2024
Publisher: Elsevier
URL: https://zenodo.org/records/10489464
DOI: 10.1016/j.future.2023.12.028

Key Facts

  • Project Coordinator: Dr. Sotiris Ioannidis
  • Institution: Foundation for Research and Technology Hellas (FORTH)
  • E-mail: marvel-info@marvel-project.eu 
  • Start: 01.01.2021
  • Duration: 36 months
  • Participating Organisations: 17
  • Number of countries: 12

Funding

This project has received funding from the European Union’s Horizon 2020 Research and Innovation program under grant agreement No 957337. The website reflects only the view of the author(s) and the Commission is not responsible for any use that may be made of the information it contains.