INTERNATIONAL JOURNAL OF INFORMATION AND COMMUNICATION TECHNOLOGIES

FAULT TOLERANCE AND RELIABILITY IN KUBERNETES-ORCHESTRATED MULTI-AGENT SYSTEMS: UNIVERSITY SCHEDULING CASE STUDY

Authors

  • B. Kumalakov, Astana IT University
  • A. Kaziz, Astana IT University

DOI:

https://doi.org/10.54309/IJICT.2025.21.1.013

Keywords:

machine learning, MAS, MAS Optimization, fault detection, MAS maintenance, cloud-native deployment

Abstract

Multi-Agent Systems (MAS) play an important role in distributed computing and in environments that require autonomous coordination, such as robotics, cloud computing, and traffic management. However, ensuring fault tolerance and reliability in MAS remains a significant challenge, particularly in large-scale deployments. This study investigates the impact of Kubernetes-based orchestration on the fault tolerance of MAS by evaluating three mechanisms: automated scaling, redundancy strategies, and self-healing capabilities. Experimental results show that Kubernetes improves MAS resilience by reducing failure frequency and shortening Mean Time to Recovery (MTTR). The study also identifies a trade-off between performance and resource consumption: redundancy and auto-scaling make the system more robust, but they introduce computational overhead. Affinity-based scheduling and selective redundancy were found to balance efficiency and reliability effectively. The findings have practical implications for real-world MAS deployments, particularly for tuning Kubernetes configurations to achieve fault tolerance without excessive resource utilization. Future research should focus on AI-driven scaling, hybrid cloud-edge execution, and enhanced fault detection mechanisms to further improve MAS reliability and efficiency in dynamic environments.
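
The abstract's two reliability metrics, failure frequency and Mean Time to Recovery, can be illustrated with a minimal Python sketch. It is not taken from the study: the `Incident` record, the function names, and the sample timestamps are hypothetical, and it assumes each failure is logged with a detection time and a recovery time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Incident:
    """Hypothetical record of one agent failure (not the study's schema)."""
    failed_at: datetime      # when the failure was detected
    recovered_at: datetime   # when the agent was serving again


def mean_time_to_recovery(incidents: list[Incident]) -> timedelta:
    """Average recovery duration over all logged incidents."""
    if not incidents:
        raise ValueError("MTTR is undefined without at least one incident")
    total = sum((i.recovered_at - i.failed_at for i in incidents), timedelta())
    return total / len(incidents)


def failure_frequency(incidents: list[Incident], observed: timedelta) -> float:
    """Number of failures per hour of observation."""
    if observed <= timedelta(0):
        raise ValueError("observation window must be positive")
    return len(incidents) / (observed.total_seconds() / 3600)


# Made-up sample data: two failures, recovered after 30 s and 90 s.
start = datetime(2025, 1, 1, 12, 0, 0)
sample = [
    Incident(start, start + timedelta(seconds=30)),
    Incident(start + timedelta(hours=1),
             start + timedelta(hours=1, seconds=90)),
]

print(mean_time_to_recovery(sample))                 # average recovery time
print(failure_frequency(sample, timedelta(hours=4)))  # failures per hour
```

Comparing these two numbers for the same workload with and without a given mechanism (for example, extra replicas) is one simple way to express the robustness side of the trade-off the abstract describes; the resource cost would have to be measured separately.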

Published

2025-03-15

How to Cite

Kumalakov, B., & Kaziz, A. (2025). FAULT TOLERANCE AND RELIABILITY IN KUBERNETES-ORCHESTRATED MULTI-AGENT SYSTEMS: UNIVERSITY SCHEDULING CASE STUDY. INTERNATIONAL JOURNAL OF INFORMATION AND COMMUNICATION TECHNOLOGIES, 6(1), 185–200. https://doi.org/10.54309/IJICT.2025.21.1.013