A Multimodal Approach for Enhancing Decision Support in Remote Digital Tower

Authors:

Mobyen Uddin Ahmed, Shaibal Barua, Mir Riyanul Islam, Ricky Stanley D Cruze, Shahina Begum, Sara Kebir, Alexandre Veyrie, Christophe Hurter

Publication Type:

Conference/Workshop Paper

Venue:

2025 10th International Conference on Machine Learning Technologies


Abstract

Trustworthy decision support systems utilizing a multimodal approach (MMA) integrate diverse data modalities to enhance robustness, transparency, and fairness in artificial intelligence (AI) applications. In this study, we present an MMA for decision support in the Air Traffic Management (ATM) domain, particularly within Remote Digital Towers (RDTs). RDTs replace traditional control towers with AI-driven digital solutions, enhancing operational efficiency. Our approach addresses key multimodal challenges (translation, alignment, and co-learning) by implementing (a) an open-vocabulary-based object detection model for video processing and (b) an audio-to-text transcription and semantic word identification model. The YOLO-World deep-learning model is employed for object detection, while audio data analysis takes advantage of a benchmark data set, semantic identification techniques, and explainability. Additionally, the system integrates robust machine learning techniques, including data augmentation and perturbation, to maintain consistent performance across varied operational conditions. This proof-of-concept demonstrates the potential of multimodal AI systems to enhance decision support and improve safety in ATM environments.

Bibtex

@inproceedings{Ahmed7173,
author = {Mobyen Uddin Ahmed and Shaibal Barua and Mir Riyanul Islam and Ricky Stanley D Cruze and Shahina Begum and Sara Kebir and Alexandre Veyrie and Christophe Hurter},
title = {A Multimodal Approach for Enhancing Decision Support in Remote Digital Tower},
month = {September},
year = {2025},
booktitle = {2025 10th International Conference on Machine Learning Technologies},
url = {http://www.es.mdu.se/publications/7173-}
}