A Multimodal Approach for Enhancing Decision Support in Remote Digital Tower
Publication Type:
Conference/Workshop Paper
Venue:
2025 10th International Conference on Machine Learning Technologies
Abstract
Trustworthy decision support systems utilizing a multimodal approach (MMA) integrate diverse data modalities to enhance robustness, transparency, and fairness in artificial intelligence (AI) applications. In this study, we present an MMA for decision support in the Air Traffic Management (ATM) domain, particularly within Remote Digital Towers (RDTs). RDTs replace traditional control towers with AI-driven digital solutions, enhancing operational efficiency. Our approach addresses key multimodal challenges (translation, alignment, and co-learning) by implementing (a) an open-vocabulary-based object detection model for video processing and (b) an audio-to-text transcription and semantic word identification model. The YOLO-World deep-learning model is employed for object detection, while audio data analysis takes advantage of a benchmark data set, semantic identification techniques, and explainability. Additionally, the system integrates robust machine learning techniques, including data augmentation and perturbation, to maintain consistent performance across varied operational conditions. This proof-of-concept demonstrates the potential of multimodal AI systems to enhance decision support and improve safety in ATM environments.
Bibtex
@inproceedings{Ahmed7173,
author = {Mobyen Uddin Ahmed and Shaibal Barua and Mir Riyanul Islam and Ricky Stanley D'Cruze and Shahina Begum and Sara Kebir and Alexandre Veyrie and Christophe Hurter},
title = {A Multimodal Approach for Enhancing Decision Support in Remote Digital Tower},
month = {September},
year = {2025},
booktitle = {2025 10th International Conference on Machine Learning Technologies},
url = {http://www.es.mdu.se/publications/7173-}
}