Automated Functional Dependency Detection Between Test Cases Using Doc2Vec and Clustering



Publication Type: Conference/Workshop Paper


The First IEEE International Conference On Artificial Intelligence Testing


Knowing about dependencies and similarities between test cases is beneficial for prioritizing them for cost-effective test execution. This holds especially true for the time-consuming manual execution of integration test cases written in natural language. Test case dependencies are typically derived from requirements and design artifacts. However, such artifacts are not always available, and the derivation process can be very time-consuming. In this paper, we propose, apply and evaluate a novel approach that derives test cases' similarities and functional dependencies directly from the test specification documents written in natural language, without requiring any other data source. Our approach uses an implementation of the Doc2Vec algorithm to detect text-semantic similarities between test cases and then groups them using two clustering algorithms, HDBSCAN and FCM. The correlation between test case text-semantic similarities and their functional dependencies is evaluated in the context of an on-board train control system from Bombardier Transportation AB in Sweden. For this system, the dependencies between the test cases were previously derived and are compared to the results of our approach. The results show that, of the two evaluated clustering algorithms, HDBSCAN performs better than both FCM and a dummy classifier. The classification methods' results are of reasonable quality and especially useful from an industrial point of view. Finally, applying random undersampling to correct the imbalanced data distribution yields an F1 score of up to 75% with the HDBSCAN clustering algorithm.
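The evaluation idea in the abstract can be illustrated with a small, hedged sketch. It is not the authors' implementation. It assumes that two test cases count as "predicted dependent" when a clustering step places them in the same cluster, that ground truth is a set of known dependent pairs, and that random undersampling drops majority-class pairs until the classes are balanced. The cluster labels are hand-written toy values standing in for clustering output over Doc2Vec vectors. The test-case ids and the helper names (`pairs_from_labels`, `f1_score`, `undersample`) are hypothetical.

```python
import random
from itertools import combinations


def pairs_from_labels(labels):
    """Map each unordered pair of test-case ids to True if both share a cluster.

    `labels` maps a test-case id to a cluster label. The label -1 marks
    noise (an assumption borrowed from common HDBSCAN implementations);
    noise points are never treated as sharing a cluster.
    """
    predicted = {}
    for a, b in combinations(sorted(labels), 2):
        predicted[(a, b)] = labels[a] == labels[b] and labels[a] != -1
    return predicted


def f1_score(truth, predicted):
    """F1 score of `predicted` over the pairs present in `truth`."""
    tp = sum(1 for p in truth if truth[p] and predicted[p])
    fp = sum(1 for p in truth if not truth[p] and predicted[p])
    fn = sum(1 for p in truth if truth[p] and not predicted[p])
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def undersample(truth, seed=0):
    """Randomly drop majority-class pairs until both classes are equal in size."""
    rng = random.Random(seed)
    pos = [p for p in truth if truth[p]]
    neg = [p for p in truth if not truth[p]]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = minority + rng.sample(majority, len(minority))
    return {p: truth[p] for p in kept}


# Toy cluster labels (hypothetical); tc5 is treated as noise.
labels = {"tc1": 0, "tc2": 0, "tc3": 1, "tc4": 1, "tc5": -1}

# Toy ground truth (hypothetical): pairs known to be functionally dependent.
dependent = {("tc1", "tc2"), ("tc3", "tc4"), ("tc4", "tc5")}

predicted = pairs_from_labels(labels)
truth = {pair: pair in dependent for pair in predicted}
balanced = undersample(truth)

print(f"F1 on all pairs:      {f1_score(truth, predicted):.2f}")
print(f"F1 after undersample: {f1_score(balanced, predicted):.2f}")
```

In this toy setup most pairs are independent, which mirrors the class imbalance the abstract mentions. Undersampling changes only which negative pairs are scored, not the predictions themselves.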


author = {Sahar Tahvili and Leo Hatvani and Michael Felderer and Wasif Afzal and Markus Bohlin},
title = {Automated Functional Dependency Detection Between Test Cases Using Doc2Vec and Clustering},
month = {April},
year = {2019},
booktitle = {The First IEEE International Conference On Artificial Intelligence Testing},
url = {}