When Retriever Meets Generator: A Joint Model for Code Comment Generation


Authors:

Tien P. T. Le , Anh M. T. Bui , Huy N. D. Pham , Alessio Bucaioni, Thanh Phuong Nguyen

Publication Type:

Conference/Workshop Paper

Venue:

The 19th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement


Abstract

Automatically generating concise, informative comments for source code can lighten documentation effort and accelerate program comprehension. Retrieval-augmented approaches first fetch code snippets with existing comments and then synthesize a new comment, yet retrieval and generation are typically optimized in isolation, allowing irrelevant neighbors to propagate noise downstream. To tackle this issue, we propose a novel approach named RAGSum, which aims at both effectiveness and efficiency in recommendations. RAGSum fuses retrieval and generation using a single CodeT5 backbone. We report preliminary results on this unified retrieval-generation framework. A contrastive pre-training phase shapes code embeddings for nearest-neighbor search; these weights then seed end-to-end training with a composite loss that (i) rewards accurate top-k retrieval and (ii) minimizes comment-generation error. More importantly, a lightweight self-refinement loop is deployed to polish the final output. We evaluated the framework on three cross-language benchmarks (Java, Python, C) and compared it with three well-established baselines. The results show that our approach substantially outperforms the baselines with respect to BLEU, METEOR, and ROUGE-L scores. These early findings indicate that tightly coupling retrieval and generation can raise the ceiling for comment automation and motivate forthcoming industrial replications and qualitative developer studies.
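The abstract does not spell out the form of the composite loss, so the following is only a minimal sketch of the general idea, under stated assumptions: cosine similarity for nearest-neighbor search, an InfoNCE-style term standing in for the retrieval objective, mean negative log-likelihood standing in for the comment-generation error, and a hypothetical mixing parameter `weight`. All function names are illustrative and are not taken from RAGSum.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query, corpus, k):
    """Indices of the k corpus embeddings most similar to the query."""
    ranked = sorted(range(len(corpus)),
                    key=lambda i: cosine(query, corpus[i]),
                    reverse=True)
    return ranked[:k]


def retrieval_loss(query, corpus, positive_idx, temperature=0.1):
    """InfoNCE-style term (an assumption): negative log-softmax score
    of the known relevant neighbor among all candidates."""
    logits = [cosine(query, c) / temperature for c in corpus]
    peak = max(logits)  # subtract the maximum for numerical stability
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_z - logits[positive_idx]


def generation_loss(token_probs):
    """Mean negative log-likelihood of the reference comment tokens."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def composite_loss(query, corpus, positive_idx, token_probs, weight=0.5):
    """Weighted sum of both terms; `weight` is a hypothetical hyperparameter."""
    return (weight * retrieval_loss(query, corpus, positive_idx)
            + (1.0 - weight) * generation_loss(token_probs))


if __name__ == "__main__":
    corpus = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]  # toy code embeddings
    query = [1.0, 0.0]
    print(top_k(query, corpus, k=2))
    print(composite_loss(query, corpus, positive_idx=0,
                         token_probs=[0.9, 0.8, 0.7]))
```

Because both terms are combined into a single scalar, a gradient step on it would update the shared encoder for retrieval and generation together, which is the coupling the abstract contrasts with optimizing the two stages in isolation.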

Bibtex

@inproceedings{Le7226,
author = {Tien P. T. Le and Anh M. T. Bui and Huy N. D. Pham and Alessio Bucaioni and Thanh Phuong Nguyen},
title = {When Retriever Meets Generator: A Joint Model for Code Comment Generation},
month = {September},
year = {2025},
booktitle = {The 19th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement},
url = {http://www.es.mdu.se/publications/7226-}
}