Content vs metrics: Using language modeling to evaluate in-line source code comments for Python

Boham, Maame Efua

Files

Boham_Maame_2020_CS_Thesis.pdf (493.21 KB)

Date

2020

Authors

Boham, Maame Efua

Abstract

Documentation is vital to the understanding, maintenance and, ultimately, survival of software projects. And yet, many software projects either lack documentation or are very poorly documented. This results in a gradual decline in the quality of the code and, in extreme cases, may require complete overhauls. It is therefore important to evaluate documentation to ensure that it conveys clear and meaningful ideas. While existing methods of evaluating documentation are metrics-based and look at the structure of documentation examples, this paper explores the possibility of evaluating documentation by assessing its content. Because no existing corpus of documentation is available for natural language processing tasks, a corpus of Python function/method comments is assembled, and a language modeling experiment is performed on it. The results of this experiment are mixed. While they show that it is possible to evaluate documentation by looking at its content as opposed to its structure, they also show that this approach may not necessarily be more accurate, with lower-quality comment examples being assigned higher probability than those of higher quality.

Description

Undergraduate thesis submitted to the Department of Computer Science, Ashesi University, in partial fulfillment of the Bachelor of Science degree in Computer Science, May 2020

Keywords

documentation, software projects, natural language processing (NLP)

URI

http://hdl.handle.net/20.500.11988/679

Collections

Senior Theses and Projects