Content vs metrics: Using language modeling to evaluate in-line source code comments for Python
Date
2020
Abstract
Documentation is vital to the understanding, maintenance and, ultimately, survival of
software projects. Yet many software projects either lack documentation or are
very poorly documented. This results in a gradual decline in the quality of the code
and may require complete overhauls in extreme cases. It is therefore important to evaluate
documentation to ensure that it conveys clear and meaningful ideas. Whereas existing
methods of evaluating documentation are metrics-based and examine the structure of documentation
examples, this paper explores the possibility of evaluating documentation by
assessing its content. There is, however, no existing corpus of documentation
suited to natural language processing tasks. A corpus of Python function/method comments
is assembled, and a language modeling experiment is performed on it. The results of
this experiment are mixed: while they show that it is possible to evaluate documentation
by looking at its content as opposed to its structure, they also show that this approach is
not necessarily more accurate, with lower-quality comment examples receiving higher
probability than higher-quality ones.
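The core idea of the experiment, scoring a comment by the probability a language model assigns to it, can be illustrated with a minimal sketch. The sketch below is not the thesis's own model (which is trained on the assembled corpus of Python comments); it assumes an off-the-shelf pretrained GPT-2 model from the Hugging Face transformers library as a stand-in, and the example comments are hypothetical.

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    def comment_log_likelihood(comment: str) -> float:
        """Average per-token log-likelihood the model assigns to a comment string."""
        inputs = tokenizer(comment, return_tensors="pt")
        with torch.no_grad():
            # Passing the input ids as labels makes the model return the mean
            # cross-entropy loss, i.e. the negative average log-likelihood.
            outputs = model(**inputs, labels=inputs["input_ids"])
        return -outputs.loss.item()

    # Hypothetical comments of differing quality; a higher score means the
    # model considers the text more probable.
    print(comment_log_likelihood("# increment i"))
    print(comment_log_likelihood("# Return the index of the first match, or -1 if no element matches."))

Under this content-based view, the score itself is the quality signal; the mixed results reported in the abstract reflect that a more probable comment is not always a more informative one.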
Description
Undergraduate thesis submitted to the Department of Computer Science, Ashesi University, in partial fulfillment of the Bachelor of Science degree in Computer Science, May 2020
Type
Undergraduate thesis
Keywords
documentation, software projects, natural language processing (NLP)