Long Term Effects of Mutation Testing

Gordon Fraser
René Just
2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), IEEE, pp. 910-921

Abstract

Various proxy metrics for test quality have been
defined in order to guide developers when writing tests. Code
coverage is particularly well established in practice, even though
the question of how coverage relates to test quality is a matter of
ongoing debate. Mutation testing offers a promising alternative:
Artificial defects can identify holes in a test suite, and thus provide
concrete suggestions for additional tests. Despite the obvious
advantages of mutation testing, it is not yet well established in
practice. Until recently, mutation testing tools and techniques
simply did not scale to complex systems. Although they now
do scale, a remaining obstacle is the lack of evidence that writing
tests for mutants actually improves test quality. In this paper, we
fill this gap. We analyze a large dataset of 15 million mutants
and investigate how the mutants influenced developers over time,
and how the mutants relate to real faults. Our analyses suggest
that developers using mutation testing write more tests, and
actively improve their test suites with high-quality tests such
that fewer mutants remain. By analyzing a dataset of historic
fixes of real faults, we further provide evidence that mutants are
indeed coupled with real faults. In other words, had mutation
testing been used for the changes introducing the faults, it would
have reported a live mutant that could have prevented the bug.
