As mentioned in the previous post, I have been looking into mutation testing as a way to verify that everything is being tested. In this blog post I will share my experience and the conclusions of this experiment. Mutation testing is a way to test your tests and gain confidence that bugs will be caught. It works by applying mutations to your code (to change its behaviour) and running all (relevant) tests again to check that at least one fails. If all tests pass, the mutation was not caught and a test is missing. For my experiment I used PIT, or to be more precise, its Eclipse plugin.
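To make the idea concrete, here is a minimal hand-written sketch of a mutation and of the tests that let it survive or kill it. The class and method names are hypothetical and not taken from my project, and the mutant is written out by hand; a tool like PIT generates such changes automatically (changing a boundary such as ">=" into ">" is one kind of mutation it can apply).

```java
import java.util.function.IntPredicate;

public class MutationSketch {
    // Original code under test (hypothetical example).
    public static boolean isAdult(int age) {
        return age >= 18;
    }

    // A hand-written mutant: the boundary ">=" has been changed to ">".
    public static boolean isAdultMutant(int age) {
        return age > 18;
    }

    // A weak test: it never checks the boundary value 18,
    // so it passes for the original and for the mutant alike.
    public static boolean weakTest(IntPredicate implementation) {
        return implementation.test(30) && !implementation.test(5);
    }

    // A stronger test: it also checks the boundary value 18,
    // so it passes for the original but fails for the mutant.
    public static boolean boundaryTest(IntPredicate implementation) {
        return implementation.test(18)
                && implementation.test(30)
                && !implementation.test(5);
    }

    public static void main(String[] args) {
        System.out.println("weak test passes on original: "
                + weakTest(MutationSketch::isAdult));
        System.out.println("weak test passes on mutant (mutant survives): "
                + weakTest(MutationSketch::isAdultMutant));
        System.out.println("boundary test passes on original: "
                + boundaryTest(MutationSketch::isAdult));
        System.out.println("boundary test passes on mutant (false means killed): "
                + boundaryTest(MutationSketch::isAdultMutant));
    }
}
```

With only the weak test in place the mutant survives, which is exactly the signal that a test is missing. Adding the boundary check kills it.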
I encountered a couple of problems that prevented me from using the tool out of the box. It cannot handle tests for Eclipse plugins, which is a shame because nearly all of the code we write is built on top of Eclipse and Equinox. I did find some code that did not depend on Equinox and whose tests I could run as normal JUnit tests, which allowed me to run PIT against them.
The first run of PIT immediately failed to give me any results. PIT first runs your tests without any mutations to gather a coverage report, which it then uses to identify the relevant tests for each mutation. Why could PIT not give me any results? Some tests failed without mutations. Weird, because running the same tests with JUnit worked just fine. Those tests were designed to verify my preconditions and are annotated with @Test(expected = AssertionError.class). Disabling these tests brought me another step closer to actual mutation testing.
It was just another small victory, as PIT could not find the code the tests were executing. This is caused by our clear separation of product and test code into different plugins. In this case the test code was a fragment of the actual code, which is a common construct when working with Eclipse plugins. I tried adding the plugin to the class path, but nothing seemed to work. To work around this and be able to use the plugin, I quickly moved the test code into the same plugin.
A few setbacks, but in the end I was able to mutation test my code. As the first results rolled in I saw something interesting: a class with only 96% line coverage had 100% of its mutations killed. This is a nice example of why 100% line coverage may not be desirable: the uncovered line was just a small optimization that skips a calculation when we know the result will not be relevant. It is hard to cover because it is a completely internal mechanism. While my code coverage warns me about this line, mutation testing does not worry about it, as the line could not be mutated. The condition that triggers the optimization could be mutated, but that mutation was successfully killed.
As mentioned in my previous post, line coverage is not a good code coverage measurement at all. Since I prefer branch coverage, I am more interested in that value, which to my surprise was only 68.2%. The coverage was that low because I had to disable the tests that verify my preconditions. Enabling these tests again resulted in an acceptable branch coverage.
A closer look at the results of PIT showed that it did not apply any mutations to my preconditions. I checked that I had assertions enabled on all runs, but for some reason it still did not take them into account. This is most likely also the reason why my tests failed without any mutations: the code didn’t throw any AssertionError, causing the tests that expect one to fail. I consider the preconditions to be as much a part of the contract as any other result, so they should be mutation tested as well.
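Why a precondition test can pass in one runner and fail in another is easy to show in isolation: an assert statement only throws when the JVM that runs the code has assertions enabled (the -ea flag). The sketch below uses a hypothetical method, not my real code, and includes a small helper that reports whether assertions are actually active inside the running JVM. It does not show what PIT does internally, which I have not verified.

```java
public class PreconditionSketch {
    // Hypothetical method guarded by an assert-based precondition.
    public static int length(String expression) {
        assert (expression != null) && !expression.isEmpty();
        // Defensive fallback so the sketch also runs with assertions disabled.
        return expression == null ? 0 : expression.length();
    }

    // Reports whether this class runs with assertions enabled (-ea).
    public static boolean assertionsEnabled() {
        boolean enabled = false;
        // The assignment inside the assert is only executed with -ea.
        assert enabled = true;
        return enabled;
    }

    // Returns true if calling length() with the argument throws AssertionError,
    // which is what a test with expected = AssertionError.class relies on.
    public static boolean preconditionFires(String argument) {
        try {
            length(argument);
            return false;
        } catch (AssertionError expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("assertions enabled: " + assertionsEnabled());
        System.out.println("precondition fires for null: " + preconditionFires(null));
    }
}
```

Run with and without -ea, the second line of output follows the first: without assertions the precondition never fires, and a test that expects an AssertionError fails even though the code is unchanged.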
Code coverage is not perfect with my preconditions either. A nice example of this is:
assert ((expression != null) && !expression.isEmpty());
My code coverage says I covered 5 out of 6 branches, but logic tells me there are only 4 possible branches:
- false && false
- false && true
- true && false
- true && true
Taking a closer look, the second part has 4 branches on its own, which leads me to believe that a null check is done and taken into account here. But even then, which branch did I miss? Well, obviously the one where both are false, but this branch is impossible to cover due to Java's lazy (short-circuit) evaluation of &&. As a matter of fact, I depend on lazy evaluation here. This is another reason why 100% code coverage is not always useful: while I could split the statement, I believe the combined form makes more sense and is clearer.
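The effect of lazy evaluation on the reachable combinations can be traced with a small sketch. The helper names are hypothetical. Each operand records when it is evaluated, so the returned string shows which path was taken through the same condition as in my assert.

```java
public class ShortCircuitSketch {
    // Left operand: records "L" in the trace when evaluated.
    private static boolean notNull(String expression, StringBuilder trace) {
        trace.append('L');
        return expression != null;
    }

    // Right operand: records "R" in the trace when evaluated.
    private static boolean notEmpty(String expression, StringBuilder trace) {
        trace.append('R');
        return !expression.isEmpty();
    }

    // Evaluates (expression != null) && !expression.isEmpty() and
    // returns the evaluated operands followed by the result.
    public static String path(String expression) {
        StringBuilder trace = new StringBuilder();
        boolean result = notNull(expression, trace) && notEmpty(expression, trace);
        return trace + "=" + result;
    }

    public static void main(String[] args) {
        System.out.println(path(null)); // right operand is skipped
        System.out.println(path(""));   // both operands evaluated, result false
        System.out.println(path("x"));  // both operands evaluated, result true
    }
}
```

Only three paths exist: when the left side is false the right side is never evaluated, which is what protects the isEmpty() call from a null reference. As a side note, the Java compiler also turns an assert statement into a check of a synthetic assertions-disabled flag, which may be where some of the extra branches in the coverage report come from. I have not confirmed that for my coverage tool.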
Conclusion: mutation testing is a good tool to have in your box, and I can highly recommend it. As it takes a lot more time than just running your tests, my advice is to run it only as a check when you are adding new code. Only if all mutations are killed should the merge take place. I am still not convinced that mutation testing can fully replace code coverage, so for the time being I would suggest using both. However, neither mutation testing nor code coverage can verify that a method returns what the business expects it to return. Tests that check for the wrong values will be accepted by both and be considered good. Nothing will ever change this, so it remains crucial to pay attention to what you are checking in your tests.
PIT suffers from some shortcomings, but I do not know whether this is just the Eclipse plugin or PIT itself. Even if it is PIT itself, I can hardly blame its developers, as our way of working has caused many more problems and requires very specific tools. I do believe any normal project can take full advantage of PIT. It is our way of working that limits us.