The main purpose of testing is to verify that code works as expected. Not just now, but in the future as well. It gives us confidence that we didn’t break anything and that no new bugs were introduced. Writing tests is therefore a crucial part of software development, one that saves us from a lot of bug fixing after each change.
In a complex project with thousands or millions of lines of code, it is important that nearly all of the code is tested. But how do we know we have enough tests? Simply counting the number of tests and requiring a certain ratio to the lines of code will not work, as some lines of code require more tests than others. Moreover, such a metric is easily gamed if the team feels pressure to reach it. A better approach is to determine all the scenarios that need to be tested. This gives us confidence that those scenarios work; there may, however, be other cases that were forgotten, or that behave differently due to implementation details. Determining the scenarios in a white-box fashion would prevent this, since the implementation details are taken into account. But those cases change often as the implementation changes, and verifying that all of them are covered (and re-determining the interesting ones) is a manual task that takes up a lot of time. A much better way is to automatically gather information about which lines of code are executed while running the tests: this is called code coverage.
Some teams settle on a minimum level of code coverage, and while the goal should always be to hit 100%, it is often not feasible to get there. When doing continuous integration, this minimum can be used as a check before code can be merged into the master branch. But code coverage comes in many different forms, each counting different things, and choosing the right one is critical. The different types mentioned here are those identified by EclEmma (JaCoCo):
- Class coverage: Counts the percentage of classes being touched.
- Method coverage: Counts the percentage of methods being touched.
- Line coverage: Counts the percentage of lines being touched.
- Branch coverage: Counts the percentage of branches being touched.
- Instruction coverage: Counts the percentage of instructions being touched.
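These same counter names can be used to enforce a minimum coverage level as a CI check. As a rough sketch, a JaCoCo check in a Maven build could look like the fragment below. The 80% branch minimum is an arbitrary example, and the snippet assumes the plugin’s usual `prepare-agent` execution is configured elsewhere in the build:

```xml
<plugin>
  <groupId>org.jacoco</groupId>
  <artifactId>jacoco-maven-plugin</artifactId>
  <executions>
    <execution>
      <id>check-coverage</id>
      <goals>
        <goal>check</goal>
      </goals>
      <configuration>
        <rules>
          <rule>
            <element>BUNDLE</element>
            <limits>
              <limit>
                <!-- BRANCH, LINE, METHOD, CLASS and INSTRUCTION
                     match the coverage types listed above -->
                <counter>BRANCH</counter>
                <value>COVEREDRATIO</value>
                <minimum>0.80</minimum>
              </limit>
            </limits>
          </rule>
        </rules>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With this rule in place, `mvn verify` fails the build when branch coverage drops below the configured minimum.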
It is easy to see that these different types of coverage zoom in, from the very coarse-grained class coverage to the fine-grained instruction coverage. Class, method and line coverage are all pretty straightforward, in the sense that a class, method or line is either covered or it is not. Achieving 100% with these coverage types is easy. But even 100% line coverage does not mean you have tested every piece of the code.
A good example of such missed code is an if statement. If the condition is true, the body is executed and all lines are covered, including those after the condition. And even though the condition check is covered at the line level, it has never been tested with a condition that is false. Both branch and instruction coverage solve this, meaning the only true code coverage is branch coverage. Instruction coverage is good too, but it is harder to find out which instructions were missed. Branches are easier to spot, and there tend to be some instructions that are nearly impossible to hit.
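The if-statement case can be sketched in a few lines of Java (the class and method names are made up for illustration). A single test input reaches every line of `discount`, so line coverage reports 100%, yet the false branch of the condition is never exercised:

```java
public class DiscountExample {

    // Grants a discount on orders above a threshold.
    static int discount(int total) {
        int d = 0;
        if (total > 50) {
            d = 10;    // executed by the test below
        }
        return d;      // executed by the test below
    }

    public static void main(String[] args) {
        // One test input reaches every line: line coverage is 100%.
        // But the condition has only ever been true, so one of the
        // two branches is untested: branch coverage is only 50%.
        System.out.println(discount(60)); // prints 10
    }
}
```

A branch coverage report would flag the `if` as only half covered, pointing directly at the missing test with a total of 50 or less.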
Gathering code coverage and verifying that code is actually being tested is great. But can we really assert that the code works as intended? Even 100% branch coverage does not guarantee that the software works, as you might have written bad tests: either by asserting on the wrong value, or by not asserting at all. This may sound absurd, but it happens more often than you can imagine, either by forgetting something, or because a return value seems correct at first sight but actually should have been something else. I have even seen tests that do a pixel-by-pixel comparison of the result with a reference image, only to find out that both the result and the reference image were cut off because the browser was too small to show them completely. This means that everything in the part that is cut off is not being tested.
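To illustrate such weak tests, here is a hypothetical sketch with invented names: both “tests” execute the buggy `add` method fully and pass, yet neither catches the bug.

```java
public class WeakTestExample {

    // Buggy on purpose: this should be a + b.
    static int add(int a, int b) {
        return a - b;
    }

    public static void main(String[] args) {
        // "Test" 1: calls the method but never checks the result.
        // It always passes and still counts towards coverage.
        add(2, 2);

        // "Test" 2: checks a value that merely looks plausible.
        // add(4, 2) happens to return 2 from the buggy subtraction,
        // and 2 is easy to accept at first sight.
        if (add(4, 2) != 2) {
            throw new AssertionError("test failed");
        }
        System.out.println("both tests pass despite the bug");
    }
}
```

Both coverage reports and the test runner are green here, while the method computes the wrong thing for almost every input.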
But if 100% coverage does not give us any guarantees, then why bother at all? Well, even though it can’t guarantee anything, it does give us more confidence, as in most cases the tests will be well written. Code that has not been covered, on the other hand, does guarantee that we do not know how it behaves and that its bugs will not be discovered. Just eliminating that certainty of uncertainty is worth the investment in decent tests and calculating their coverage.
There exists a way to actually test your tests: mutation testing. Instead of just running your tests and gathering code coverage, mutation testing changes your code to alter its behaviour and verifies that a test fails. If no test fails, then the mutant survives, and a test is missing. This is a very strong way to make sure your code is being well tested. Mutation testing is however much more expensive, as it has to run your tests for each mutation; this can easily lead to a much longer test run than plain code coverage. I have little experience with mutation testing, but I am eager to try it out and share my findings in a later blog post.
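The idea can be sketched by hand (a real tool such as PIT for Java generates and runs mutants automatically; the names below are invented). Here the mutant flips a comparison operator, and only a boundary test tells the original and the mutant apart:

```java
public class MutationSketch {

    // Original implementation under test.
    static boolean freeShipping(int total) {
        return total > 50;
    }

    // A hand-made mutant: the comparison operator is flipped,
    // just as a mutation testing tool would do automatically.
    static boolean freeShippingMutant(int total) {
        return total >= 50;
    }

    public static void main(String[] args) {
        // The existing tests pass against the mutant as well,
        // so the mutant would survive: a test is missing.
        System.out.println(freeShippingMutant(60)); // true, as expected
        System.out.println(freeShippingMutant(10)); // false, as expected

        // A boundary test kills the mutant: the original returns
        // false for exactly 50, while the mutant returns true.
        System.out.println(freeShipping(50));       // false
        System.out.println(freeShippingMutant(50)); // true
    }
}
```

If the test suite lacks the boundary case, the mutant behaves identically under every test and survives, which is exactly the signal that a test is missing.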