Reporting code coverage of multiple code repositories

Ondrej Kvasnovsky
4 min read · Aug 9, 2022


Context

We measure the code coverage per repository because we want to know what code is tested, which gives us a notion of the code quality.

Reporting test coverage to higher levels of management and leadership gets tricky once the project grows from a few repositories to tens or hundreds of code repositories.

Calculating code coverage as GPA

Ideally, we would like to report a single number that shows the trend of the code coverage metric over time.

That single number needs to reflect the real code coverage of the whole product and it needs to be easy to follow.

The way we calculate it is similar to how GPA (grade point average) is calculated. Here are the steps.

Fetch the code coverage per repository

First, we fetch the test coverage of all the repositories, so we get pairs of repository name and test coverage.

Example: Here is what we would be looking for.

+------------------+----------+
| repo | coverage |
+------------------+----------+
| user-service | 81% |
| email-service | 92% |
| auth-service | 75% |
| deal-service | 65% |
| workflow-service | 97% |
+------------------+----------+
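How the coverage numbers are obtained depends on your CI or coverage provider. As a minimal sketch, assume we already have a lookup from repository name to coverage (the `coverageByRepo` object is a stand-in for whatever data source you use) and just want to build the list of pairs:

```javascript
// Build { repo, coverage } pairs from a lookup of coverage numbers.
// `coverageByRepo` is a hypothetical stand-in for your real data source
// (CI artifacts, Codecov, SonarQube, ...).
function collectCoverage(repos, coverageByRepo) {
  return repos.map((repo) => ({
    repo,
    // repositories without a reported coverage get null (coverage unknown)
    coverage: coverageByRepo[repo] ?? null,
  }));
}
```

Repositories missing from the lookup come back with a null coverage, which maps naturally to the "unknown" grade discussed below.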

Classify the coverage to a grade

Then we classify the numerical value (the coverage) into a letter rating (from A to F).

We are using a simplified GPA table for mapping between grade and numerical test coverage.

+--------------+--------------+-----------------+
| LETTER GRADE | GRADE POINTS | CODE COVERAGE |
+--------------+--------------+-----------------+
| A | 4.0 | 90–100 |
| B | 3.5 | 80–89 |
| C | 3.0 | 70–79 |
| D | 2.0 | 60–69 |
| E | 1.0 | 50–59 |
| F | 0.0 | 0–49 |
+--------------+--------------+-----------------+

If needed, feel free to define more granular grades, like A-, A+, and so on.
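The mapping from the table above is a simple threshold check. Here is a minimal sketch (the function name `coverageToGrade` is just an illustration):

```javascript
// Classify a numerical coverage (0-100) into a letter grade,
// using the simplified GPA table above.
function coverageToGrade(coverage) {
  if (coverage >= 90) return 'A';
  if (coverage >= 80) return 'B';
  if (coverage >= 70) return 'C';
  if (coverage >= 60) return 'D';
  if (coverage >= 50) return 'E';
  return 'F';
}
```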

Example: Here is how we would classify the example from above.

+------------------+----------+-------+
| repo | coverage | grade |
+------------------+----------+-------+
| user-service | 81% | B |
| email-service | 92% | A |
| auth-service | 75% | C |
| deal-service | 65% | D |
| workflow-service | 97% | A |
+------------------+----------+-------+

Calculate the test coverage GPA

Now we calculate the GPA. First, we create a list of grades, then map it to the numerical grade points. The last step is to calculate the average out of the numerical grade points.

Example: Here is how we calculate the GPA in pseudolanguage.

// what we got from the mapping of code coverage to grades
grades = [B, A, C, D, A]
// grades translated to grade points
gradePoints = [3.5, 4.0, 3.0, 2.0, 4.0]
// GPA is calculated as an average of all the grade points
gpa = AVERAGE(gradePoints)
// GPA for our repositories is 3.3
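The pseudocode above can be sketched in plain JavaScript; the table of grade points mirrors the mapping defined earlier (the names `GRADE_POINTS` and `gpaFromGrades` are just illustrations):

```javascript
// Grade points from the simplified GPA table above
const GRADE_POINTS = { A: 4.0, B: 3.5, C: 3.0, D: 2.0, E: 1.0, F: 0.0 };

// Translate letter grades to grade points and average them,
// rounded to one decimal place.
function gpaFromGrades(grades) {
  const points = grades.map((grade) => GRADE_POINTS[grade]);
  const sum = points.reduce((acc, p) => acc + p, 0);
  return Math.round((sum / points.length) * 10) / 10;
}
```

For the five repositories in the example, `gpaFromGrades(['B', 'A', 'C', 'D', 'A'])` yields 3.3.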

Observations

  • The GPA value does not change significantly over time, but if it drops by even a tenth of a point, we need to investigate what is going on. It usually means that a few code repositories with very low coverage have been recently added.
  • Some repositories might not report test coverage at all; we map those to an "unknown" grade. Then we can follow up and fix the code coverage reporting, or start writing tests if there are none.

Best practices

  • Automate the calculation of the “code coverage GPA” metric. It can be as simple as a script in a scheduled serverless function that calculates the GPA and writes it to a database every week. Then we can connect the database to a data studio (or some other visualization tool) and show the metric on a timeline.
  • Report the outliers as a special metric for engineering teams. The idea is to report the unknown and very low code coverage repositories to make them visible to the engineers, so they can make them better.

What if?

What if there is a team that owns repositories with no or very low test coverage?

If a team owns code repositories with very low test coverage, e.g. below 50%, then that should be identified as a significant technical debt and treated as such.

A solution depends on the size of the code repositories, priorities, and team culture. For example, the way to improve could be the following: break the test coverage report down into individual code repositories, identify the owner of each repository, and set and track reasonable, achievable goals.

Addition for the technical people

Calculating GPA in JavaScript

The input is an array of metrics that show how many times a grade occurs in all the repositories. The input example can look like this: { "A": 2, "B": 1, "C": 1, "D": 1, "E": 0, "F": 0, "unknown": 0}

Here is the JavaScript that calculates the GPA.

const _ = require('lodash');

function calculateGpa(metric) {
  // expand each grade count into a list of grade points (see the table above)
  const a = _.times(metric.A, _.constant(4));
  const b = _.times(metric.B, _.constant(3.5));
  const c = _.times(metric.C, _.constant(3));
  const d = _.times(metric.D, _.constant(2));
  const e = _.times(metric.E, _.constant(1));
  const f = _.times(metric.F, _.constant(0));
  // repositories with unknown coverage are excluded from the GPA
  const unknown = [];
  const all = _.concat(a, b, c, d, e, f, unknown);
  const gpa = _.mean(all);
  return parseFloat(gpa.toFixed(1));
}

module.exports = calculateGpa;
