100% code coverage by static analysis - is it that good?

Aug 29 2012

Author: Dmitry Novikov

Many programmers think that the more error messages a static code analyzer produces, the better. That would be true if every message hit the bull's eye. But this is impossible: different programmers may consider the same warning either true or false depending on the project type. There is another important and interesting point: the line between a false positive and a real error can be very thin. Let's have a look at one such case.

We have received several similar comments from users about false positives generated by the analyzer. While studying the code fragments they sent us, we had to ask how necessary and reasonable it is to aim for 100% code coverage. This question had never arisen before, as we had been sure that the more code is analyzed, the better.

We have decided to refine this approach and change the code analyzer's behavior. Consider one of the code samples sent to us by users:

long adjustment = 0;
long total_weighting = 0;
... //the 'total_weighting' variable is not modified here
if( total_weighting > 0 )
{
  adjustment /= total_weighting;
}

An ambiguity occurs: the 'total_weighting' variable is ALWAYS equal to 0 and is not changed anywhere. If we check the "adjustment /= total_weighting;" line, we should generate the error message "V609 Divide by zero. Denominator 'total_weighting' == 0".

But it is obvious that this code branch will never be executed if the 'total_weighting' variable equals zero. The attempt to divide by zero won't occur.

It turns out that we shouldn't check all the code. Since such cases proved to be rather numerous, we have decided not to analyze code fragments that never receive control.

To implement this we employ the following mechanism. The analyzer calculates and gathers those expressions whose values are known without executing the program. For example:

int a = 1;
int b = a + 1;

We know for sure that 'b' will take value '2' here.
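The idea of gathering known values can be sketched as follows. This is a minimal illustration, not the analyzer's actual implementation: the 'KnownValues' table and the 'lookup', 'addKnown' and 'valueOfB' helpers are hypothetical names introduced only for this example.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical table of variables whose values are known
// without executing the program.
using KnownValues = std::map<std::string, long>;

// Returns the value of a variable if it is known, std::nullopt otherwise.
std::optional<long> lookup(const KnownValues& known, const std::string& name) {
    auto it = known.find(name);
    if (it == known.end()) return std::nullopt;
    return it->second;
}

// Evaluates "lhs + rhs"; the result is known only if both operands are known.
std::optional<long> addKnown(std::optional<long> lhs, std::optional<long> rhs) {
    if (!lhs || !rhs) return std::nullopt;
    return *lhs + *rhs;
}

// Models "int a = 1; int b = a + 1;" and returns what is known about 'b'.
std::optional<long> valueOfB() {
    KnownValues known;
    known["a"] = 1;
    std::optional<long> b = addKnown(lookup(known, "a"), 1L);
    if (b) known["b"] = *b;
    return lookup(known, "b");
}
```

In this sketch an operand with an unknown value makes the whole expression unknown, so nothing is ever recorded as "known" on the basis of a guess.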

If the condition of a conditional statement is a logical expression that is certain to be 'true' or 'false', we decide accordingly which code branches to analyze:

If the value is always true, only the 'then' branch will be analyzed.
If the value is always false, only the 'else' branch will be analyzed.

But when the value of the condition cannot be calculated, both the 'then' and 'else' branches will be analyzed.
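These three rules can be written down as a small decision function. The sketch below assumes the condition's value is represented as a std::optional<bool>, where an empty value means "cannot be calculated"; 'BranchPlan', 'planBranches' and 'notEqual' are hypothetical names and do not describe the analyzer's real internals.

```cpp
#include <optional>

// Which branches of an if/else should be analyzed.
struct BranchPlan {
    bool analyzeThen;
    bool analyzeElse;
};

// Applies the rules: always true -> 'then' only, always false -> 'else' only,
// unknown (std::nullopt) -> both branches.
BranchPlan planBranches(std::optional<bool> condition) {
    if (!condition) return {true, true};
    return {*condition, !*condition};
}

// Evaluates "lhs != rhs"; the result is unknown when 'lhs' is unknown.
std::optional<bool> notEqual(std::optional<long> lhs, long rhs) {
    if (!lhs) return std::nullopt;
    return *lhs != rhs;
}
```

For a condition such as 'b != 2', a known 'b' equal to 2 gives planBranches(notEqual(2L, 2)), which selects only the 'else' branch, while an unknown 'b' gives planBranches(notEqual(std::nullopt, 2)), which selects both.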

To better understand what is going on, let's look at a possible continuation of the code from the previous sample:

int a = 1;
int b = a + 1;
if (b != 2)
{
  int *p = 0; *p = 1;
}
else
{
  ...
}

The 'b != 2' expression will always be false, which means that only the 'else' branch will be executed. Accordingly, no warning about null pointer dereferencing will be generated, as this error will never occur when executing the program.

If the initial conditions are unknown, the code analyzer behaves differently. For instance, it will produce an error message for this code:


int b = rand() % 10;
if (b != 2)
{
  int *p = 0; *p = 1; //Error!
}
else
{
  ...
}

We have skipped the question of why anyone needs code branches that are never executed. Shouldn't we consider them errors? No: such code arises from many different programming techniques. Here are some of them:

Executing different code fragments depending on the version. For example: "if (Version == VERSION_1) ... else if (Version == VERSION_2) ...". It resembles the preprocessor constructs #if-#endif but lets you be sure that all the code branches compile successfully.
Commenting out code fragments. You can be sure that even though the code is not executed, it still compiles properly.
Code fragments programmers use for debugging purposes. While debugging, you can step into such a fragment and use it to do whatever you need: for example, get access to certain values.
Different actions depending on type sizes. For example: if (sizeof(void *) > sizeof(int)).
Macro programming.
Other methods.
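The first and fourth techniques from this list might look like the sketch below. The constant values and the functions 'describeVersion' and 'describePointerSize' are assumptions made for illustration; only the names 'Version', 'VERSION_1' and 'VERSION_2' and the sizeof condition come from the text above.

```cpp
#include <string>

// Hypothetical version constants; the values are chosen for this example.
enum { VERSION_1 = 1, VERSION_2 = 2 };
const int Version = VERSION_2;

// Both branches are compiled, although for a fixed 'Version'
// only one of them can ever be executed.
std::string describeVersion() {
    if (Version == VERSION_1)
        return "version 1 path";
    else if (Version == VERSION_2)
        return "version 2 path";
    return "unknown version";
}

// The condition depends only on type sizes, so on any given platform
// one of the two branches never gets control.
std::string describePointerSize() {
    if (sizeof(void *) > sizeof(int))
        return "pointer is wider than int";
    else
        return "pointer fits in int";
}
```

In both functions the branch that is never executed is intentional, so warnings issued for it would only be noise.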

Now it becomes clear that "more" doesn't necessarily mean "better". 100% code coverage is not an indicator of analysis quality. By no longer analyzing code fragments that are never executed, we get fewer error messages while the quality of analysis increases. An important error message is less likely to get lost among false positives, and the number of real errors found remains the same.