Philosophy behind PVS-Studio static code analyzer

Jun 26 2017

Author: Andrey Karpov

We regularly get requests and recommendations concerning the improvements of the analyzer abilities. We put most of the proposals on our Todo-list and implement eventually. We are really grateful to our users for the feedback and the suggestions, so I would like to thank them! Thank you! We implement a lot of offers, but not all of them. The thing is that some of the suggested diagnostics do not quite fit the philosophy of PVS-Studio analyzer. I decided to write this post to explain our position and the way we conclude whether to implement some diagnostics or not.

0516_Philosophy_behind_PVS_Studio/image1.png

There are two philosophical approaches to the development of static analyzers:

To complain about everything that doesn't seem right.
To complain about those fragments that are most likely wrong for some reason.

The first approach allows you to detect more errors than the second one. However, we believe that this is a road to nowhere, because the number of false positives makes it impossible to use the analyzer. Developing of PVS-Studio, we follow the second approach and issue warnings only in those cases when there are grounds or suspicion.

We get offers to create diagnostics of the following kind:

You should issue a warning to A / B if you are not certain that B != 0.
You should issue a warning if there is no certainty that when calling the strcpy(DST, SRC) function, the DST buffer is of sufficient size.
You should issue a warning upon the dereference of a pointer if there is no certainty that pointer != NULL.
You should issue a warning to sqrt(X), if you are not certain that X >= 0.
And so on.

Each of those warnings looks reasonable and useful. However, implemented together, they will kill the code analyzer. Each diagnosis, implemented in such a way, will cause a large number of false positives. Of course, if we write only one diagnostic, there will not be any trouble. However, if we implement search for divisions about which are not sure, then why should we not implement search for untrusted sqrt? There is no boundary or criterion where you should stop making such diagnostics.

Here we can see the broken Windows theory. If we just create a couple of Diagnostics like "Complain about everything that doesn't seem right" and the process will be irreversible. These diagnostics open the Pandora's Box. For example, it is unclear how to avoid the implementation of the addition of two variables "A+B" of the int type, if there is no certainty that the overflow will not occur. And so on and so forth.

Following at this idea, the analyzer will start issuing warnings for the code:

void Foo(int &i)
{
  i++;  // Overflow is possible.
}

Formally, it's all correct and the diagnostic is quite useful. Overflow of a variable of type int causes undefined behavior. The analyzer should complain if it cannot be sure that all the ranges of the variable, passed to the function, are safe.

In the end, we will get an analyzer that complains about every fifth line of the code it can be just thrown away.

The reader might argue: you exaggerate and push to the point of absurdity, let's make a diagnostic, which complains about the use of an unchecked pointer and stop.

That is the thing that it will not be possible to stop. If we make for one user the search of pointers, how will we explain to another user that we will not search four unchecked divisions? Null pointer dereference is not more serious than division by zero.

Does this mean that the PVS-Studio analyzer does not search for null pointer dereference or division by zero? Of course it does.

But the error search is implemented following the logic "to complain about those fragments that are most likely wrong for some reason." In other words, there must exist some signs that indicate that your code might contain an error.

Let's look at this using examples.

void F(int *P)
{
  *P = 1;
}

PVS-Studio does not complain about code, because it has no reasons for it. The fact that the pointer is not used without a check doesn't mean that the program has an error.

The analyzer needs additional information that will directly or indirectly indicate an error. The most obvious case is when the analyzer notices an incorrect call of such a function.

void F(int *P)
{
  *P = 1;
}
void Foo()
{
  F(0);
}

PVS-Studio: V522 Dereferencing of the null pointer 'P' might take place. The null pointer is passed into 'F' function. Inspect the first argument. Check lines: 'simple.cpp:69'. simple.cpp 64

It's all clear here, if NULL is explicitly passed to the function then it is error. Of course we don't see a lot of such fragments in practice, that's why let's take a look at something which is closer to the reality.

void F(int *P)
{
  *P = 1;
}
void Foo()
{
  int *X = (int *)malloc(sizeof(int));
  F(X);
  free(X);
}

V522 Dereferencing of the null pointer 'P' might take place. The potential null pointer is passed into 'F' function. Inspect the first argument. Check lines: 'simple.cpp:70'. simple.cpp 64

The analyzer issued a warning again but it is about a "potentially null pointer". It is meant that the malloc function can return a NULL pointer and the return pointer must be checked before the use.

It is a good example when the analyzer uses additional information to complain only when the code is really dangerous.

For example if we modify in the following way, the analyzer will be silent.

void F(int *P)
{
  *P = 1;
}
void Foo()
{
  int *X = (int *)malloc(sizeof(int));
  if (!X)
    return;
  F(X);
  free(X);
}

Now it's all fine and there are no warnings.

Are there any other situations when the analyzer issues a warning? Yes. To demonstrate it, I chose as short example from the list of errors found in open source projects with the help of diagnostic V595.

FName UKismetNodeHelperLibrary::GetEnumeratorName(
  const UEnum* Enum, uint8 EnumeratorValue)
{
  int32 EnumeratorIndex = Enum->GetIndexByValue(EnumeratorValue);
  return (NULL != Enum) ?
         Enum->GetEnum(EnumeratorIndex) : NAME_None;
}

PVS-Studio: V595 The 'Enum' pointer was utilized before it was verified against nullptr. Check lines: 146, 147. kismetnodehelperlibrary.cpp 146

What makes the PVS-Studio analyzer issue a warning for the pointer dereference? It is the fact that after the dereference this pointer is verified against NULL. The existence of such a check is a ground to issue such a warning.

Let's have a look at a different example.

void F(char *A, char *B)
{
  strcpy(A, B);
}

Should we issue a warning here? From our point of view, no. Using the strcpy function is not a mistake.

If there is a desire to check the correctness of all such function calls, then the analyzer has nothing to do with it. You can do a search in the program of all strcpy and examine the code.

Of course it's not very convenient to look for strcpy functions manually. At the same time, Visual C++ warns about the presence of such functions and offers to replace them with safer alternatives. For the code, provided before, it will issue:

warning C4996: 'strcpy': This function or variable may be unsafe. Consider using strcpy_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details.

When will PVS-Studio issue a warning for the strcpy function? When there is a reason for this. Example:

void F()
{
  size_t bad_buf_size = strlen("string"); // forgotten + 1
  char *A = (char *)malloc(bad_buf_size);
  if (A)
    strcpy(A, "string");
  free(A);
}

PVS-Studio: V512 A call of the 'strcpy' function will lead to overflow of the buffer 'A'. consoleapplication1.cpp 14

I hope I was able to clarify our philosophy. Based on its ideas, we divide all the diagnostics into two categories:

Everything that is not checked is dangerous - it's not rational and is not used.
Only suspicious looking fragments are considered to be dangerous - we gradually implement it, if it is technically possible.

It is important not to issue as many warnings as possible. It's not something extraordinary, as it is not clear, how to review all these warnings later and what to do with them. It's important to help the user find errors, so we try to work on the analyzer in this regard.

#StaticAnalysis