Andrey Karpov

Aug 08 2014

Tags:

#Cpp #Compiler

Checking PVS-Studio with Clang

Aug 08 2014

Author: Andrey Karpov

My impressions of the html-reports
Severity of the errors detected
False positives
Detected errors
Conclusions
References:

Yes, you've read it right. This time we are writing an "opposite" article: it's not about us checking some third-party project but about our own tool checked by another tool. We actually had performed such verifications before. For instance, we checked PVS-Studio with Cppcheck, Visual Studio static analyzer, inspected Intel C++ warnings. But there were no grounds for writing such an article: none of those tools found anything of interest. But Clang has managed to attract our attention with its diagnostic messages.

We analyzed Clang with PVS-Studio twice [1, 2] and found something interesting each time. However, we have always failed to do the opposite check. Clang developers have been reporting for a long time already that they can successfully build projects developed with Visual C++ under Windows. But we haven't managed to fulfill that in practice yet; or maybe we just have been unlucky all the time.

However, we realized recently that we could easily check our analyzer with Clang. We simply needed to approach the task a bit differently. Every night, we have the command-line version of PVS-Studio built under Linux by GCC. And the GCC compiler can be easily substituted with Clang. So we could easily try checking PVS-Studio. And it has worked: the same day this bright idea struck one of our colleagues, we got an analysis report for PVS-Studio. Here I am now telling you about this report's contents and my impressions of it.

My impressions of the html-reports

Of course, I've already dealt with Clang a few times. But it is difficult to estimate the analysis quality with third-party projects. I cannot often figure out if some issue is a real bug or not. What especially scares me is when Clang tells me I have to check a path consisting of 37 points in the source code.

PVS-Studio source code, on the contrary, is well familiar to me, so I've finally got the opportunity to thoroughly examine Clang's report. Unfortunately, it has confirmed my earlier impressions that the path to reach the detected error shown by Clang is often excessive and may confuse the programmer. Sure, I understand that providing program execution key points and building such a path is an extremely difficult and large task. Well, we in PVS-Studio don't even dare to take it on. But since Clang implements displaying this path, they obviously should work on to improve it.

Otherwise, points like the one below will only confuse the programmer, add unnecessary garbage to output and make the report less comprehensible:

The figure above shows "point No. 4". There is a bug somewhere below it. I understand that it occurs only if the condition is false - this is what Clang informs me about. But what for to display this information? Isn't it clear enough by itself that if the condition is true, the function will be terminated and no error will occur? It's just meaningless, unnecessary information. And there's quite a lot of such. This mechanism surely can and should be improved.

However, I want to give credit to Clang's developers. Displaying such a path does help to figure out the reason behind an error pretty often, especially when more than one function is involved. And Clang's developers have obviously implemented display of this path to reach an error much better than in Visual Studio 2013's static analyzer. In the latter, you can often see half of a function of 500 lines highlighted - and you just can't understand what the use of this highlighting is.

Severity of the errors detected

Analysis of PVS-Studio is a good example of how thankless it is trying to show the usefulness of static analysis on a working and well tested project. Well, I could actually make excuses for all the errors found by Clang by saying that:

this code is not used currently;
this code is used quite rarely or is used for error handling;
this is a bug indeed but it won't cause any severe consequences (fixing it won't affect the results of a huge amount of regression tests in any way).

Making such excuses will allow me to keep pretending that I never make serious errors and proudly tell everyone that Clang is only good for novice programmers.

But I won't do that! Clang having found no critical errors does not indicate at all that it is bad at analysis. Absence of such defects is the result of a vast amount of work on testing our tool through different methods:

internal unit tests;
regression tests by diagnostics (marked files);
testing on sets of *.i files containing various C++ constructs and extensions;
regression testing on 90 open-source projects;
and, of course, static analysis by PVS-Studio.

With such deep defense, you cannot expect Clang to find 20 null pointer dereferencing errors and 10 divisions by 0. But do think about it for a while. Even in a thoroughly tested project, Clang has managed to catch a few bugs. It means that using static analysis regularly can help you avoid plenty of troubles. It's better to fix a bug when found by Clang than receiving an *.i file PVS-Studio crashes at from a user.

We have made our conclusions, of course. Right now, my colleague is setting up Clang to launch on the server and send logs by email in case the analyzer finds anything.

False positives

The Clang analyzer has generated a total of 45 warnings. I don't feel like discussing the number of false positives; instead, let me just say that we must fix 12 fragments.

You see, "false positive" is quite a relative notion. Formally, the analyzer may be absolutely right thinking the code to be poorly written and suspicious. But it doesn't necessarily mean that it has found a real defect. Let me explain this idea by examples.

Here is a real false positive, to start with:

#define CreateBitMask(bitNum) ((v_uint64)(1) << bitNum)

unsigned GetBitCountForRepresntValueLoopMethod(
  v_int64 value, unsigned maxBitsCount)
{
  if (value == 0)
    return 0;
  if (value < 0)
    return maxBitsCount;
  v_uint64 uvalue = value;
  unsigned n = 0;
  int bit;
  for (bit = maxBitsCount - 1; bit >= 0; --bit)
  {
    if ((uvalue & CreateBitMask(bit)) != 0)
     // Clang: Within the expansion of the macro 'CreateBitMask':
     // The result of the '<<' expression is undefined
    {
      n = bit + 1;
      break;
    }
  ....
}

What I get from it is that the analyzer tells me about the shift operation potentially leading to undefined behavior. Clang seems to have mixed up things when trying to figure out the program execution logic or failed to correctly calculate the possible range of values for the maxBitsCount variable. I have very carefully investigated the GetBitCountForRepresntValueLoopMethod() function's call path and haven't found any situation when the 'maxBitsCount' variable could store a value too large. I do know quite a bit about shifts [3], so I'm sure there's no error here.

Self-confidence is good but not enough. That's why I added an assert() into the code:

....
for (bit = maxBitsCount - 1; bit >= 0; --bit)
{
  VivaAssert(bit >= 0 && bit < 64);
  if ((uvalue & CreateBitMask(bit)) != 0)
....

This assert() has not been triggered by any of the tests - which proves that what we were discussing above is a true false positive from Clang.

A nice consequence of adding assert() was that Clang would no longer generate that warning. It relies on assert() macros to find out possible ranges of variable values.

There are just a few real false positives like that. Much more common are warnings like this:

static bool G807_IsException1(const Ptree *p)
{
  ....
    if (kind == ntArrayExpr) {
      p = First(p);
      kind = p->What();
        // Clang: Value stored to 'kind' is never read
  ....

The "kind = p->What();" assignment is not used anymore. It was used in the past but became unnecessary due to some modifications. So the analyzer is correct. The line is excessive and should be removed even if to make the code clearer for the programmer who will be maintaining it in future.

Another example:

template<> template<>
void object::test<11>() {
  ....
  // Null nullWalker won't be used in tests.
  VivaCore::VivaWalker *nullWalker = 0;
  left.m_simpleType = ST_INT;
  left.SetCountOfUsedBits(32);
  left.m_creationHistory = TYPE_FROM_VALUE;
  right.m_simpleType = ST_INT;
  right.SetCountOfUsedBits(11);
  right.m_creationHistory = TYPE_FROM_EXPRESSION;
  result &= ApplyRuleN1(*nullWalker, left, right, false);
    // Clang: Forming reference to null pointer
  ....
}

A null pointer is dereferenced in the unit test. Yes, doing so is a bad and ugly practice. But a very tempting one. You see, preparing an instance of the VivaWalker class is very difficult and in this particular case the reference to the object is not used in any way.

Both examples show a working code. But I don't call them false positives - they are slight defects that should be eliminated. On the other hand, I wouldn't refer these warnings to the "detected errors" section either. This is why I'm saying that a false positive is a relative notion.

Detected errors

Finally, we have reached the section where I will show you interesting code fragments Clang has found in PVS-Studio.

These errors are not critical for program execution. It's not that I'm trying to excuse myself; I'm simply telling you the truth. After fixing all the warnings, regression tests did not detect any change of PVS-Studio's behavior.

But after all, we are speaking of genuine errors and it's great that Clang has managed to find them. I hope it will catch more serious mistakes in fresh PVS-Studio code when we start using it regularly.

Using two uninitialized variables

The corresponding code fragment is large and complex, so I won't cite it in full here. Instead, I've made an artificial sample to reflect what the error is all about.

int A, B;
bool getA, getB;
Get(A, getA, B, getB);
int TmpA = A; // Clang: Assigned value is garbage or undefined
int TmpB = B; // Clang: Assigned value is garbage or undefined
if (getA)
  Use(TmpA);
if (getB)
  Use(TmpB);

The Get() function can initialize the variables A and B. Whether or not it has done so is marked in the variables getA and getB.

Regardless of whether or not the variables A and B are initialized, their values are copied into TmpA and TmpB correspondingly. It is here that two uninitialized variables are used.

Why am I saying this error is not critical? You see, copying an uninitialized variable of the 'int' type doesn't cause any trouble in practice. Formally, as far as I understand, undefined behavior occurs. In practice, however, just some garbage will be copied. After that, these garbage variables are never used.

I rewrote the code in the following way:

if (getA)
{
  int TmpA = A;
  Use(TmpA);
}
if (getB)
{
  int TmpB = B;
  Use(TmpB);
}

Uninitialized pointers

Let's take a look at a call of the GetPtreePos() function. It receives references to uninitialized pointers.

SourceLocation Parser::GetLocation(const Ptree* ptree)
{
  const char *begin, *end;
  GetPtreePos(ptree, begin, end);
    return GetSourceLocation(*this, begin);
}

This is incorrect. The GetPtreePos() function assumes that the pointers will be initialized by the nullptr value. This is how it is implemented:

void GetPtreePos(const Ptree *p, const char *&begin, const char *&end)
{
  while (p != nullptr)
  {
    if (p->IsLeaf())
    {
      const char *pos = p->GetLeafPosition();
      if (....)
      {
        if (begin == nullptr) {
            // Clang: The left operand of '==' is a garbage value
          begin = pos;
        } else {
          begin = min(begin, pos);
        }
        end = max(end, pos);
      }
      return;
    }
    GetPtreePos(p->Car(), begin, end);
    p = p->Cdr();
  }
}

What saves us from complete disgrace is having the Getlocation() function is called when a certain code parsing error occurs in the unit-test subsystem. Guess there has never been such an occasion.

Here is a nice example of how good static analysis is at complementing TDD [4].

Scary explicit type conversions

There are three similar functions with scary and incorrect type conversions. Here's one of them:

bool Environment::LookupType(
  CPointerDuplacateGuard &envGuard, const char* name,
  size_t len, Bind*& t, const Environment **ppRetEnv,
  bool includeFunctions) const
{
  VivaAssert(m_isValidEnvironment);
  //todo:
  Environment *eTmp = const_cast<Environment *>(this);
  Environment **ppRetEnvTmp = const_cast<Environment **>(ppRetEnv);
  bool r = eTmp->LookupType(envGuard, name, len, t,
                            ppRetEnvTmp, includeFunctions);
  ppRetEnv = const_cast<const Environment **>(ppRetEnvTmp);
    // Clang: Value stored to 'ppRetEnv' is never read
  return r;
}

Sodom and Gomorrah. We tried to remove const-ness and then return the resulting value. But actually, it is just that the local variable ppRetEnv is changed in the line "ppRetEnv = const_cast....".

Now I will explain you where this ugliness stems from and how it affects the program execution.

The PVS-Studio analyzer is based on the OpenC++ library, where the keyword 'const' has almost never been used. You could at any instant change whatever and wherever you like by using pointers to non-constant objects. PVS-Studio has inherited this vice.

We tried to fight it but our victory has never been complete. You add const in one place, so you have to add it in another one, then another one, and so on. After that, you discover that in certain cases you need to change something through a pointer and have to split the function into several parts or carry out even more global refactoring.

The last heroic attempt to add const everywhere we need was undertaken by one of our idealist colleagues and took him one week just to end up with a partial fail. It became clear that we would need to greatly change the code and modify some data storage structures. The quest of bringing light into the kingdom of darkness was never completed. We added a few stubs like the function above to make our code compilable.

What does this error affect in the code? Sounds strange, but it doesn't seem to affect anything. None of all the unit and regression tests have revealed any changes in PVS-Studio's behavior after the fixes. Looks like the value returned in "ppRetEnv" is not much needed for work.

Using a potentially uninitialized variable

v_uint64 v; // Clang: 'v' declared without an initial value
verify(GetEscape(p, len - 3, v, notation, &p));
retValue <<= 8;
retValue |= v; // Clang: Assigned value is garbage or undefined

The GetEscape() function may terminate incorrectly, which will lead to the 'v' variable remaining uninitialized. The return result of the GetEscape() function is for some strange reason checked by the verify() macro. No one knows why.

The error has remained unnoticed until now due to the following reason. The GetEscape() function fails to initialize the variable only if the PVS-Studio analyzer works with an incorrect program text. Correct text always contains correct ESC-sequences and the variable is always initialized.

Wondering myself how it could work

Ptree *varDecl = bind->GetDecl();
if (varDecl != nullptr)
{
  if (varDecl->m_wiseType.IsIntegerVirtualValue())
    varRanges.push_back(....);
  else if (varDecl->m_wiseType.IsPointerVirtualValue())
    varRanges.push_back(....);
  else
    varRanges.push_back(nullptr);
}
rangeTypes.push_back(varDecl->m_wiseType.m_simpleType);
  // Clang: Dereference of null pointer

The varDecl pointer can be equal to nullptr. However, the last line is always executed, so null pointer dereferencing may happen: varDecl->m_wiseType.m_simpleType.

Why we never saw a crash at this code is a great mystery for me. My only guess is that we never get here when the object doesn't store a pointer to a variable declarator. But we shouldn't rely on that anyway.

Clang has found a very serious bug that would have surely revealed itself sooner or later.

Amazing, but we never saw crashes at these places too

One more surprising code fragment. It seems that the combination of certain factors that could lead to null pointer dereferencing is extremely unlikely here. At least, we haven't noticed a crash since the time we wrote this function - and that's one year and a half. Miracle, isn't it?

void ApplyRuleG_657(VivaWalker &walker,
  const BindFunctionName *bind,
  const IntegerVirtualValueArray *pReturnIntegerVirtualValues,
  const PointerVirtualValueArray *pReturnPointerVirtualValues,
  const Ptree *body, const Ptree *bodySrc,
  const Environment *env)
{
  if (body == nullptr || bodySrc == nullptr)
  {
    VivaAssert(false);
    return;
  }

  if (bind == nullptr)
    return;

  if (pReturnIntegerVirtualValues == nullptr &&
      pReturnPointerVirtualValues == nullptr)
    return;

  ....

  size_t integerValueCount = pReturnIntegerVirtualValues->size();
  // Clang: Called C++ object pointer is null

The pReturnIntegerVirtualValues pointer might well be equal to nullptr.

It may seem at first that the error is in the condition and we should use the "||" operator:

if (pReturnIntegerVirtualValues == nullptr &&
    pReturnPointerVirtualValues == nullptr)

But it's not true. The condition is correct; it's just that the pointer must be checked for null before dereferencing it. If it is null, 0 should be assigned to the integerValueCount variable. This is the correct code:

size_t integerValueCount =
  pReturnIntegerVirtualValues != nullptr ?
    pReturnIntegerVirtualValues->size() : 0;

Amazing. So many tests, run-through of 90 open-source projects, plus plenty of other projects checked during the year. And still there is a bug living in the code. I bet it would have revealed itself one day on the code of some important potential customer of ours.

Blessed are static analyzers! Blessed is Clang!

Miscellaneous

The analyzer has revealed a few other errors that should be fixed. It's quite difficult to describe them, and I don't feel like making artificial samples. Besides, there are a couple of warnings which are absolutely correct yet useless. We had to turn analysis off in those places.

For instance, Clang was worrying about uninitialized variables when using the RunPVSBatchFileMode() function. But the point is that we simply did not implement batch launch for Linux, so we made a stub there. And I don't think we will do it in the nearest future.

Conclusions

Use static analyzers in your work.

I believe PVS-Studio's core to be highly tested. Nevertheless, the Clang static analyzer has found 12 genuine bugs. Other warnings are not errors but they point out smelly code, so I fixed all those fragments.

The errors that we've found could have revealed themselves in a very inappropriate time. Besides, I suspect this analyzer could have helped us catch a number of errors that were hunted down with tests - and running the basic regression tests takes about 2 hours. If we could find some of the errors earlier, that would be just great.

So here is the article advertising Clang. Well, it does deserve it.

But don't think other analyzers are useless. Personally I, for instance, very much like the Cppcheck analyzer. It's very easy to use and provides pretty clear diagnostics. It just doesn't happen to have found a bunch of bugs in PVS-Studio like Clang, so I cannot write a similar complimentary article about it.

And, of course, I recommend you to try our analyzer PVS-Studio in your work. It is highly useful for those working with Visual C++ [5]. Especially worth your attention is the automatic incremental analysis mode, which runs after each successful file compilation in case they were modified.