Static analysis should be used regularly

20.08.2012 Andrey Karpov

We occasionally re-analyze projects we have already checked with PVS-Studio. There are several reasons for this. For example, we want to know whether we have managed to eliminate false positives for certain diagnostics. But the most interesting part is seeing how new diagnostic rules work and what errors they find. It is fascinating to watch the tool catch more and more defects in a project that seemed to be cleaned out already. The latest project we have re-checked is Clang.

Clang is a project of great interest to us. First, because it is of very high quality, so finding a new error in it is a real achievement. Second, because it very clearly exposes the faults in PVS-Studio that cause false positives.

Unfortunately, more than a month passed between the recheck and the writing of this article; my vacation was the reason. The suspicious code described here may well have been fixed by the time this post is published. But that's OK. The main thing is that I can remind readers that static analysis is a tool to be used regularly, not from time to time.

Static analysis should be applied regularly because:

  • New code is constantly added to a project. If you don't check it right away, many errors will take much longer to fix: they will be reported by the testing department or by your customers.
  • Tools learn to catch more and more error patterns.

All this sounds very simple and even trivial. Unfortunately, developers are often too lazy to integrate static analysis into the development process. We have to nudge them to take this step again and again.

The previous check of the Clang project was carried out about a year ago. Since then we have added new diagnostic rules that helped us detect new suspicious code fragments. They are not numerous, though. That's no wonder: the Clang project contains a static analyzer itself and is developed by highly skilled programmers. It's surprising that we manage to find anything at all.

Let's see what interesting issues we have managed to find in the code. Suspicious fragments are mostly related to shift operations.

int64_t DataExtractor::getSLEB128(....) const {
  int64_t result = 0;
  ...
  // Sign bit of byte is 2nd high order bit (0x40)
  if (shift < 64 && (byte & 0x40))
    result |= -(1 << shift);
  ...
}

PVS-Studio: V629 Consider inspecting the '1 << shift' expression. Bit shifting of the 32-bit value with a subsequent expansion to the 64-bit type. dataextractor.cpp 171

Judging by the "shift < 64" check, the value 1 can be shifted to the left by [0..63] bits. But the literal 1 has the type int, which is 32 bits wide here, so a shift by 32 or more bits is undefined behavior. See the article "Wade not in unknown waters. Part three" to learn more about the reasons why undefined behavior may occur here. What is tricky about such defects is that your program may appear to work correctly for a long time. Faults show up when you switch to another compiler version, use a different optimization switch, or refactor the code.

The code becomes safe if the number 1 is represented by a 64-bit unsigned data type. In this case you can safely shift it by 63 bits. This is the safe code (the ui64 suffix is specific to Microsoft compilers):

result |= -(1ui64 << shift);

Unfortunately, I'm not sure what to do with the minus sign.
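As for the minus sign, here is a minimal sketch of one way to handle it. The helper name signExtensionBits is hypothetical and is not taken from the LLVM sources. It relies on the fact that unsigned arithmetic wraps modulo 2^64, so subtracting the shifted bit from zero yields a value with that bit and all higher bits set. The final conversion to int64_t is assumed to keep the bit pattern, which is what mainstream compilers do (and what C++20 guarantees). The sketch uses the portable uint64_t{1} rather than a compiler-specific suffix.

```cpp
#include <cstdint>

// Hypothetical helper (not part of LLVM): returns a value in which bit
// 'shift' and all higher bits are set, ready to be OR-ed into a partially
// decoded SLEB128 result.
// Precondition: shift < 64.
inline int64_t signExtensionBits(unsigned shift) {
  // The shift is performed on a 64-bit unsigned value, so any shift
  // amount in [0..63] is well defined.
  const uint64_t lowBit = uint64_t{1} << shift;
  // Unsigned subtraction wraps modulo 2^64: the result has bit 'shift'
  // and every bit above it set.
  const uint64_t mask = uint64_t{0} - lowBit;
  // Assumption: the conversion keeps the two's complement bit pattern.
  return static_cast<int64_t>(mask);
}
```

With such a helper, the line in question could read "result |= signExtensionBits(shift);".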

Consider another sample containing a strange shift operation:

void EmitVBR64(uint64_t Val, unsigned NumBits) {
  if ((uint32_t)Val == Val)
    return EmitVBR((uint32_t)Val, NumBits);

  uint64_t Threshold = 1U << (NumBits-1);
  ...
}

PVS-Studio: V629 Consider inspecting the '1U << (NumBits - 1)' expression. Bit shifting of the 32-bit value with a subsequent expansion to the 64-bit type. bitstreamwriter.h 173

If the 'NumBits' argument can be larger than 32, the function will work incorrectly. As in the previous example, undefined behavior occurs when '1U' is shifted by 32 or more bits. In practice, it will most probably manifest itself as a meaningless value in the 'Threshold' variable.

This is the safe code:

uint64_t Threshold = 1UI64 << (NumBits-1);
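The same fix can be written portably. The sketch below is only an illustration: the function name vbrThreshold is hypothetical, and the precondition on numBits is an assumption made for the example, not something taken from the LLVM sources.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the Threshold computation above.
// Precondition (assumed for this sketch): 1 <= numBits <= 64.
inline uint64_t vbrThreshold(unsigned numBits) {
  // uint64_t{1} is 64 bits wide on every platform, so shifts of up to
  // 63 bits are well defined.
  return uint64_t{1} << (numBits - 1);
}
```

For example, vbrThreshold(33) yields 0x100000000, a value that does not fit into 32 bits and that the '1U' version cannot produce reliably.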

The samples described above cause errors only when the shift is by a large number of bits. But there are fragments that cause undefined behavior every time they run. A left shift of a negative number is one example.

int find_next(unsigned Prev) const {
  ...
  // Mask off previous bits.
  Copy &= ~0L << BitPos;
  ...
}

PVS-Studio: V610 Undefined behavior. Check the shift operator '<<. The left operand '~0L' is negative. bitvector.h 175

This code is not safe. The expression '~0L' has the signed type long and equals -1, so the left operand of the shift is negative. The Clang project is built for various platforms, so you need to be careful when using such constructs. It's difficult to predict the consequences of shifting negative numbers on a particular platform.
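One way to avoid the negative operand is to build the mask from an unsigned value. This is a minimal sketch under the assumption that the bit word is a 64-bit unsigned integer; the helper name maskOffLowBits is hypothetical and does not come from bitvector.h.

```cpp
#include <cstdint>

// Hypothetical helper: clears the bits of 'word' below position 'bitPos'.
// Precondition: bitPos < 64.
inline uint64_t maskOffLowBits(uint64_t word, unsigned bitPos) {
  // ~uint64_t{0} is an unsigned all-ones value, so the left shift is
  // well defined and simply drops the bits shifted out at the top.
  return word & (~uint64_t{0} << bitPos);
}
```

The result is the same mask the original code intends to build, but the shift no longer involves a signed operand.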

There are other potentially dangerous shift operations as well. They are all alike, so we won't consider them in detail. Let me just list them according to their location in the code:

  • V610 Undefined behavior. Check the shift operator '<<=. The left operand 'Val' is negative. pointerintpair.h 139
  • V610 Undefined behavior. Check the shift operator '<<. The left operand '~0L' is negative. bitvector.h 454
  • V610 Undefined behavior. Check the shift operator '<<. The left operand '~0L' is negative. sparsebitvector.h 161
  • V610 Undefined behavior. Check the shift operator '<<=. The left operand 'Val' is negative. pointerintpair.h 144
  • V610 Undefined behavior. Check the shift operator '<<=. The left operand 'Val' is negative. densemapinfo.h 35
  • V610 Undefined behavior. Check the shift operator '<<=. The left operand 'Val' is negative. densemapinfo.h 40
  • V629 Consider inspecting the '1U << (NumBits - 1)' expression. Bit shifting of the 32-bit value with a subsequent expansion to the 64-bit type. bitstreamreader.h 362
  • V629 Consider inspecting the 'Bit->getValue() << i' expression. Bit shifting of the 32-bit value with a subsequent expansion to the 64-bit type. record.cpp 248

Besides strange shifts, we have found several strange loops: they iterate only once.

bool ObjCARCOpt::VisitBottomUp(....) {
  ...
  for (BBState::edge_iterator SI(MyStates.succ_begin()),
       SE(MyStates.succ_end()); SI != SE; ++SI)
  {
    const BasicBlock *Succ = *SI;
    DenseMap<const BasicBlock *, BBState>::iterator I =
      BBStates.find(Succ);
    assert(I != BBStates.end());
    MyStates.InitFromSucc(I->second);
    ++SI;
    for (; SI != SE; ++SI) {
      Succ = *SI;
      I = BBStates.find(Succ);
      assert(I != BBStates.end());
      MyStates.MergeSucc(I->second);
    }
    break;
  }
  ...
}

PVS-Studio: V612 An unconditional 'break' within a loop. objcarc.cpp 2763

Note the final 'break' statement. There is no condition before it, so it always terminates the loop. As a result, the outer loop iterates only once.

These are similar strange code fragments:

  • V612 An unconditional 'break' within a loop. objcarc.cpp 2948
  • V612 An unconditional 'break' within a loop. undefinedassignmentchecker.cpp 75
  • V612 An unconditional 'break' within a loop. bugreporter.cpp 1095
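If the intent of the VisitBottomUp fragment is to initialize the state from the first successor and merge in all the others (an assumption based only on the code shown), an 'if' states this more directly than a loop that always breaks. The sketch below uses a hypothetical stand-in: a vector of integers replaces the successor list, and addition replaces the real merge operation.

```cpp
#include <vector>

// Hypothetical stand-in for the state merging in VisitBottomUp: the first
// element initializes the result, every following element is merged in.
// "Merge" is plain addition here, chosen only for illustration.
inline int initThenMerge(const std::vector<int>& succs) {
  int state = 0;
  auto it = succs.begin();
  const auto end = succs.end();
  if (it != end) {               // an 'if' instead of a loop that always breaks
    state = *it;                 // plays the role of InitFromSucc
    for (++it; it != end; ++it)
      state += *it;              // plays the role of MergeSucc
  }
  return state;
}
```

Written this way, the code produces no V612 warning, and the reader does not have to find the 'break' to understand that the outer block runs at most once.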

Conclusion

The V610, V612 and V629 diagnostics are new, and they allowed us to find some new interesting bugs. It doesn't matter that you checked your project a year ago. It doesn't matter at all, because you have written new, unchecked code since then. The analyzer has also gained new diagnostic capabilities; they keep appearing every month. Start using static analysis regularly and you will spend far less effort searching for and eliminating a great many errors.