Andrey Karpov

Sep 24 2020

Tags:

#Cpp #StaticAnalysis

Why it is important to apply static analysis for open libraries that you add to your project

Sep 24 2020

Author: Andrey Karpov

Collection "Awesome header-only C++ libraries"
Why analyze libraries
How to analyze libraries
What to use
Conclusion

Modern applications are built from third-party libraries like a wall from bricks. Their usage is the only option to complete the project in a reasonable time, spending a sensible budget, so it's a usual practice. However, taking all the bricks indiscriminately may not be such a good idea. If there are several options, it is useful to take time to analyze open libraries in order to choose the best one.

0762_check_open_source_libraries/image1.png

Collection "Awesome header-only C++ libraries"

The story of this article began with the release of the Cppcast podcast "Cross Platform Mobile Telephony". From it, I learned about the existence of the "awesome-hpp" list, which lists a large number of open C++ libraries consisting only of header files.

I was interested in this list for two reasons. First, it is an opportunity to extend the testing database for our PVS-Studio analyzer on modern code. Many projects are written in C++11, C++14, and C++17. Secondly, it might result in an article about the check of these projects.

The projects are small, so there are few errors in each one individually. In addition, there are few warnings, because some errors can only be detected if template classes or functions are instantiated in user's code. As long as these classes and functions aren't used, it is often impossible to figure out whether there is an error or not. Nevertheless, there were quite a lot of errors in total and I'll write about them in the next article. As for this article, it's not about errors, but about a caveat.

Why analyze libraries

By using third-party libraries, you implicitly trust them to do some of the work and calculations. Nevertheless, it might be dangerous because sometimes programmers choose a library without considering the fact that not only their code, but the code of libraries might contain errors as well. As a result, there are non-obvious, incomprehensible errors that may appear in the most unexpected way.

The code of well-known open libraries is well-debugged, and the probability of encountering an error there is much less than in similar code written independently. The problem is that not all libraries are widely used and debugged. And here comes the question of evaluating their quality.

To make it clearer, let's look at an example. Let's take the JSONCONS library as an example.

JSONCONS is a C++, header-only library for constructing JSON and JSON-like data formats such as CBOR.

A specific library for specific tasks. It may work well in general, and you will never find any errors in it. But don't even think about using this overloaded <<= operator.

static constexpr uint64_t basic_type_bits = sizeof(uint64_t) * 8;
....
uint64_t* data() 
{
  return is_dynamic() ? dynamic_stor_.data_ : short_stor_.values_;
}
....
basic_bigint& operator<<=( uint64_t k )
{
  size_type q = (size_type)(k / basic_type_bits);
  if ( q ) // Increase common_stor_.length_ by q:
  {
    resize(length() + q);
    for (size_type i = length(); i-- > 0; )
      data()[i] = ( i < q ? 0 : data()[i - q]);
    k %= basic_type_bits;
  }
  if ( k )  // 0 < k < basic_type_bits:
  {
    uint64_t k1 = basic_type_bits - k;
    uint64_t mask = (1 << k) - 1;             // <=
    resize( length() + 1 );
    for (size_type i = length(); i-- > 0; )
    {
      data()[i] <<= k;
      if ( i > 0 )
        data()[i] |= (data()[i-1] >> k1) & mask;
      }
  }
  reduce();
  return *this;
}

PVS-Studio analyzer warning: V629 Consider inspecting the '1 << k' expression. Bit shifting of the 32-bit value with a subsequent expansion to the 64-bit type. bigint.hpp 744

If I'm right, the function works with large numbers that are stored as an array of 64-bit elements. To work with certain bits, you need to create a 64-bit mask:

uint64_t mask = (1 << k) - 1;

The only thing is that the mask is formed incorrectly. Since the numeric literal 1 is of inttype, if we shift it by more than 31 bits, we get undefined behavior.

From the standard:

shift-expression << additive-expression

....

2. The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1 * 2^E2, reduced modulo one more than the maximum value representable in the result type. Otherwise, if E1 has a signed type and non-negative value, and E1*2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

The value of the mask variable can be anything. Yes, I know, theoretically, anything can happen because of UB. But in practice, most likely, we are talking about an incorrect result of the expression.

So, we have a function here that can't be used. Rather, it will only work for some special cases of the input argument value. This is a potential trap that a programmer can fall into. The program can run and pass various tests, and then suddenly crash on other input files.

You can also see the same error in operator>>=.

Now I'm going to ask you something rhetorically. Should I trust this library?

Maybe I should. After all, there are errors in all projects. However, it is worth considering: if these errors exist, are there any others that can lead to perky data corruption? Isn't it better to give preference to a more popular/tested library if there are several of them?

An unconvincing example? Okay, let's try another one. Let's take the Universal mathematical library. It is expected that the library provides the ability to operate with vectors. For example, multiply and divide a vector by a scalar value. All right, let's see how these operations are implemented. Multiplication:

template<typename Scalar>
vector<Scalar> operator*(double scalar, const vector<Scalar>& v) {
  vector<Scalar> scaledVector(v);
  scaledVector *= scalar;
  return v;
}

PVS-Studio analyzer warning: V1001 The 'scaledVector' variable is assigned but is not used by the end of the function. vector.hpp 124

Because of a typo, the original vector is returned, not the new scaledVector container. The same error occurs in the division operator. Facepalm.

Again, these errors don't mean anything separately. Although, this is a hint that this library isn't used much and there is highly likely that there are other serious undetected errors in it.

Conclusion. If several libraries provide the same functions, then you should perform preliminary analysis of their quality and choose the most tested and reliable one.

How to analyze libraries

Okay, we want to figure out the quality of library code, but how do we do that? It's not easy to do this. One doesn't simply review the code. Or rather, you can look through it, but it will give little information. Moreover, such a review is unlikely to help you estimate the error density in the project.

Let's go back to the previously mentioned Universal mathematical library. Try to find an error in the code of this function. Seeing the comment next to it, I can't help but cite it for you :).

// subtract module using SUBTRACTOR: CURRENTLY BROKEN FOR UNKNOWN REASON

0762_check_open_source_libraries/image2.png

template<size_t fbits, size_t abits>
void module_subtract_BROKEN(const value<fbits>& lhs, const value<fbits>& rhs,
                            value<abits + 1>& result) {
  if (lhs.isinf() || rhs.isinf()) {
    result.setinf();
    return;
  }
  int lhs_scale = lhs.scale(),
      rhs_scale = rhs.scale(),
      scale_of_result = std::max(lhs_scale, rhs_scale);

  // align the fractions
  bitblock<abits> r1 = lhs.template nshift<abits>(lhs_scale-scale_of_result+3);
  bitblock<abits> r2 = rhs.template nshift<abits>(rhs_scale-scale_of_result+3);
  bool r1_sign = lhs.sign(), r2_sign = rhs.sign();

  if (r1_sign) r1 = twos_complement(r1);
  if (r1_sign) r2 = twos_complement(r2);

  if (_trace_value_sub) {
    std::cout << (r1_sign ? "sign -1" : "sign  1") << " scale "
      << std::setw(3) << scale_of_result << " r1       " << r1 << std::endl;
    std::cout << (r2_sign ? "sign -1" : "sign  1") << " scale "
      << std::setw(3) << scale_of_result << " r2       " << r2 << std::endl;
  }

  bitblock<abits + 1> difference;
  const bool borrow = subtract_unsigned(r1, r2, difference);

  if (_trace_value_sub) std::cout << (r1_sign ? "sign -1" : "sign  1")
    << " borrow" << std::setw(3) << (borrow ? 1 : 0) << " diff    "
    << difference << std::endl;

  long shift = 0;
  if (borrow) {   // we have a negative value result
    difference = twos_complement(difference);
  }
  // find hidden bit
  for (int i = abits - 1; i >= 0 && difference[i]; i--) {
    shift++;
  }
  assert(shift >= -1);

  if (shift >= long(abits)) {            // we have actual 0 
    difference.reset();
    result.set(false, 0, difference, true, false, false);
    return;
  }

  scale_of_result -= shift;
  const int hpos = abits - 1 - shift;         // position of the hidden bit
  difference <<= abits - hpos + 1;
  if (_trace_value_sub) std::cout << (borrow ? "sign -1" : "sign  1")
    << " scale " << std::setw(3) << scale_of_result << " result  "
    << difference << std::endl;
  result.set(borrow, scale_of_result, difference, false, false, false);
}

I am sure that even though I gave you a hint that there is an error in this code, it is not easy to find it.

If you didn't find it, here it is. PVS-Studio warning: V581 The conditional expressions of the 'if' statements situated alongside each other are identical. Check lines: 789, 790. value.hpp 790

if (r1_sign) r1 = twos_complement(r1);
if (r1_sign) r2 = twos_complement(r2);

Classic typo. In the second condition, the r2_sign variable must be checked.

Like I say, forget about the "manual" code review. Yes, this way is possible, but unnecessarily time-consuming.

What do I suggest? A very simple way. Use static code analysis.

Check the libraries you are going to use. Start looking at the reports and everything will become clear quickly enough.

You don't even need very thorough analysis and you don't need to filter false positives. Just go through the report and review the warnings. Be patient about false positives due to default settings and focus on errors.

However, false positives can also be taken into account indirectly. The more of them, the untidier the code is. In other words, there are a lot of tricks in the code that confuse the analyzer. They confuse the people who maintain the project and, as a result, negatively affect its quality.

Note. Don't forget about the size of projects. In a large project, there will always be more errors. But the number of errors is not the same as the error density. Keep this in mind when taking projects of different sizes and make adjustments.

What to use

There are many tools for static code analysis. I obviously suggest using the PVS-Studio analyzer. It is great for both one-time code quality assessment and regular error detection and correction.

You can check the project code in C, C++, C#, and Java. The product is proprietary. However, a free trial license will be more than enough to evaluate the quality of several open libraries.

I also remind you that there are several options for free licensing of the analyzer for:

students;
developers of open source projects;
developers of closed projects (you need to add special comments to the code);
Microsoft MVP.

Conclusion

The methodology of static code analysis is still undeservedly underestimated by many programmers. A possible reason for this is the experience of working with simple noisy tools of the "linter" class, which perform very simple and, unfortunately, often useless checks.

For those who are not sure whether to try implementing a static analyzer in the development process, see the following two posts:

Thank you for your attention, and I wish you fewer bugs both in your code and in the code of the libraries you use :).

#Cpp #StaticAnalysis