Andrey Karpov

May 23 2011

Tags:

#Cpp

PVS-Studio vs Chromium

May 23 2011

Author: Andrey Karpov

Good has won this time. To be more exact, source codes of the Chromium project have won. Chromium is one of the best projects we have checked with PVS-Studio.

Chromium is an open-source web-browser developed by Google and intended to provide users with fast and safe Internet access. Chromium serves as the base for the Google Chrome browser. Moreover, Chromium is a preliminary version of Google Chrome as well as some other alternative web-browsers.

From the programming viewpoint, Chromium is a solution consisting of 473 projects. The general size of the source C/C++ code is about 460 Mbytes and the number of lines is difficult to count.

These 460 Mbytes include a lot of various libraries. If you exclude them, you will have about 155 Mbytes. It is much less but still a lot of lines. Moreover, everything is relative, you know. Many of these libraries were created by the Chromium developers within the task of creating Chromium itself. Although such libraries live by themselves, still we may refer them to the browser.

Chromium had become the most quality and large project I have studied during testing of PVS-Studio. While handling the Chromium project it was not actually clear to us what was checking what: we have found and fixed several errors in PVS-Studio related to C++ file analysis and support of a specific project's structure.

Many aspects and methods used in Chromium show the quality of its source code. For instance, most programmers determine the number of items in an array using the following construct:

int XX[] = { 1, 2, 3, 4 };
size_t N = sizeof(XX) / sizeof(XX[0]);

Usually it is arranged as a macro of this kind:

#define count_of(arg) (sizeof(arg) / sizeof(arg[0]))

This is a quite efficient and useful macro. To be honest, I have always used this very macro myself. However, it might lead to an error because you may accidentally pass a simple pointer to it and it will not mind. Let me explain this by the following example:

void Test(int C[3])
{
  int A[3];
  int *B = Foo();
  size_t x = count_of(A); // Ok
  x = count_of(B); // Error
  x = count_of(C); // Error
}

The count_of(A) construct works correctly and returns the number of items in the A array which is equal to three here.

But if you apply by accident count_of() to a pointer, the result will be a meaningless value. The issue is that the macro will not produce any warning for the programmer about a strange construct of the count_of(B) sort. This situation seems farfetched and artificial but I had encountered it in various applications. For example, consider this code from the Miranda IM project:

#define SIZEOF(X) (sizeof(X)/sizeof(X[0]))
int Cache_GetLineText(..., LPTSTR text, int text_size, ...)
{
  ...
  tmi.printDateTime(pdnce->hTimeZone, _T("t"), text, SIZEOF(text), 0);
  ...
}

So, such errors may well exist in your code and you'd better have something to protect yourself against them. It is even easier to make a mistake when trying to calculate the size of an array passed as an argument:

void Test(int C[3])
{
  x = count_of(C); // Error
}

According to the C++ standard, the 'C' variable is a simple pointer, not an array. As a result, you may often see in programs that only a part of the array passed is processed.

Since we have started speaking of such errors, let me tell you about a method that will help you find the size of the array passed. You should pass it by the reference:

void Test(int (&C)[3])
{
  x = count_of(C); // Ok
}

Now the result of the count_of(C) expression is value 3.

Let's return to Chromium. It uses a macro that allows you to avoid the above described errors. This is how it is implemented:

template <typename T, size_t N>
char (&ArraySizeHelper(T (&array)[N]))[N];
#define arraysize(array) (sizeof(ArraySizeHelper(array)))

The idea of this magic spell is the following: the template function ArraySizeHelper receives an array of a random type with the N length. The function returns the reference to the array of the N length consisting of 'char' items. There is no implementation for this function because we do not need it. For the sizeof() operator it is quite enough just to define the ArraySizeHelper function. The 'arraysize' macro calculates the size of the array of bytes returned by the ArraySizeHelper function. This size is the number of items in the array whose length we want to calculate.

If you have gone crazy because of all this, just take my word for it - it works. And it works much better than the 'count_of()' macro we have discussed above. Since the ArraySizeHelper function takes an array by the reference, you cannot pass a simple pointer to it. Let's write a test code:

template <typename T, size_t N>
char (&ArraySizeHelper(T (&array)[N]))[N];
#define arraysize(array) (sizeof(ArraySizeHelper(array)))

void Test(int C[3])
{
  int A[3];
  int *B = Foo();
  size_t x = arraysize(A); // Ok
  x = arraysize(B); // Compilation error
  x = arraysize(C); // Compilation error
}

The incorrect code simply will not be compiled. I think it's cool when you can prevent a potential error already at the compilation stage. This is a nice sample reflecting the quality of this programming approach. My respect goes to Google developers.

Let me give you one more sample which is of a different sort yet it shows the quality of the code as well.

if (!file_util::Delete(db_name, false) &&
    !file_util::Delete(db_name, false)) {
  // Try to delete twice. If we can't, fail.
  LOG(ERROR) << "unable to delete old TopSites file";
  return false;
}

Many programmers might find this code strange. What is the sense in trying to remove a file twice? There is a sense. The one who wrote it has reached Enlightenment and comprehended the essence of software existence. A file can be definitely removed or cannot be removed at all only in textbooks and in some abstract world. In the real system it often happens that a file cannot be removed right now and can be removed an instance later. There may be many reasons for that: antivirus software, viruses, version control systems and whatever. Programmers often do not think of such cases. They believe that when you cannot remove a file you cannot remove it at all. But if you want to make everything well and avoid littering in directories, you should take these extraneous factors into account. I encountered quite the same situation when a file would not get removed once in 1000 runs. The solution was also the same - I only placed Sleep(0) in the middle just in case.

Well, and what about the check by PVS-Studio? Chromium's code is perhaps the most quality code I've ever seen. This is confirmed by the low density of errors we've managed to find. If you take their quantity in general, there are certainly plenty of them. But if you divide the number of errors by the amount of code, it turns out that there are almost no errors. What are these errors? They are the most ordinary ones. Here are several samples:

V512 A call of the 'memset' function will lead to underflow of the buffer '(exploded)'. platform time_win.cc 116

void NaCl::Time::Explode(bool is_local, Exploded* exploded) const {
  ....
  ZeroMemory(exploded, sizeof(exploded));
  ....
}

Everybody makes misprints. In this case, an asterisk is missing. It must be sizeof(*exploded).

V502 Perhaps the '?:' operator works in a different way than it was expected. The '?:' operator has a lower priority than the '-' operator. views custom_frame_view.cc 400

static const int kClientEdgeThickness;
int height() const;
bool ShouldShowClientEdge() const;

void CustomFrameView::PaintMaximizedFrameBorder(gfx::Canvas* canvas) {
  ....
  int edge_height = titlebar_bottom->height() -
                    ShouldShowClientEdge() ? kClientEdgeThickness : 0;
  ....
}

The insidious operator "?:" has a lower priority than subtraction. There must be additional parentheses here:

int edge_height = titlebar_bottom->height() -
                  (ShouldShowClientEdge() ? kClientEdgeThickness : 0);

A meaningless check.

V547 Expression 'count < 0' is always false. Unsigned type value is never < 0. ncdecode_tablegen ncdecode_tablegen.c 197

static void CharAdvance(char** buffer, size_t* buffer_size, size_t count) {
  if (count < 0) {
    NaClFatal("Unable to advance buffer by count!");
  } else {
  ....
}

The "count < 0" condition is always false. The protection does not work and some buffer might get overflowed. By the way, this is an example of how static analyzers might be used to search for vulnerabilities. An intruder can quickly find code fragments that contain errors for further thorough investigation. Here is another code sample related to the safety issue:

V511 The sizeof() operator returns size of the pointer, and not of the array, in 'sizeof (salt)' expression. Common visitedlink_common.cc 84

void MD5Update(MD5Context* context, const void* buf, size_t len);

VisitedLinkCommon::Fingerprint
  VisitedLinkCommon::ComputeURLFingerprint(
  ....
 const uint8 salt[LINK_SALT_LENGTH])
{
  ....
  MD5Update(&ctx, salt, sizeof(salt));
  ....
}

The MD5Update() function will process as many bytes as the pointer occupies. This is a potential loophole in the data encryption system, isn't it? I do not know whether it implies any danger; however, from the viewpoint of intruders, this is a fragment for thorough analysis.

The correct code should look this way:

MD5Update(&ctx, salt, sizeof(salt[0]) * LINK_SALT_LENGTH);

Or this way:

VisitedLinkCommon::Fingerprint
  VisitedLinkCommon::ComputeURLFingerprint(
  ....
 const uint8 (&salt)[LINK_SALT_LENGTH])
{
  ....
  MD5Update(&ctx, salt, sizeof(salt));
  ....
}

One more sample with a misprint:

V501 There are identical sub-expressions 'host != buzz::XmlConstants::str_empty ()' to the left and to the right of the '&&' operator. chromoting_jingle_glue iq_request.cc 248

void JingleInfoRequest::OnResponse(const buzz::XmlElement* stanza) {
  ....
  std::string host = server->Attr(buzz::QN_JINGLE_INFO_HOST);
  std::string port_str = server->Attr(buzz::QN_JINGLE_INFO_UDP);
  if (host != buzz::STR_EMPTY && host != buzz::STR_EMPTY) {
  ....
}

The port_str variable must be actually checked as well:

if (host != buzz::STR_EMPTY && port_str != buzz::STR_EMPTY) {

A bit of classics:

V530 The return value of function 'empty' is required to be utilized. chrome_frame_npapi np_proxy_service.cc 293

bool NpProxyService::GetProxyValueJSONString(std::string* output) {
  DCHECK(output);
  output->empty();
  ....
}

It must be: output->clear();

And here is even the handling of a null pointer:

V522 Dereferencing of the null pointer 'plugin_instance' might take place. Check the logical condition. chrome_frame_npapi chrome_frame_npapi.cc 517

bool ChromeFrameNPAPI::Invoke(...)
{
  ChromeFrameNPAPI* plugin_instance =
    ChromeFrameInstanceFromNPObject(header);
  if (!plugin_instance && (plugin_instance->automation_client_.get()))
    return false;
  ....
}

One more example of a check that will never work:

V547 Expression 'current_idle_time < 0' is always false. Unsigned type value is never < 0. browser idle_win.cc 23

IdleState CalculateIdleState(unsigned int idle_threshold) {
  ....
  DWORD current_idle_time = 0;
  ....
  // Will go -ve if we have been idle for a long time (2gb seconds).
  if (current_idle_time < 0)
    current_idle_time = INT_MAX;
  ....
}

Well, we should stop here. I can continue but it's starting to get boring. Remember that all this only concerns the Chromium itself. But there are also tests with errors like this:

V554 Incorrect use of auto_ptr. The memory allocated with 'new []'will be cleaned using 'delete'. interactive_ui_tests accessibility_win_browsertest.cc 306

void AccessibleChecker::CheckAccessibleChildren(IAccessible* parent) {
  ....
  auto_ptr<VARIANT> child_array(new VARIANT[child_count]);
  ....
}

There are also plenty of libraries Chromium is actually based on, the total size of libraries being much larger than that of Chromium itself. They also have a lot of interesting fragments. It is clear that code containing errors might not be used anywhere, still they are the errors nonetheless. Consider one of the examples (the ICU library):

V547 Expression '* string != 0 || * string != '_'' is always true. Probably the '&&' operator should be used here. icui18n ucol_sit.cpp 242

U_CDECL_BEGIN static const char* U_CALLCONV
_processVariableTop(...)
{
  ....
  if(i == locElementCapacity && (*string != 0 || *string != '_')) {
    *status = U_BUFFER_OVERFLOW_ERROR;
  }
  ....
}

The (*string != 0 || *string != '_') expression is always true. Perhaps it must be: (*string == 0 || *string == '_').

Conclusion

PVS-Studio was defeated. Chromium's source code is one of the best we have ever analyzed. We have found almost nothing in Chromium. To be more exact, we have found a lot of errors and this article demonstrates only a few of them. But if we keep in mind that all these errors are spread throughout the source code with the size of 460 Mbytes, it turns out that there are almost no errors at all.

P.S.

I'm answering to the question: will we inform the Chromium developers of the errors we've found? No, we won't. It is a very large amount of work and we cannot afford doing it for free. Checking Chromium is far from checking Miranda IM or checking Ultimate Toolbox. This is a hard work, we have to study all of the messages and make a decision whether there is an error in every particular case. To do that, we must be knowledgeable about the project. We will this article to the Chromium developers, and should they find it interesting, they will be able to analyze the project themselves and study all the diagnostic messages. Yes, they will have to purchase PVS-Studio for this purpose. But any Google department can easily afford this.

#Cpp