Andrey Karpov

Apr 29 2014

Tags:

#Cpp #Knowledge #64bit

C++11 and 64-bit Issues


The world of 64-bit errors
Magic numbers
Variadic functions
Incorrect shift operations
Disparity between virtual functions
Mixed arithmetic
Address arithmetic
Changing an array type and pointer packing
Serialization and data exchange
Overloaded functions
Type size checks
Conclusion
References

64-bit computers have been around for a long time already. Most applications have 64-bit versions that can benefit from a larger memory capacity and improved performance, thanks to the architectural capabilities of 64-bit processors. Developing a 64-bit application in C/C++ requires a great deal of attention from a programmer. There are a number of reasons for 32-bit code to fail to work properly when recompiled for the 64-bit platform. There are a lot of articles on this subject, so we will focus on another point. Let's find out if the new features introduced in C++11 have made 64-bit software programmers' life any better, or easier.

Note. The article was originally published in Software Developer's Journal (April 25, 2014) and is published here by the editors' permission.

The world of 64-bit errors

There are many traps a 64-bit C/C++ programmer can fall into. Many articles were published on this subject, so we will not dwell on this. If you are not familiar with specific aspects of 64-bit software development, or want to refresh your knowledge about it, consider the following resources:

Nevertheless, time moves on, and it has eventually brought us an updated and improved version of the C++ language named C++11. Most of the innovations described in the C++11 language standard are currently supported by modern compilers. Let's find out if these innovations can help programmers avoid 64-bit errors.

The article is organized in the following way. I will give a brief description of a typical 64-bit issue, and offer ways to avoid it by means of the C++11 language. It should be noted that C++11 is not always helpful, so it is only careful programming that will protect you against making errors. The new standard will only provide additional aid; it will never be able to solve all of your troubles.

Magic numbers

This refers to numbers like 4, 32, 0x7FFFFFFF, and 0xFFFFFFFF. Programmers should never assume that the pointer size is always going to be 4 bytes, as it may result in incorrect code like the following:

int **array = (int **)malloc(n * 4);

The C++11 standard has nothing to offer to handle such an error. Magic numbers are evil, and should be avoided whenever possible, to prevent any errors related to them.

Note. True, malloc() is not from C++, it is from the good old C. It would be better to use the new operator, or the std::vector container here. But we won't get into that, since it has nothing to do with our subject, magic numbers.

However, C++11 can actually help you use fewer magic numbers in certain cases. Programmers sometimes use magic numbers because they are afraid (usually without reason) that the compiler will not optimize the code properly. In this case, one should use generalized constant expressions (constexpr).

The constexpr mechanism lets the compiler evaluate expressions during compilation. You can declare functions which, when called with constant arguments in a context that requires a constant expression, will certainly be expanded into constants during compilation. For example:

constexpr int Formula(int a) {
  return a * 2 + 55;  // C++11 allows only a single return statement here
}
constexpr int n = Formula(1);  // evaluated at compile time

The call of the Formula(1) function will turn into a number. The explanation is too short of course, so I recommend checking out the references at the end of the article, to learn more about "constexpr" and other innovations of C++11.
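As a minimal sketch of how constexpr can stand in for a hand-computed magic number, the helper below derives an allocation size from sizeof instead of a hard-coded 4. The names PointerBlockBytes and kBytesFor16 are hypothetical and do not come from the article.

```cpp
#include <cstddef>

// Hypothetical helper: the number of bytes needed for n pointers to int.
// sizeof(int*) is 4 in a 32-bit build and 8 in a typical 64-bit build.
constexpr std::size_t PointerBlockBytes(std::size_t n) {
  return n * sizeof(int*);
}

// A constexpr variable must be initialized during compilation.
constexpr std::size_t kBytesFor16 = PointerBlockBytes(16);
static_assert(kBytesFor16 == 16 * sizeof(int*),
              "the size is computed by the compiler");
```

The same function can still be called with run-time arguments, in which case it behaves like an ordinary function.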

Variadic functions

Here I mean the issues that occur when the functions printf, scanf and the like, are used incorrectly. For example:

size_t value = ....;
printf("%u", value);

This code works properly in the 32-bit version of the program, but may print incorrect values when recompiled into the 64-bit version: "%u" expects an unsigned int, while size_t is a 64-bit type there.

Variadic functions are vestiges of the C language. Their disadvantage is the absence of control over the types of actual arguments. The time has come to drop them completely in modern C++. After all, there are a number of other string formatting methods. For example, you can replace printf with cout, and sprintf with boost::format or std::stringstream.
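Here is a small sketch of two ways to format a size_t value without the "%u" mismatch. The function names are hypothetical. The second variant assumes a standard library that supports the C99 "%zu" length modifier, which C++11 inherits; some older toolchains do not provide it.

```cpp
#include <cstddef>
#include <cstdio>
#include <sstream>
#include <string>

// Type-safe: operator<< is chosen from the static type of the argument.
std::string FormatWithStream(std::size_t value) {
  std::ostringstream out;
  out << value;
  return out.str();
}

// Still a variadic call, but "%zu" is the conversion intended for size_t.
std::string FormatWithPrintf(std::size_t value) {
  char buffer[32];  // enough for a 64-bit value in decimal
  std::snprintf(buffer, sizeof(buffer), "%zu", value);
  return buffer;
}
```

The stream version cannot get the type wrong; the snprintf version still depends on the programmer choosing the right format specifier.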

Things improved even more as the C++11 language appeared. It brought us variadic templates which allow one to implement a safe version of the printf function:

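#include <iostream>
#include <stdexcept>

void printf(const char* s)
{
  while (s && *s) {
    if (*s=='%' && *++s!='%')
      throw std::runtime_error("invalid format: missing arguments");
    std::cout << *s++;
  }
}
template<typename T, typename... Args>
void printf(const char* s, T value, Args... args)
{
  while (s && *s) {
    if (*s=='%' && *++s!='%') {
      std::cout << value;  // the conversion character itself is ignored
      return printf(++s, args...);
    }
    std::cout << *s++;
  }
  // the format string ended while arguments were still left
  throw std::runtime_error("extra arguments provided to printf");
}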

This code simply "pulls out" the first argument which is not a format string, and then calls itself recursively. When there are no such arguments left, the first (simpler) version of the printf() function will be called.

The Args... defines what is called a "parameter pack". It's basically a sequence of 'type/value' pairs from which you can "peel off" arguments starting with the first. When printf() is called with one argument, the first definition (printf(const char*)) is chosen. When printf() is called with two or more arguments, the second definition (printf(const char*, T value, Args... args)) is chosen, with the first argument as s, the second as value, and the rest (if any) bundled into the 'args' parameter pack for subsequent use. In the call

printf(++s, args...);

the 'args' parameter pack is expanded so that the next argument can now be selected as value. This carries on until args is empty (so that the first version of printf() is called).

Incorrect shift operations

The numeric literal 1 has the int type, which is 32 bits wide on the common 32-bit and 64-bit platforms. It means that it can't be shifted by more than 31 bits. Programmers often forget about this, and write incorrect code:

ptrdiff_t mask = 1 << bitNum;

If the bitNum value equals 40, for example, the consequences are unpredictable: formally, this is undefined behavior.

What does C++11 have to offer to solve this issue? Unfortunately, nothing.
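The fix does not need anything new from the language: widen the literal before shifting. Below is a minimal sketch; the helper name MakeMask is hypothetical, and it assumes bitNum is smaller than the number of value bits in ptrdiff_t.

```cpp
#include <cstddef>

// Hypothetical helper: the literal 1 is converted to ptrdiff_t first,
// so the shift is performed in the width of the result type, not in int.
std::ptrdiff_t MakeMask(unsigned bitNum) {
  return static_cast<std::ptrdiff_t>(1) << bitNum;
}
```

In a 64-bit build, MakeMask(40) then yields a value with only bit 40 set.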

Disparity between virtual functions

Assume we have a virtual function declared in a base class:

int A(DWORD_PTR x);

And the following function in the descendant class:

int A(DWORD x);

In a 32-bit version, the types DWORD_PTR and DWORD coincide. But they turn into two different types in a 64-bit version. As a result, calling the A function from the base class will lead to different outputs in the 32-bit and 64-bit programs.

To avoid such errors, we can use the new keywords introduced in C++11.

Now we have the keyword override, which allows programmers to state explicitly that a function is meant to override another. Declaring a function with the override keyword is only correct when there is a base-class virtual function for it to override.

The following code will fail to compile in the 64-bit mode, and therefore the error will be prevented:

struct X
{
  virtual int A(DWORD_PTR) { return 1; }
};
struct Y : public X
{
  int A(DWORD x) override { return 2; }
};

Mixed arithmetic

This topic is pretty large and important, so I suggest that you study the corresponding section of the "64-bit Lessons": Mixed arithmetic.

Let me just cite a couple of theses here:

Programmers tend to forget that the resulting value of a multiplication, or addition, of two variables of the 'int' type will be also 'int', which may cause an overflow, and it doesn't matter how this result is used after that.
It is unsafe to mix 32-bit and 64-bit data types, as the consequences may be unpleasant: incorrect conditions, infinite loops, etc.

A few simple examples of an overflow

char *p = new char[1024*1024*1024*5];

The programmer is trying to allocate 5 GBytes of memory, but the program will actually allocate much less, because the "1024*1024*1024*5" expression has the int type. The multiplication overflows (formally undefined behavior for a signed type), and with the usual wraparound the expression evaluates to 1073741824 (1 GByte). After that, this value will be extended to the size_t type when being passed to the 'new' operator, but it just won't matter (it will be too late).

If you still haven't grasped the idea, here is another example:

unsigned a = 1024, b = 1024, c = 1024, d = 5;
size_t n = a * b * c * d;

The expression's result is written into a variable of the 'size_t' type. It can store values larger than UINT_MAX. However, when multiplying 'unsigned' variables, an overflow will occur and the result will be incorrect.
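A common way to avoid this is to convert the first operand to size_t, so that the whole chain of multiplications is carried out in the wider type. The sketch below contrasts the two forms; both function names are hypothetical.

```cpp
#include <cstddef>

// The first operand is widened, so every multiplication happens in size_t.
std::size_t TotalBytes(unsigned a, unsigned b, unsigned c, unsigned d) {
  return static_cast<std::size_t>(a) * b * c * d;
}

// The form from the example above: the product is computed in unsigned
// (wrapping modulo 2^32 when unsigned is 32 bits) and only then widened.
std::size_t TotalBytesTruncated(unsigned a, unsigned b, unsigned c, unsigned d) {
  return a * b * c * d;
}
```

With 32-bit unsigned and 64-bit size_t, TotalBytes(1024, 1024, 1024, 5) gives 5368709120, while TotalBytesTruncated gives 1073741824.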

Why do we refer to all these as 64-bit issues? The point is that you can't allocate an array larger than 2 GBytes in a 32-bit program. It means that you will simply never see any overflows there. But in 64-bit applications handling larger memory amounts, these errors will reveal themselves.

Now a couple of examples on comparison

size_t Count = BigValue;
for (unsigned Index = 0; Index < Count; ++Index)
{ ... }

In this fragment, an infinite loop will occur if Count > UINT_MAX. Suppose this code is used to iterate fewer times than UINT_MAX in the 32-bit version. But the 64-bit version can handle more data and therefore may need more iterations. Since the values of the Index variable lie inside the range [0..UINT_MAX], the "Index < Count" condition is always true, thus leading to an infinite loop.

One more example:

string str = .....;
unsigned n = str.find("ABC");
if (n != string::npos)

This code is incorrect. The find() function returns a value of the string::size_type type. It will work correctly in the 32-bit version, but let's see what will happen in the 64-bit one.

In the 64-bit program, string::size_type and unsigned do not coincide anymore. If the substring cannot be found, the find() function will return the value string::npos which equals 0xFFFFFFFFFFFFFFFFui64. This value is truncated to 0xFFFFFFFFu and is written into a 32-bit variable. The 0xFFFFFFFFu != 0xFFFFFFFFFFFFFFFFui64 expression is calculated, and it turns out that the (n != string::npos) condition is always true!

Can C++11 help in any way here?

The answer is both yes, and no.

In some cases, the new keyword auto may be of use, but in some other cases, it will only confuse the programmer. So let's figure out when it can, and cannot, be used.

If you declare "auto a = .....", the type will be deduced automatically. It is very important that you don't get confused and write incorrect code such as "auto n = 1024*1024*1024*5;".

Now, a few words about the auto keyword. Take a look at this example:

auto x = 7;

In this case, the 'x' variable will have the 'int' type, as it is the same type as that of the variable initializer. In general, we can write the following code:

auto x = expression;

The type of the 'x' variable will be the same as that of the value the expression evaluates to.

The 'auto' keyword is most useful to get the type of a variable from its initializer when you don't know the exact type of the expression, or it is too complex to write manually. Take a look at the following example:

template<class T> void printall(const vector<T>& v)
{
  for (auto p = v.begin(); p!=v.end(); ++p)
    cout << *p << "\n";
}

In C++98, you would have to write much longer code:

template<class T> void printall(const vector<T>& v)
{
  for (typename vector<T>::const_iterator p = v.begin();
       p!=v.end(); ++p)
    cout << *p << "\n";
}

So, that's a very useful innovation of the C++11 language.

Let's get back to our problem. The "1024*1024*1024*5" expression has the 'int' type. That's why the 'auto' keyword will be useless in this case.

Neither will it help to deal with a loop like this:

size_t Count = BigValue;
for (auto Index = 0; Index < Count; ++Index)

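Did we make it any better? No, we didn't. The literal 0 has the 'int' type, which means that the Index variable will now become 'int', instead of 'unsigned'. I'd say it has become even worse: the index is still only 32 bits wide, and it is now signed, so overflowing it is undefined behavior.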
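One C++11 option that the text above does not cover is decltype, which can give the index the type of the bound itself. This is a sketch under that assumption, with the hypothetical helper CountIterations; it is not presented as the article's recommendation.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical helper: returns how many times the loop body runs.
std::size_t CountIterations(std::size_t count) {
  std::size_t iterations = 0;
  // decltype(count) is size_t here, so the index can reach any value of count.
  for (decltype(count) index = 0; index < count; ++index)
    ++iterations;
  return iterations;
}

// For comparison: the literal 0 is an int, which is what auto would deduce.
static_assert(std::is_same<decltype(0), int>::value, "0 is an int literal");
```

Writing size_t explicitly for the index works just as well; decltype only saves you from repeating the type if the bound's type changes later.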

So is 'auto' of any use at all? Yes, it is. For example, in the following code:

string str = .....;
auto n = str.find("ABC");
if (n != string::npos)

The 'n' variable will have the 'string::size_type' type, and everything will be alright now.

We made use of the 'auto' keyword at last. But be careful - you should understand very well what you are doing, and why. Don't strive to defeat all the errors related to mixed arithmetic by using 'auto' everywhere you can. It's just one means of making it a bit easier, not a cure-all.

By the way, there is one more method to catch the type truncation in this line from the example above:

unsigned n = str.find("ABC");

You can use a new variable initialization format which prevents type narrowing. The issue is that C and C++ languages tend to implicitly truncate certain types:

int x = 7.3;  // Oops!
void f(int);
f(7.3);  // Oops!

However, C++11's list initialization (the brace syntax) doesn't allow type narrowing:

int x0 {7.3}; //compilation error
int x1 = {7.3}; //compilation error
double d = 7;
int x2{d}; //compilation error

But the following example is of more interest to us right now:

size_t A = 1;
unsigned X = A;
unsigned Y(A);
unsigned Q = { A }; //compilation error
unsigned W { A }; //compilation error

Imagine the code is written like this:

unsigned n = { str.find("ABC") };
   or this
unsigned n{str.find("ABC")};

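This code will compile correctly in the 32-bit mode, but is ill-formed in the 64-bit mode, where converting size_t to unsigned is a narrowing conversion. Most compilers reject it; some, GCC among them, may report a narrowing conversion from a non-constant value only as a warning by default.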

Again, it's not a cure-all; it's just another way to write safer programs.

Address arithmetic

It's pretty similar to what we discussed in the "Mixed arithmetic" section. The only difference is that overflows occur when working with pointers.

For example:

float Region::GetCell(int x, int y, int z) const {
  return array[x + y * Width + z * Width * Height];
}

This fragment is taken from a real-life program for mathematical simulation, the amount of memory being a very crucial resource for it. In order to save memory in such applications, one-dimensional arrays are often used, which then are handled as three-dimensional arrays. There are special functions similar to GetCell for the programmer to access the required elements. But the code fragment above will only correctly handle those arrays which consist of fewer than INT_MAX items, because 32-bit int types are used to calculate the item indices.

Can C++11 help us with this one? No.
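The remedy is the same as for mixed arithmetic: perform the index computation in a pointer-sized type. The function below is a hypothetical stand-in for the expression inside Region::GetCell, not the real program's code.

```cpp
#include <cstddef>

// Hypothetical stand-in for the index computation in GetCell.
// width and height are widened to ptrdiff_t before any multiplication,
// so the products are computed in the pointer-sized type.
std::ptrdiff_t CellIndex(int x, int y, int z, int width, int height) {
  const std::ptrdiff_t w = width;
  const std::ptrdiff_t h = height;
  return x + y * w + z * w * h;
}
```

In a 64-bit build, CellIndex(0, 0, 2000, 2000, 2000) evaluates to 8000000000, a value that does not fit into a 32-bit int.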

Changing an array type and pointer packing

It is sometimes necessary (or just convenient) to represent array items as items of a different type. It may also be convenient to store pointers in integer variables.

You may face issues here when exploiting incorrect explicit type conversions. The new C++11 standard can't help with that - programmers have always used explicit type conversions at their own risk.

Handling data stored in unions should also be mentioned. Such handling of data is a low-level one, and its results also depend solely on the programmer's skills and knowledge.

Serialization and data exchange

Sometimes you may need to create a compatible data format in your project - that is, one data set must be handled both by the 32-bit and 64-bit versions of the program. The issue is that the size of some data types may change.

The C++11 standard has made life a bit easier by offering types of a fixed size, declared in the <cstdint> header. Until this happened, programmers had to declare such types manually, or employ ones from the system libraries.

Now we have the following types with a fixed size:

int8_t
int16_t
int32_t
int64_t
uint8_t
uint16_t
uint32_t
uint64_t
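As an illustration, a record meant to be shared between 32-bit and 64-bit builds can be described with these types only. RecordHeader and its fields are hypothetical, and the sketch assumes 8-bit bytes.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical record header built only from fixed-width types.
struct RecordHeader {
  std::uint32_t id;
  std::uint32_t flags;
  std::uint64_t payloadSize;
};

// The field sizes are the same in 32-bit and 64-bit builds.
static_assert(sizeof(std::uint32_t) == 4, "uint32_t is 4 bytes");
static_assert(sizeof(std::uint64_t) == 8, "uint64_t is 8 bytes");
// The types alone do not control padding, so the layout is checked as well.
static_assert(sizeof(RecordHeader) == 16, "unexpected padding in RecordHeader");
```

Fixed-width types settle the sizes only; byte order still has to be handled separately when data moves between different architectures.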

Besides the type sizes, the data alignment is also subject to change, which may cause some troubles as well.

In connection with this, we should also mention the new keyword 'alignas' introduced in C++11. Now you can write the following code:

// an array of characters aligned to store double types
alignas(double) unsigned char c[1024];
// alignment on the 16-byte boundary
alignas(16) char d[100];

There also exists the 'alignof' operator which returns alignment of a certain argument (which must be a type). For example:

constexpr int n = alignof(int);
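The two keywords can be combined to state an alignment requirement and verify it. The following is a small sketch; Vec4 and BufferIsAligned are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical type that must sit on a 16-byte boundary.
struct alignas(16) Vec4 {
  float v[4];
};

// alignof reports the requirement at compile time.
static_assert(alignof(Vec4) == 16, "Vec4 must be 16-byte aligned");

// Checks at run time that a local alignas(16) buffer starts on a 16-byte boundary.
bool BufferIsAligned() {
  alignas(16) unsigned char buffer[64] = {};
  return reinterpret_cast<std::uintptr_t>(buffer) % 16 == 0;
}
```

Stating the alignment explicitly keeps it identical in 32-bit and 64-bit builds instead of leaving it to each platform's defaults.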

Overloaded functions

When porting a 32-bit program to the 64-bit platform, you may discover that its execution logic has changed, which was caused by the use of overloaded functions in your code. If a function is overloaded for 32-bit and 64-bit values, an attempt to access it with an argument, say, of the size_t type, will be translated into different calls on different platforms.

I can't say for sure if any innovations of the C++11 language can help solve these issues.

Type size checks

There are cases when you need to check the sizes of data types. It may be necessary to make sure you won't get a buggy program after recompiling the code for a new platform.

Programmers often do this incorrectly, for example:

assert(sizeof(unsigned) < sizeof(size_t));
assert(sizeof(short) == 2);

It's a bad idea to do it like that. Firstly, the program will compile anyway. Secondly, these checks will only make sense in the debug version.

Instead, one should terminate compilation, if the necessary conditions prove false. There are a lot of ways to do that. For instance, you can use the _STATIC_ASSERT macro, available to developers working in Visual Studio. For example:

_STATIC_ASSERT(sizeof(int) == sizeof(long));

C++11 has a standard mechanism to terminate compilation if things go wrong: static assertions.

Static assertions (compile-time-assertions) contain a constant expression, and a string literal:

static_assert(expression, string);

The compiler calculates the expression, and outputs a string as an error message, if the calculation result is false (i.e. the assertion is violated). For example:

static_assert(sizeof(size_t)>=8, 
  "64-bit code generation required for this library.");

struct S { X m1; Y m2; };
static_assert(sizeof(S)==sizeof(X)+sizeof(Y),
  "unexpected padding in S");

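The two run-time checks shown at the start of this section can be rewritten as static assertions. In this sketch the first condition is relaxed from "<" to "<=" (my assumption) so that the code also compiles in 32-bit mode; Widen is a hypothetical helper that relies on that check.

```cpp
#include <cstddef>

// Compile-time versions of the assert() checks shown earlier in this section.
static_assert(sizeof(short) == 2, "this code assumes a 2-byte short");
static_assert(sizeof(unsigned) <= sizeof(std::size_t),
              "size_t must be able to hold any unsigned value");

// Hypothetical helper that depends on the second check being true.
std::size_t Widen(unsigned value) {
  return value;
}
```

Unlike assert(), these checks cost nothing at run time and are enforced in release builds too.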
Conclusion

Extensive use of the C++11 language's new constructs in your code doesn't guarantee that you will avoid 64-bit errors. However, the language does offer a number of useful features to help make your code shorter, and safer.

References

We didn't aim at familiarizing the readers with as many innovations of the C++11 language as possible in this article. To get started with the new standard, please consider the following resources: