Lesson 19. Pattern 11. Serialization and data interchange

24.01.2012

Succession to existing data interchange protocols is an important component of the process of porting a program solution to a new platform. You need to provide the capability of reading the existing projects' formats, data interchange between 32-bit and 64-bit processes, etc.

Basically, the errors of this pattern concern serialization of memsize-types and data interchange operations they are used in:

1) size_t PixelsCount;
   fread(&PixelsCount, sizeof(PixelsCount), 1, inFile);
2) __int32 value_1;
   SSIZE_T value_2;
   inputStream >> value_1 >> value_2;

These samples contain errors of two kinds: using types of changeable capacity in binary interfaces and ignoring the byte order.

Using types of changeable capacity

Do not use types that change their sizes depending upon the development environment in binary interfaces of data interchange. All the types in C++ do not have a fixed size, so they cannot be used for this purpose. That is why developers of software development tools and programmers themselves create data types that have a strictly fixed size such as __int8, __int16, INT32, word64, etc.

These types enable data interchange between programs on various platforms although it requires some additional efforts. The two examples shown above are written incorrectly and it will become clear when some data types change their sizes from 32 bits to 64 bits. Keeping in mind the necessity of supporting obsolete data types, you may correct these samples in the following way:

1) size_t PixelsCount;
   __uint32 tmp;
   fread(&tmp, sizeof(tmp), 1, inFile);
   PixelsCount = static_cast<size_t>(tmp);
2) __int32 value_1;
   __int32 value_2;
   inputStream >> value_1 >> value_2;

But this way of correcting the code is not the best. The program may process more data after being ported to a 64-bit system, so 32-bit types in the code may become a great obstacle. In this case you may leave the obsolete code as it is to make it compatible with the obsolete data format but fix the incorrect types. Then you may create a new binary data format taking into consideration the previous errors. One more way out is to refuse to use binary formats and take the text format or other formats provided by various libraries.

Ignoring the byte order

Even after solving the issue of type sizes, you may encounter the problem of incompatibility of binary formats. The cause lies in a different data representation. It is most often related to a different byte order.

A byte order is a method of writing the bytes of multibyte numbers (see also Figure 1). The little-endian byte order means that the writing begins with the least-significant byte and ends with the most-significant byte. This writing order is employed in the memory of personal computers with x86 and x86-64-processors. The big-endian byte order means that the writing begins with the most-significant byte and ends with the least-significant byte. This order is a standard for TCP/IP protocols. That is why the big-endian byte order is often called the network byte order. This byte order is used in Motorola 68000 and SPARC processors.

Figure 1 - The byte order in a 64-bit type in little-endian and big-endian systems

Figure 1 - The byte order in a 64-bit type in little-endian and big-endian systems

While developing a binary interface or data format, you should remember about the byte order. If the 64-bit system you are porting your 32-bit application to has a byte order different from that of your application, you will have to adjust your code to this difference. To convert the big-endian byte order into the little-endian byte order and vice versa you may use the functions htonl(), htons(), bswap_64, etc.

Note. Many systems lack functions like bswap_64 while the function ntohl() allows you to reverse only 32-bit values. They forgot to add a version of this function for 64-bit types for some reason. If you need to change the byte order in a 64-bit variable, see the discussion of the topic "64 bit ntohl() in C++ ?" on stackoverflow.com site - there are several examples of how to implement this function.

Diagnosis

Unfortunately, PVS-Studio does not provide diagnosis for this pattern of 64-bit errors because this process cannot be formalized (we cannot compose a diagnostic rule). The only thing we may recommend you is to look through all the code fragments that are responsible for writing and reading data as well as sending data into other processes through the COM technology, for instance.

We would be glad if somebody of our readers proposed some ideas of how to detect the errors of this kind, at least partly.

The course authors: Andrey Karpov (karpov@viva64.com), Evgeniy Ryzhkov (evg@viva64.com).

The rightholder of the course "Lessons on development of 64-bit C/C++ applications" is OOO "Program Verification Systems". The company develops software in the sphere of source program code analysis. The company's site: http://www.viva64.com.

Contacts: e-mail: support@viva64.com, Tula, 300027, PO box 1800.