Lesson 16. Pattern 8. Memsize-types in unions

24.01.2012

A union is specific in that way that all the union items (members of the union) are assigned the same memory space, that is they are overlapped. Although you may access this memory space with the help of any member of the union, still you should choose it so that the result is sensible.

You should be very attentive dealing with unions that include pointers and other members of a memsize-type.

When you need to work with a pointer as an integer number, it may be convenient to use a union and work with the numerical representation of the type without explicit conversions. Consider the example:

union PtrNumUnion {
  char *m_p;
  unsigned m_n;
} u;
u.m_p = str;
u.m_n += delta;

This sample is correct for 32-bit systems and incorrect for 64-bit ones. Changing the member m_n on a 64-bit system we work only with a part of the pointer m_p (see Figure 1).

Figure 1 - The union format on the 32-bit and 64-bit systems

Figure 1 - The union format on the 32-bit and 64-bit systems

You should use a type that corresponds to the pointer's size:

union PtrNumUnion {
  char *m_p;
  size_t m_n; //type fixed
} u;

Another usual way of using a union is to represent one member as a set of several smaller members. For example, you may need to split a value of size_t type into bytes to implement the table algorithm of counting zero bits:

union SizetToBytesUnion {
  size_t value;
  struct {
    unsigned char b0, b1, b2, b3;
  } bytes;
} u;
   
SizetToBytesUnion u;
u.value = value;
size_t zeroBitsN = TranslateTable[u.bytes.b0] +
                   TranslateTable[u.bytes.b1] +
                   TranslateTable[u.bytes.b2] +
                   TranslateTable[u.bytes.b3];

This code contains a fundamental algorithmic error that consists in the assumption that the type size_t contains 4 bytes. It is hardly possible at present to search for algorithmic errors in automatic mode but what we can do is to find all the unions and check if they contain memsize-types. On finding such a union we might encounter an error in it and rewrite the code in the following way.

union SizetToBytesUnion {
  size_t value;
  unsigned char bytes[sizeof(value)];
} u;
   
SizetToBytesUnion u;
u.value = value;
size_t zeroBitsN = 0;
for (size_t i = 0; i != sizeof(bytes); ++i)
  zeroBitsN += TranslateTable[bytes[i]]; 

Diagnosis

The tool PVS-Studio allows the programmer to quickly find and look through all the unions that contain memsize-types in the program code. The analyzer generates the diagnostic warning V117 for those structures the programmer should consider when porting the code to a 64-bit system.

The course authors: Andrey Karpov (karpov@viva64.com), Evgeniy Ryzhkov (evg@viva64.com).

The rightholder of the course "Lessons on development of 64-bit C/C++ applications" is OOO "Program Verification Systems". The company develops software in the sphere of source program code analysis. The company's site: http://www.viva64.com.

Contacts: e-mail: support@viva64.com, Tula, 300027, PO box 1800.