Reflections on the Null Pointer Dereferencing Issue

Jan 15 2015

Author: Andrey Karpov

As I have recently found out, the question of whether the code &((T*)0)->x is correct turns out to be quite complicated. I decided to write a small post on this subject.

In my recent article about a Linux kernel check done by the PVS-Studio analyzer, I mentioned having come across the following code fragment in the kernel's code:

static int podhd_try_init(struct usb_interface *interface,
        struct usb_line6_podhd *podhd)
{
  int err;
  struct usb_line6 *line6 = &podhd->line6;

  if ((interface == NULL) || (podhd == NULL))
    return -ENODEV;
  ....
}

I also wrote in that article that this code was incorrect in my opinion. See the article for details.

After publishing it, I received piles of emails from people telling me that I was wrong and that the code was absolutely correct. Many pointed out that if podhd == 0, the code in effect implements the "offsetof" idiom, so nothing terrible can possibly happen. Rather than write a large number of individual replies, I decided to answer everyone at once in a small blog post.

Naturally, I investigated the subject more deeply. But honestly, that only left me more confused. So I can't give you a definitive answer on whether you can write code like that; I will only share some links and my own thoughts.

When writing that article about the Linux check, my reasoning was as follows.

Any null pointer dereferencing operation is undefined behavior. One possible consequence of undefined behavior is an optimization that removes the (podhd == NULL) check: since the pointer has already been used, the compiler is allowed to assume it is not null and treat the later comparison as always false. It was this scenario that I described in the article.

In their letters, some developers told me they had failed to reproduce this behavior with their compilers. But that doesn't prove anything: a program behaving as expected is just one of the possible outcomes of undefined behavior.
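To make the concern concrete, here is a minimal sketch of the pattern. The names struct device, struct inner and try_init_risky are hypothetical stand-ins, not the kernel's code. The comments describe reasoning an optimizer would be permitted to apply if taking &dev->inner on a null pointer is indeed undefined behavior; whether a particular compiler actually does so is a separate question.

```c
#include <stddef.h>

/* Hypothetical types standing in for usb_line6_podhd / usb_line6. */
struct inner { int value; };
struct device { int id; struct inner inner; };

/* Same shape as the kernel fragment: the member address is taken
 * before the pointer is compared with NULL. */
int try_init_risky(struct device *dev)
{
    struct inner *in = &dev->inner;  /* if this is UB for dev == NULL ...   */

    if (dev == NULL)                 /* ... the compiler may assume dev is  */
        return -1;                   /* non-null here and drop this branch. */

    in->value = 42;
    return 0;
}
```

Called with a valid pointer, the function is well defined and returns 0. The open question is only what happens when dev is null.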

Some also pointed out to me that the offsetof() macro is implemented in exactly the same manner:

#define offsetof(st, m) ((size_t)(&((st *)0)->m))

But that doesn't prove anything either. Such macros are deliberately implemented so that they work correctly with particular compilers. If we write similar code ourselves, it won't necessarily work right.

Moreover, in the example with the macro, the compiler is handling a literal 0 directly and can therefore guess what the programmer wants it to do. When 0 is stored in a variable, it is a different story, and the compiler may respond unpredictably.

This is what Wikipedia has to say about offsetof:

The "traditional" implementation of the macro relied on the compiler being not especially picky about pointers; it obtained the offset of a member by specifying a hypothetical structure that begins at address zero:

#define offsetof(st, m) ((size_t)(&((st *)0)->m))

This works by casting a null pointer into a pointer to structure st, and then obtaining the address of member m within said structure. While this works correctly in many compilers, it has undefined behavior according to the C standard, since it involves a dereference of a null pointer (although, one might argue that no dereferencing takes place, because the whole expression is calculated at compile time). It also tends to produce confusing compiler diagnostics if one of the arguments is misspelled. Some modern compilers (such as GCC) define the macro using a special form instead, e.g.

#define offsetof(st, m) __builtin_offsetof(st, m)

As you can see, according to Wikipedia I am right: you can't write code like that; this is undefined behavior. Some programmers on Stack Overflow agree; see the question "Address of members of a struct via NULL pointer".
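For comparison, here is a small sketch of getting a member offset without writing the null-pointer expression by hand, using the standard offsetof macro from <stddef.h>. The struct record type and the count_field helper are hypothetical and exist only for illustration.

```c
#include <stddef.h>  /* offsetof, size_t */

/* Hypothetical struct used only for illustration. */
struct record {
    char   tag;
    int    count;
    double weight;
};

/* offsetof yields an integer constant expression, so it can be used
 * where a compile-time constant is required. */
enum { COUNT_OFFSET = offsetof(struct record, count) };

/* Recover a pointer to a member from a valid struct pointer plus an offset. */
int *count_field(struct record *r)
{
    return (int *)((char *)r + offsetof(struct record, count));
}
```

How the library implements offsetof is its own business; user code only relies on the documented result.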

But it still bothers me that while everyone talks about undefined behavior, I can't find an exact explanation of the subject anywhere. For instance, that extract from the Wikipedia article carries a 'citation needed' mark.

There were numerous debates on similar issues on forums, but I haven't found any clear and plain explanation supported by references to the C or C++ standards there.

There is also an old discussion of the C++ standard (core issue 232) that hasn't clarified the point either: "Is indirection through a null pointer undefined behavior?"

So, I haven't come to a definite conclusion on this issue so far. But I still believe that the code is bad and should be refactored.
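One possible refactoring, sketched with hypothetical names (struct device, struct inner, try_init_checked) rather than the kernel's actual types, is simply to take the member address only after the null check. Then the question of whether the original form is undefined never comes up.

```c
#include <stddef.h>

/* Hypothetical types standing in for usb_line6_podhd / usb_line6. */
struct inner { int value; };
struct device { int id; struct inner inner; };

int try_init_checked(struct device *dev)
{
    struct inner *in;

    if (dev == NULL)      /* validate the pointer first */
        return -1;

    in = &dev->inner;     /* dev is known to be non-null here */
    in->value = 42;
    return 0;
}
```

The cost is one extra line, and the intent is clear to both the reader and the compiler.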

If anyone has solid arguments or facts on the subject, please share them with me and I'll add them at the end of this article.
