Still Comparing "this" Pointer to Null?

Dec 12 2013

Author: Dmitry Meshcheryakov

This is a translation of an article written by Dmitry Meshcheryakov, an ABBYY employee, and first published here: "ABBYY blog. Still Comparing "this" Pointer to Null?". The translation is done and published with permission of the copyright holder.

A long time ago, in a galaxy far, far away, there was a widely used MFC library which had a few classes with methods that compared the "this" pointer to null.

It looked something like this:

class CWindow {
    HWND handle;
    HWND GetSafeHandle() const
    {
         return this == 0 ? 0 : handle;
    }
};

"It doesn't make any sense!" the readers will argue. Why, it "does": this code "allows" you to call the GetSafeHandle() method through a null CWindow* pointer. This technique is sometimes used in different projects. Let's find out why it is really a bad idea.

First of all, according to the C++ standard (it follows from paragraph 5.2.5/3 of ISO/IEC 14882:2003(E)), calling any nonstatic method of any class through a null pointer leads to undefined behavior. However, the code shown below may work in certain implementations:

#include <windows.h> // declares ::Sleep

class Class {
public:
    void DontAccessMembers()
    {
        ::Sleep(0);
    }
};

int main()
{
    Class* object = 0;
    object->DontAccessMembers();
}

It can work because no attempts are made to access the class members while the method is executed, and no late binding is used to call the method. The compiler knows which particular method of which particular class should be called and simply emits the call, passing the "this" pointer as a hidden parameter. The effect is the same as if it were a static method:

class Class {
public:
    static void DontAccessMembers(Class* currentObject)
    {
        ::Sleep(0);
    }
};

int main()
{
    Class* object = 0;
    Class::DontAccessMembers(object);
}

If the method were called virtually, it would require late binding, which is usually implemented through a pointer to the virtual method table at the beginning of the object. In this case, even finding out which method to call would require accessing the object's contents, and this operation would most likely cause a crash with a null pointer.

But we know for sure that our method will never be called virtually, don't we? After all, this code has been working well for some years.

The trouble is that the compiler may utilize undefined behavior for the purpose of optimization. For example:

int divideBy = ...;
whatever = 3 / divideBy;
if( divideBy == 0 ) {
    // THIS IS IMPOSSIBLE
}

In this code fragment we have an integer division by divideBy. Integer division by zero causes undefined behavior (usually a crash). Therefore the compiler can assume that the divideBy variable is not equal to zero, eliminate the check during compilation, and optimize the code accordingly.
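A well-defined way to express the same intent is to test the divisor before dividing, so that the check is reached before anything undefined can happen. The sketch below assumes a hypothetical helper named TryDivide, which is not part of the original article:

```cpp
// Hypothetical helper (not from the original article): the zero check
// comes first, so the division is never evaluated with a zero divisor
// and the compiler has no grounds to remove the check.
// Note: INT_MIN / -1 is also undefined for int and is not handled here.
bool TryDivide( int dividend, int divideBy, int& result )
{
    if( divideBy == 0 ) {
        return false; // reachable: nothing undefined has happened yet
    }
    result = dividend / divideBy;
    return true;
}
```

With this ordering the "impossible" branch becomes an ordinary error path, and result is left untouched when the function returns false.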

In the same way, the compiler can optimize code that compares the "this" pointer to null. According to the Standard, "this" cannot be null in a well-defined program, so the checks and the corresponding code branches can be eliminated, which will badly affect any code that depends on comparing "this" to null. The compiler has every right to "break" (actually, just break it further) CWindow::GetSafeHandle() and generate machine code which doesn't contain the comparison and simply reads the class field every time.

Currently even the freshest versions of the most popular compilers (you can check this with the GCC Explorer service) don't perform such optimizations, so "everything works" for now, right?

First, you will be very disappointed to waste quite a lot of time finding out that such an optimization has appeared after you move to another compiler or to a new version of your current one. That's why the code shown above is non-portable.

Second, consider the following code:

#include <stdio.h> // declares printf

class FirstBase {
    int firstBaseData;
};

class SecondBase {
public:
    void Method()
    {
        if( this == 0 ) {
            printf( "this == 0");
        } else {
            printf( "this != 0 (value: %p)", static_cast<void*>( this ) );
        }
    }
};

class Composed1 : public FirstBase, public SecondBase {
};

int main()
{
    Composed1* object = 0;
    object->Method();
}

GOOD LORD, the "this" pointer equals 0x00000004 on entering the method when compiled with Visual C++ 9, because the pointer initially set to null is adjusted so that it points to the beginning of the SecondBase subobject.

If you change the order of the base classes:

class Composed2 : public SecondBase, public FirstBase {
};
    
int main()
{
    Composed2* object = 0;
    object->Method();
}

this will become null under the same conditions because the beginning of the subobject coincides with the beginning of the object it is included in. Thus we get a wonderful class whose method works only if this class is used "appropriately" in compound objects. I wish you good luck with debugging; the Darwin award has seldom been so close.

One can easily notice that the implicit conversion of the pointer to the object into a pointer to the subobject works "wrong" in the case of the Composed1 class: this conversion yields a non-null pointer to the subobject from a null pointer. When implementing a conversion with the same meaning, the compiler usually adds a check of the pointer for being null. For example, compiling the following code with undefined behavior (the Composed1 class is the same as shown above):

SecondBase* object = reinterpret_cast<Composed1*>( rand() );
object->Method();

produces the following machine code in Visual C++ 9:

SecondBase* object = reinterpret_cast<Composed1*>( rand() );
010C1000  call        dword ptr [__imp__rand (10C209Ch)] 
010C1006  test        eax,eax
010C1008  je          wmain+0Fh (10C100Fh) 
010C100A  add         eax,4 
object->Method();
010C100D  jne         wmain+20h (10C1020h) 
010C100F  push        offset string "this == 0" (10C20F4h) 
010C1014  call        dword ptr [__imp__printf (10C20A4h)] 
010C101A  add         esp,4

The second instruction in this machine code compares the pointer to the object with null. If the check reveals that the pointer is null, control skips the add eax,4 instruction which is used to shift the pointer. The implicit conversion here is implemented with a check, though it would also have been possible to assume that the pointer is non-null, since the method is called through it.
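The null check in the conversion can be observed without undefined behavior by converting pointers explicitly instead of calling a method through them. In the sketch below, the names BaseA, BaseB, Derived and both helper functions are hypothetical. Both bases are given a data member so that neither is an empty class, because some compilers may place an empty base at offset zero:

```cpp
#include <cstddef>

// Hypothetical classes for illustration; both bases hold data so that
// they must occupy distinct storage inside Derived.
struct BaseA { int a; };
struct BaseB { int b; };
struct Derived : BaseA, BaseB {};

// Derived-to-base conversion of a null pointer is required to yield null,
// which is why the compiler emits a check before adjusting the pointer.
bool NullStaysNull()
{
    Derived* none = 0;
    BaseB* converted = none; // implicit conversion, well defined
    return converted == 0;
}

// For a real object, the same conversion shifts the address by the
// offset of the BaseB subobject inside Derived.
std::ptrdiff_t SecondBaseOffset()
{
    Derived object;
    BaseB* second = &object;
    return reinterpret_cast<char*>( second ) - reinterpret_cast<char*>( &object );
}
```

On common implementations the offset equals the size of the first base, but the exact value depends on the compiler's object layout.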

In the first case (calling a method of the subobject's class directly through the pointer to the whole object), a null pointer also means undefined behavior, and no check is added. If the paragraph about optimizing away a null check that follows a method call seemed like rubbish and fantasy to you, it shouldn't have: the case described above is exactly one where such an optimization has actually been applied.

It's a bad idea to rely on calling a nonstatic method through a null pointer. If you want a method to be callable for a null pointer, make the method static and explicitly pass the pointer to the object as a parameter.
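A minimal sketch of that approach for the CWindow example follows. HWND is declared here as a stand-in typedef so that the fragment compiles without the Windows headers, and the constructor is a hypothetical addition for illustration only:

```cpp
// Stand-in for the Windows HWND type, an assumption made so that the
// sketch is self-contained; real code would include <windows.h> instead.
typedef void* HWND;

class CWindow {
public:
    // Hypothetical constructor, added only to make the example usable.
    explicit CWindow( HWND windowHandle ) : handle( windowHandle ) {}

    // Static, so no object is needed for the call; the null check is an
    // ordinary comparison of a parameter that the compiler must keep.
    static HWND GetSafeHandle( const CWindow* window )
    {
        return window == 0 ? 0 : window->handle;
    }

private:
    HWND handle;
};
```

A call site then reads CWindow::GetSafeHandle( window ) and stays well defined when window is null.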
