Stack Overflow


Definition

A stack overflow is a run-time software bug in which a program attempts to use more space than is available on the run-time stack, which typically results in a program crash.

Run-time Stack

A run-time stack is a special area of computer memory that works on the LIFO principle (last in, first out: the last element added to the structure is the first one to be removed). The word "stack" refers to the way plates are stacked: you form a stack by putting plates on top of each other (adding an object to the stack is called a "push") and then remove them starting with the top plate (removing an object from the stack is called a "pop"). The run-time stack is also known as the call stack, execution stack, or machine stack (these terms are used to avoid confusing it with the "stack" as an abstract data structure).
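To make the push and pop operations concrete, here is a minimal sketch of a LIFO stack as an abstract data structure. The names IntStack, stack_push, stack_pop and the capacity of 8 are made up for this illustration; a real run-time stack is managed by the compiler and the hardware, not by code like this.

```c
#include <stdbool.h>
#include <stddef.h>

#define STACK_CAPACITY 8   /* hypothetical, deliberately small capacity */

typedef struct {
    int items[STACK_CAPACITY];
    size_t count;          /* number of items currently stored */
} IntStack;

/* Push a value on top; returns false when the stack is already full. */
bool stack_push(IntStack *s, int value)
{
    if (s->count == STACK_CAPACITY)
        return false;
    s->items[s->count++] = value;
    return true;
}

/* Pop the most recently pushed value; returns false when the stack is empty. */
bool stack_pop(IntStack *s, int *out)
{
    if (s->count == 0)
        return false;
    *out = s->items[--s->count];
    return true;
}
```

Pushing 1, 2, 3 and then popping three times yields 3, 2, 1. Note that this sketch reports a full stack through its return value, whereas the run-time stack offers no such check to ordinary code.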

The purpose of the stack is to let a program conveniently arrange subroutine calls. The stack can be used to store the arguments passed to a function being called, the return address, and the function's local variables. If that function calls another one, the callee reads its arguments from the stack and stores its own variables in a memory area allocated for it. When a function returns control, the stack memory it used is freed. High-level language programmers usually don't have to worry about these details, because the compiler generates all the necessary code.
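The idea that every call gets its own memory area for local variables can be observed with a small experiment. The sketch below uses a made-up function, record_frames, which stores the address of one local variable per nesting level; it assumes the platform provides uintptr_t and that the caller passes an array with at least max_depth elements (max_depth >= 1).

```c
#include <stddef.h>
#include <stdint.h>

/* Records the address of one local variable at each call depth.
   Returns the sum of all depth values; using `local` after the nested
   call keeps every level's variable alive while deeper calls run. */
int record_frames(int depth, int max_depth, uintptr_t *addresses)
{
    int local = depth;                        /* lives in this call's frame */
    addresses[depth] = (uintptr_t)&local;
    if (depth + 1 >= max_depth)
        return local;
    return local + record_frames(depth + 1, max_depth, addresses);
}
```

After a call such as record_frames(0, 5, addresses), the five recorded addresses are all different, because all five variables exist at the same time while the deepest call is running. How far apart they are, and in which direction the stack grows, depends on the compiler and the platform.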

Consequences of the Bug

We have finally got close to the subject of our discussion. As an abstraction, a stack is an infinite storage you can endlessly add new items into. Unfortunately, everything is finite in the real world - stack memory is no exception. What will happen when it runs out as new arguments are being pushed into it or when a function allocates memory to store its variables?

An error known as a stack overflow will occur. Since the stack is used to arrange calls of user subroutines (and most programs written in contemporary programming languages - including object-oriented ones - actively employ functions one way or another), the program won't be able to call any function after the error occurs. When that happens, the operating system takes control back, clears the stack and terminates the program. Here lies the difference between a buffer overflow and a stack overflow. The former occurs when the program attempts to access a memory area outside the buffer's boundary, and it may go unnoticed if there is no protection against that; if lucky enough, the program even goes on to run correctly. It's only when there is memory protection that a segmentation fault occurs. But when a stack overflow occurs, the program inevitably crashes.

To be precise, this scenario holds only for native languages. In managed languages, the virtual machine maintains its own stack for managed programs, which is easier to monitor, so it can afford to throw an exception when a stack overflow occurs. C and C++ cannot afford such a "luxury".

Reasons for the Bug

What are the reasons for this unpleasant error? Keeping in mind the mechanism described above, we can name one: too many nested function calls. This scenario is especially likely when using recursion: infinite recursion ends with exactly this error (when there is no lazy evaluation mechanism), unlike an infinite loop, which may be useful at times. However, when only a very small area of memory is allocated for the stack (which is typical of microcontrollers, for instance), just a short sequence of calls will do the job.
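The difference between recursion and iteration in terms of stack use can be sketched with two made-up functions that compute the same sum 1 + 2 + ... + n. This is only an illustration: an optimizing compiler may turn the recursive version into a loop, but the C language does not guarantee it.

```c
/* Recursive version: every call adds a frame, so stack use grows with n.
   For a large enough n this may overflow the stack. */
unsigned long long sum_recursive(unsigned int n)
{
    if (n == 0)
        return 0;
    return n + sum_recursive(n - 1);
}

/* Iterative version: a single frame is used regardless of n. */
unsigned long long sum_iterative(unsigned int n)
{
    unsigned long long total = 0;
    for (unsigned int i = n; i > 0; i--)
        total += i;
    return total;
}
```

Both functions return 500500 for n = 1000. The iterative one can safely be called with n in the millions, while the recursive one depends on how much stack the platform provides.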

Another reason is local variables that require too much stack memory. It's a bad idea to create a local array of a million items or a million local variables (just in case). A single call of such a "greedy" function can easily trigger a stack overflow. If you need to store large amounts of data, use dynamic memory instead, so that you can handle the error if the allocation fails.

Dynamic memory, however, is comparatively slow to allocate and free (it is managed by a heap allocator, which may in turn have to request memory from the operating system). Besides, when you have direct access to it, you must allocate and release it manually. Stack memory, by contrast, is allocated very quickly (in fact, you only need to change the value of one register); moreover, objects allocated on the stack are automatically destroyed when the function returns control and clears the stack. Naturally, the temptation to exploit this is hard to resist. Hence the third reason for the bug: manual allocation of stack memory by the programmer. Many C libraries provide the special, non-standard function alloca for this purpose. Interestingly, while the malloc function (intended to allocate dynamic memory) has a "sibling" that frees it (the free function), the alloca function has none: the memory is freed automatically once the function returns control. This can complicate matters, because you can't free the memory before leaving the function. Although the man page for the alloca function clearly states that it "is machine- and compiler-dependent; on many systems it cannot be used properly and may cause errors; its use is discouraged", programmers still use it.

Examples

As an example, let's study a code fragment performing recursive file search (taken from MSDN):

void DirSearch(String* sDir)
{
    try
    {
        // Find the subfolders in the folder that is passed in.
        String* d[] = Directory::GetDirectories(sDir);
        int numDirs = d->get_Length();

        for (int i=0; i < numDirs; i++)
        {
            // Find all the files in the subfolder.
            String* f[] = Directory::GetFiles(d[i],textBox1->Text);
            int numFiles = f->get_Length();

            for (int j=0; j < numFiles; j++)
            {
                listBox1->Items->Add(f[j]);
            }
            DirSearch(d[i]);
        }
    }
    catch (System::Exception* e)
    {
        MessageBox::Show(e->Message);
    }
}

This function gets the list of subfolders in the specified folder and then recursively calls itself for each of them. Each level of nesting adds another frame to the call stack, so if the file tree is deep enough, the stack will overflow.
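One common way to avoid deep recursion in a tree walk is to keep the list of pending nodes in a heap-allocated worklist instead of on the call stack. The sketch below is not the MSDN code rewritten: it is plain C, and the Node type and the count_nodes function are hypothetical stand-ins for a directory tree and a search routine.

```c
#include <stdlib.h>

/* Hypothetical in-memory stand-in for a directory tree. */
typedef struct Node {
    struct Node **children;   /* array of child pointers, unused if child_count is 0 */
    size_t child_count;
} Node;

/* Counts all nodes reachable from root without recursion.
   Returns 0 on success, -1 if the heap-allocated worklist cannot grow. */
int count_nodes(const Node *root, size_t *out_count)
{
    size_t capacity = 16, size = 0, visited = 0;
    const Node **work = malloc(capacity * sizeof *work);
    if (work == NULL)
        return -1;

    if (root != NULL)
        work[size++] = root;

    while (size > 0) {
        const Node *n = work[--size];            /* pop the next node */
        visited++;
        for (size_t i = 0; i < n->child_count; i++) {
            if (size == capacity) {              /* grow the worklist on the heap */
                size_t new_capacity = capacity * 2;
                const Node **tmp = realloc(work, new_capacity * sizeof *tmp);
                if (tmp == NULL) {
                    free(work);
                    return -1;
                }
                work = tmp;
                capacity = new_capacity;
            }
            work[size++] = n->children[i];       /* push a child */
        }
    }

    free(work);
    *out_count = visited;
    return 0;
}
```

With this approach the depth of the tree affects only the size of the heap buffer, and running out of memory shows up as an error code the caller can handle rather than as a crash.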

Here is an example to illustrate the second reason taken from the question "What might be the reason for a stack overflow exception?" asked at Stack Overflow (this is a question-and-answer website dealing with all programming-related topics, not only stack overflow, as one may conclude from its name):

#define W 1000
#define H 1000
#define MAX 100000 
//...
int main()
{
    int image[W*H];
    float dtr[W*H];
    initImg(image,dtr);
    return 0;
}

As you can see, the main function asks for some stack memory to be allocated for an int-array and a float-array, one million items each, which gives just a bit less than 8 Mbytes in total. If we recall that Visual C++ by default reserves only 1 Mbyte for the stack, we can easily answer that question.
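The arithmetic is simple: assuming int and float take 4 bytes each, the two arrays need 2 x 1,000,000 x 4 = 8,000,000 bytes, which is about 7.6 Mbytes if a Mbyte is counted as 1,048,576 bytes. A possible fix is to move both arrays to dynamic memory, as sketched below. The Buffers type and the buffers_alloc and buffers_free functions are made up for this illustration and are not part of the original question.

```c
#include <stdlib.h>

#define W 1000
#define H 1000

/* Hypothetical container for the two arrays from the example above. */
typedef struct {
    int *image;
    float *dtr;
} Buffers;

/* Allocates both arrays on the heap; returns 0 on success, -1 on failure. */
int buffers_alloc(Buffers *b)
{
    b->image = malloc((size_t)W * H * sizeof *b->image);
    b->dtr = malloc((size_t)W * H * sizeof *b->dtr);
    if (b->image == NULL || b->dtr == NULL) {
        free(b->image);            /* free(NULL) is a harmless no-op */
        free(b->dtr);
        b->image = NULL;
        b->dtr = NULL;
        return -1;
    }
    return 0;
}

/* Releases both arrays and resets the pointers. */
void buffers_free(Buffers *b)
{
    free(b->image);
    free(b->dtr);
    b->image = NULL;
    b->dtr = NULL;
}
```

Unlike the stack version, a failed allocation here is reported through the return value, so the program can print a message or fall back to a smaller image instead of crashing.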

And, finally, here is an example taken from the GitHub repository of Lightspark, a Flash player project:

DefineSoundTag::DefineSoundTag(/* ... */)
{
    // ...
    unsigned int soundDataLength = h.getLength()-7;
    unsigned char *tmp = (unsigned char *)alloca(soundDataLength);
    // ...
}

One may hope that h.getLength()-7 won't grow too big and no overflow will occur in the next line. But is the time saved on memory allocation worth a potential crash?
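Note also that if the length could ever be smaller than 7, the subtraction would wrap around to a huge unsigned value. A frequently used alternative to alloca is a small fixed-size local buffer for the common case with a heap fallback for large sizes. The sketch below only illustrates that pattern and is not the Lightspark fix; process_sound_data, the 256-byte threshold and the checksum "processing" are all hypothetical.

```c
#include <stdlib.h>
#include <string.h>

#define SMALL_BUFFER_SIZE 256   /* hypothetical threshold for using the stack */

/* Copies the data into a temporary buffer and sums its bytes.
   Small inputs use a fixed local array; larger ones use the heap.
   Returns 0 on success, -1 if the heap allocation fails. */
int process_sound_data(const unsigned char *data, size_t length,
                       unsigned long *checksum)
{
    unsigned char small[SMALL_BUFFER_SIZE];
    unsigned char *tmp = small;

    if (length > sizeof small) {
        tmp = malloc(length);          /* bounded stack use, heap for the rest */
        if (tmp == NULL)
            return -1;
    }

    memcpy(tmp, data, length);

    unsigned long sum = 0;
    for (size_t i = 0; i < length; i++)
        sum += tmp[i];

    if (tmp != small)
        free(tmp);

    *checksum = sum;
    return 0;
}
```

The stack usage of this function is fixed at compile time, so an unexpectedly large length costs a slower allocation or a reported error rather than a stack overflow.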

Conclusion

A stack overflow is a fatal error that is most often found in programs containing recursive functions. It can also be caused by placing too many local variables on the stack or by manually allocating stack memory. Stick to the good old rules: when you have a choice, prefer iteration to recursion, and avoid manual interference in what the compiler does better.
