V567. Undefined behavior. The variable is modified while being used twice between sequence points

27.08.2012

The analyzer detected an expression leading to undefined behavior. A variable is used several times between two sequence points while its value is changing. We cannot predict the result of such an expression. Let's consider the notions "undefined behavior" and "sequence point" in detail.

Undefined behavior is a feature of some programming languages — most famously C/C++. In these languages, to simplify the specification and allow some flexibility in implementation, the specification leaves the results of certain operations specifically undefined.

For example, in C the use of any automatic variable before it has been initialized yields undefined behavior, as do division by zero and indexing an array outside of its defined bounds. This specifically frees the compiler to do whatever is easiest or most efficient, should such a program be submitted. In general, any behavior afterwards is also undefined. In particular, it is never required that the compiler diagnose undefined behavior — therefore, programs invoking undefined behavior may appear to compile and even run without errors at first, only to fail on another system, or even on another date. When an instance of undefined behavior occurs, so far as the language specification is concerned anything could happen, maybe nothing at all.

A sequence point in imperative programming defines any point in a computer program's execution at which it is guaranteed that all side effects of previous evaluations will have been performed, and no side effects from subsequent evaluations have yet been performed. They are often mentioned in reference to C and C++, because the result of some expressions can depend on the order of evaluation of their subexpressions. Adding one or more sequence points is one method of ensuring a consistent result, because this restricts the possible orders of evaluation.

Sequence points come into play when the same variable is modified more than once within a single expression. An often-cited example is the expression i=i++, which both assigns i to itself and increments i. The final value of i is ambiguous, because, depending on the language semantics, the increment may occur before, after or interleaved with the assignment. The definition of a particular language might specify one of the possible behaviors or simply say the behavior is undefined. In C and C++, evaluating such an expression yields undefined behavior.

C and C++ define the following sequence points:

  • Between evaluation of the left and right operands of the && (logical AND), || (logical OR), and comma operators. For example, in the expression *p++ != 0 && *q++ != 0, all side effects of the sub-expression *p++ != 0 are completed before any attempt to access q.
  • Between the evaluation of the first operand of the ternary "question-mark" operator and the second or third operand. For example, in the expression a = (*p++) ? (*p++) : 0 there is a sequence point after the first *p++, meaning it has already been incremented by the time the second instance is executed.
  • At the end of a full expression. This category includes expression statements (such as the assignment a=b;), return statements, the controlling expressions of if, switch, while, or do-while statements, and all three expressions in a for statement.
  • Before a function is entered in a function call. The order in which the arguments are evaluated is not specified, but this sequence point means that all of their side effects are complete before the function is entered. In the expression f(i++) + g(j++) + h(k++), f is called with a parameter of the original value of i, but i is incremented before entering the body of f. Similarly, j and k are updated before entering g and h respectively. However, it is not specified in which order f(), g(), h() are executed, nor in which order i, j, k are incremented. The values of j and k in the body of f are therefore undefined.[3] Note that a function call f(a,b,c) is not a use of the comma operator and the order of evaluation for a, b, and c is unspecified.
  • At a function return, after the return value is copied into the calling context. (This sequence point is only specified in the C++ standard; it is present only implicitly in C[4].)
  • At the end of an initializer; for example, after the evaluation of 5 in the declaration int a = 5;.
  • In C++, overloaded operators act as functions, so a call of an overloaded operator is a sequence point.

Now let's consider several samples causing undefined behavior:

int i, j;
...
X[i]=++i;
X[i++] = i;
j = i + X[++i];
i = 6 + i++ + 2000;
j = i++ + ++i;
i = ++i + ++i;

We cannot predict the calculation results in all these cases. Of course, these samples are artificial and we can notice the danger right away. So let's examine a code sample taken from a real application:

while (!(m_pBitArray[m_nCurrentBitIndex >> 5] &
         Powers_of_Two_Reversed[m_nCurrentBitIndex++ & 31]))
{}
return (m_nCurrentBitIndex - BitInitial - 1);

The compiler can calculate either of the left or right arguments of the '&' operator first. It means that the m_nCurrentBitIndex variable might be already incremented by one when calculating "m_pBitArray[m_nCurrentBitIndex >> 5]". Or it might still be not incremented.

This code may work well for a long time. However, you should keep in mind that it will behave correctly only when it is built in some particular compiler version with a fixed set of compilation options. This is the correct code:

while (!(m_pBitArray[m_nCurrentBitIndex >> 5] &
         Powers_of_Two_Reversed[m_nCurrentBitIndex & 31]))
{ ++m_nCurrentBitIndex; }
return (m_nCurrentBitIndex - BitInitial);
This code does not contain ambiguities anymore. We also got rid of the magic constant "-1".

Programmers often think that undefined behavior may occur only when using postincrement, while preincrement is safe. It's not so. Further is an example from a discussion on this subject.

Question:

I downloaded the trial version of your studio, ran it on my project and got this warning: V567 Undefined behavior. The 'i_acc' variable is modified while being used twice between sequence points.

The code

i_acc = (++i_acc) % N_acc;

It seems to me that there is no undefined behavior because the i_acc variable does not participate in the expression twice.

Answer:

There is undefined behavior here. It's another thing that the probability of error occurrence is rather small in this case. The '=' operator is not a sequence point. It means that the compiler might first put the value of the i_acc variable into the register and then increment the value in the register. After that it calculates the expression and writes the result into the i_acc variable and then again writes a register with the incremented value into the same variable. As a result we will get a code like this:

REG = i_acc;
REG++;
i_acc = (REG) % N_acc;
i_acc = REG;

The compiler has the absolute right to do so. Of course, in practice it will most likely increment the variable's value at once, and everything will be calculated as the programmer expects. But you should not rely on that.

Consider one more situation with function calls.

The order of calculating function arguments is not defined. If a variable changing over time serves as arguments, the result will be unpredictable. Consider this sample:

int A = 0;
Foo(A = 2, A);

The 'Foo' function may be called both with the arguments (2, 0) and with the arguments (2, 2). The order in which the function arguments will be calculated depends on the compiler and optimization settings.

References

You can look at examples of errors from real projects which were detected by this diagnostic message.