What is undefined behavior in C and C++? What about unspecified behavior and implementation-defined behavior? What is the difference between them?
Undefined behavior is one of those aspects of the C and C++ language that can be surprising to programmers coming from other languages (other languages try to hide it better). Basically, it is possible to write C++ programs that do not behave in a predictable way, even though many C++ compilers will not report any errors in the program!
Let's look at a classic example:
#include <iostream>
int main()
{
char* p = "hello!\n"; // yes I know, deprecated conversion
p[0] = 'y';
p[5] = 'w';
std::cout << p;
}
The variable p
points to the string literal "hello!\n"
, and the two assignments below try to modify that string literal. What does this program do? According to section 2.14.5 paragraph 11 of the C++ standard, it invokes undefined behavior:
The effect of attempting to modify a string literal is undefined.
I can hear people screaming "But wait, I can compile this no problem and get the output yellow
" or "What do you mean undefined, string literals are stored in read-only memory, so the first assignment attempt results in a core dump". This is exactly the problem with undefined behavior. Basically, the standard allows anything to happen once you invoke undefined behavior (even nasal demons). If there is a "correct" behavior according to your mental model of the language, that model is simply wrong; The C++ standard has the only vote, period.
Other examples of undefined behavior include accessing an array beyond its bounds, dereferencing the null pointer, accessing objects after their lifetime ended or writing allegedly clever expressions like i++ + ++i
.
Section 1.9 of the C++ standard also mentions undefined behavior's two less dangerous brothers, unspecified behavior and implementation-defined behavior:
The semantic descriptions in this International Standard define a parameterized nondeterministic abstract machine.
Certain aspects and operations of the abstract machine are described in this International Standard as implementation-defined (for example,
sizeof(int)
). These constitute the parameters of the abstract machine. Each implementation shall include documentation describing its characteristics and behavior in these respects.Certain other aspects and operations of the abstract machine are described in this International Standard as unspecified (for example, order of evaluation of arguments to a function). Where possible, this International Standard defines a set of allowable behaviors. These define the nondeterministic aspects of the abstract machine.
Certain other operations are described in this International Standard as undefined (for example, the effect of dereferencing the null pointer). [ Note: this International Standard imposes no requirements on the behavior of programs that contain undefined behavior. —end note ]
Specifically, section 1.3.24 states:
Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
What can you do to avoid running into undefined behavior? Basically, you have to read good C++ books by authors who know what they're talking about. Screw internet tutorials. Screw bullschildt.
It's a weird fact that resulted from the merge that this answer only covers C++ but this question's tags includes C. C has a different notion of "undefined behavior": It will still require the implementation to give diagnostic messages even if behavior is also stated to be undefined for certain rule violations (constraint violations).
@Benoit It is undefined behavior because the standard says it's undefined behavior, period. On some systems, indeed string literals are stored in the read-only text segment, and the program will crash if you try to modify a string literal. On other systems, the string literal will indeed appear change. The standard does not mandate what has to happen. That's what undefined behavior means.
@FredOverflow, Why does a good compiler allow us to compile code that gives undefined behavior? Exactly what good can compiling this kind of code give? Why didn't all good compilers give us a huge red warning sign when we are trying to compile code that gives undefined behavior?
@Pacerier There are certain things that are not checkable at compile time. For example it is not always possible to guarantee that a null pointer is never dereferenced, but this is undefined.
@Celeritas, undefined behavior can be non-deterministic. For example, it is impossible to know ahead of time what the contents of uninitialized memory will be, eg.
int f(){int a; return a;}
: the value ofa
may change between function calls.