Why is undefined behavior (as opposed to compile / crash) allowed?

Question

I understand the reasoning behind compiler / interpreter language extensions, but why is behavior that has no valid definition allowed to fail silently or do weird things rather than produce a compiler error? Is it because of the extra difficulty (impossible, or simply time-consuming) for the compiler to catch such cases?

PS Which languages have undefined behavior, and which don't?

PPS Are there any instances of undefined behavior that would not be impossible or too time-consuming to catch at compile time, and if so, are there any good reasons / excuses for leaving them undefined?

+2

compiler-construction language-agnostic compiler-errors

Roman A. Taycher May 10 '10 at 11:39

3 answers

Largely because it is necessary to accomplish certain goals. For example, C and C++ were originally used to write operating systems, including things like device drivers. To do that, they used (among other things) direct access to specific hardware locations that represented I/O devices. Preventing access to those locations would have prevented C from being used for its intended purpose (and C++ was specifically designed to provide all the same capabilities as C).
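As a rough illustration of that kind of access, here is a minimal sketch. The names `fake_register`, `status_reg`, `set_ready_bit` and `ready` are hypothetical, and the "register" is an ordinary variable so the code can run anywhere. In a real driver the pointer would typically come from casting a fixed address taken from the hardware documentation, and what that address means is defined by the platform, not by the language standard.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a device register (assumption: real code would use a fixed
// hardware address instead of an ordinary variable).
static volatile std::uint32_t fake_register = 0;

// Returns a pointer to the "register". volatile tells the compiler that
// every read and write must actually happen and must not be optimized away.
volatile std::uint32_t* status_reg() { return &fake_register; }

// Sets bit 0 of the register with an explicit read-modify-write.
void set_ready_bit() { *status_reg() = *status_reg() | 0x1u; }

// Reports whether bit 0 of the register is set.
bool ready() { return (*status_reg() & 0x1u) != 0; }
```

The language cannot check that such an address is meaningful, so it leaves the result up to the platform and trusts the programmer to get it right.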

Another factor is a really basic decision between specifying a language and specifying a platform. To use the same examples, C and C++ are both based on a conscious decision to restrict the definition to the language and leave the platform surrounding that language separate. Quite a few of the alternatives, with Java and .NET as a couple of the most obvious examples, specify entire platforms instead.

Both choices reflect basic differences in attitude toward design. One of the basic precepts of the design of C (largely retained in C++) was "trust the programmer". Though it was never stated quite so bluntly, the basic concept of Java's sandbox was and is based on the idea that you should not trust the programmer.

As for which languages do and do not have undefined behavior, here is the dirty little secret: for all practical purposes, they all have undefined behavior. Some languages (again, C and C++ are prime examples) go to considerable effort to point out which behavior is undefined, while many others either try to claim it doesn't exist (e.g., Java) or mostly ignore many of the "dark corners" where it arises (e.g., Pascal, most of .NET).

The ones that claim it doesn't exist generally produce the biggest problems. Java, for example, includes quite a few rules that try to guarantee consistent floating-point results. In the process, they make it impossible to execute Java efficiently on quite a bit of hardware, yet floating-point results still are not really guaranteed to be consistent. Worse, the floating-point model they mandate is not exactly perfect, so under some circumstances it prevents you from getting the best results (or at least forces you to do extra work to get around what it mandates).

To their credit, Sun / Oracle have (finally) started to notice the problem and are working on a considerably different floating-point model that should be an improvement. I'm not sure whether this has been incorporated into Java yet, but I suspect that when / if it is, there will be a fairly substantial "rift" between code for the old model and code for the new model.

+1

Jerry Coffin May 10 '10 at 15:57

Because different operating systems operate differently (...), and you can't just say "crash in this case", because it might be something the operating system could handle better.

0

LukeN May 10 '10 at 11:45

anon · Accepted Answer · 2010-05-10T11:41:58+0000

The concept of undefined behavior is required in languages such as C and C++ because detecting the conditions that cause it would be impossible or prohibitively expensive. Take this code for example:

int * p = new int(0);
// lots of conditional code, somewhere in which we do
int * q = p;
// lots more conditional code, somewhere in which we do
delete p;
// even more conditional code, somewhere in which we do
delete q;

Here the same object is deleted twice (through p and then through q), resulting in undefined behavior. Detecting this error in general is too hard for a language like C or C++.
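To give a feel for what reliable detection would cost, here is a minimal sketch that swaps the raw pointers for std::shared_ptr. This is a different technique from the snippet above, not a fix the language applies for you. The helper name `owners_left` and its two flags are hypothetical stand-ins for the "lots of conditional code". Each delete becomes a reset(), and a reference count maintained at run time decides when the int is actually freed.

```cpp
#include <cassert>
#include <memory>

// Hypothetical helper mirroring the snippet above with shared ownership.
// Returns how many owners remain after the conditional releases.
long owners_left(bool release_p, bool release_q) {
    std::shared_ptr<int> p = std::make_shared<int>(0);
    std::shared_ptr<int> q = p;        // second owner: use_count() is now 2
    if (release_p) p.reset();          // stands in for "delete p"
    if (release_q) q.reset();          // stands in for "delete q"
    // Whichever handle is still non-null reports the remaining owners.
    if (p) return p.use_count();
    if (q) return q.use_count();
    return 0;                          // both released: the int was freed exactly once
}
```

Whether both releases happen depends on values known only at run time, so the compiler cannot decide it in general. The reference count makes every combination well-defined, but it is exactly the kind of bookkeeping that raw pointers in C and C++ do not pay for.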
