That is the fault of compilers, not the language. At most, that should raise a warning. A compiler should not change the semantics of the code, even if the code relies on undefined/unspecified behavior. I don't get how anyone thought it was a good idea to silently remove apparently unreachable code.
> A compiler should not change the semantics of the code, even if the code relies on undefined/unspecified behavior.
You can argue that code which invokes undefined behavior has no semantics to preserve. What is the correct "semantic" of x + 1 if x is the maximum representable value of a signed integer type? The language doesn't specify it; in fact, it explicitly calls this undefined behavior, so asking for its "correct" meaning is absolute nonsense. Should the compiler apply AI to determine the semantic intent?
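A minimal sketch of the problem (increment is just an illustrative name): the language gives no answer for the overflowing case, so the only portable way to assign one is to guard for it yourself.

    #include <limits.h>

    /* If x == INT_MAX, evaluating x + 1 overflows a signed int and the
       behavior is undefined: the standard assigns it no meaning at all. */
    int increment(int x)
    {
        if (x < INT_MAX)      /* well-defined guard, overflow cannot happen */
            return x + 1;
        return INT_MAX;       /* pick whatever saturation/error policy you like */
    }

The compiler can't answer the question either; the most it can do is report the overflow at run time if you ask for it, e.g. with GCC/Clang's -fsanitize=signed-integer-overflow.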
> I don't get how anyone thought it was a good idea to silently remove apparently unreachable code.
This reminds me of the common misconception of C as a "portable assembly language". The standard is largely machine independent in that it's concerned with the effects of execution rather than how those effects are implemented by the compiler and eventually carried out by a real machine. As a result, you can write several pages of code and have the compiler fold it down into "mov eax, 123 / ret" and nothing that concerns the C language will have been lost.
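To make that concrete with a toy example (count_multiples_of_7 is made up for illustration): nothing in this function is observable except its return value, so an optimizing compiler is free to evaluate the whole loop at compile time and emit the equivalent of "mov eax, 143 / ret".

    /* Counts the multiples of 7 below 1000. There are no side effects, so
       under the "as-if" rule the compiler may fold the entire loop into
       the constant 143. */
    int count_multiples_of_7(void)
    {
        int n = 0;
        for (int i = 0; i < 1000; i++)
            if (i % 7 == 0)
                n++;
        return n;
    }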
If you have some expression that correctly folds into "false", none of the effects of the program as far as the C standard is concerned have changed. Yes, some undefined behavior may manifest differently depending on the optimizations made, but the language is not concerned with that.
The problem is very much with the language itself. Overflow in signed integers is undefined behavior. The optimizer is correct not to treat it as a constraint on optimization. The language could just as easily have defined signed integer arithmetic as modular. The standard does not have a "compiler should not change the semantics of the code" clause that applies to undefined behavior. As a general rule, the C standard is basically the antithesis of that. The compiler gets free rein over the effects of code whose behavior is undefined by the C standard, which leaves a lot of UB holes.
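For what it's worth, you can opt into modular semantics per call site in portable-ish C today (wrapping_add is just an illustrative name): do the arithmetic in unsigned, where wrap-around is fully defined, and convert back.

    /* Unsigned arithmetic wraps by definition. Converting the out-of-range
       result back to int is implementation-defined rather than undefined,
       and on mainstream two's-complement targets it wraps as expected. */
    int wrapping_add(int a, int b)
    {
        return (int)((unsigned)a + (unsigned)b);
    }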
"UB optimizations" happen as a natural consequence of basing optimizations on constraints posed by the language rather than descriptions of special cases. It's not like someone looked at code with signed overflows and decided "let's make this really stupid" and typed "if (MAY_SIGNED_OVERFLOW(expr)) surprise_me();". You might instead specify the constraints of signed integers according to the language spec and a generalized optimizer performs optimizations based on those constraints.
x + 1 < x for signed x is always false for every value of x for which the operation is defined in the C language at all. A good optimizer correctly solves for the constraints of the language and folds the expression into a constant false accordingly, in the same manner it would fold any other expression.
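Side by side (the function names are made up): the first version can never be true in a defined execution, so GCC and Clang at -O2 will typically fold it straight to 0; the second expresses the same intent purely in terms the language defines.

    #include <limits.h>

    /* Presumably meant as "would x + 1 overflow?". Since signed overflow is
       undefined, the comparison is never true in any defined execution and
       may be folded to a constant 0. */
    int overflows_bad(int x)  { return x + 1 < x; }

    /* Same intent, expressed only with defined operations. */
    int overflows_good(int x) { return x == INT_MAX; }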
The belief that there is any x for which this would be true rests on assumptions that the C language doesn't make. You likely assume that signed integer arithmetic should be modular as a consequence of a 2's complement bit representation. These are not assumptions that the C language makes. Signed integers don't have to be modular. They don't have to be 2's complement. It might be harder than you think to maintain a set of constraints and assumptions in addition to those specified by the C language. It would be these additions that would be the special cases, not the optimizations.
If there's anything you should have a beef with, it's the language itself. Don't use C unless you're ready either to be caught off guard by the unexpected consequences of undefined behavior at run time, or to have the patience to very carefully avoid it by learning what invokes it.
I think much of the "undefinedness" of the language comes from the fact that multiple implementations of the "C language" existed long before standardization. The subtle differences in these implementations and their stake in the standardization process meant that a lot of things simply couldn't be defined because it would invalidate existing implementations.
If a warning is issued in that case, I'm tempted to say it's fair game. If you are sure it's safe, the compiler has extremely well-established ways of being told so (e.g. assume or unreachable).
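The exact spelling varies: GCC and Clang provide __builtin_unreachable(), C23 adds unreachable() in <stddef.h>, and Clang additionally has __builtin_assume(). A small sketch (sign is just an illustrative name):

    /* Explicitly telling the compiler a path cannot be reached
       (GCC/Clang spelling; C23 spells it unreachable() in <stddef.h>). */
    int sign(int x)
    {
        if (x > 0) return 1;
        if (x < 0) return -1;
        if (x == 0) return 0;
        __builtin_unreachable();  /* the optimizer may assume this is never hit */
    }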
I wouldn't call it fair game, not least because nobody reads warnings. My central objection is that it's not what C is meant to be. C was made to write Unix; it was specifically created to be halfway between a portable macro assembler and a high-level language. That is a useful language that fills a particular niche. That is what C was for many years, and a lot of code was written with that behaviour in mind. It can be argued that compilers which change semantics from what people intended are downright irresponsible, given the foundational role that C has.
All this wouldn't be an issue if C were just some application language, but it's what all of computing is built on. It really should be simple by default, without adding more footguns than programming directly in assembly gives you.
I wouldn't mind if all the assumption-based optimisations were made opt-in. They are obviously useful in some way, but they're impossible to allow globally in a large legacy codebase, which is pretty much every C codebase.
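To be fair, some of the opt-outs already exist as per-translation-unit flags on GCC and Clang; a sketch of what they buy you (will_overflow is just an illustrative name):

    /* With -fwrapv (or -fno-strict-overflow) signed overflow is defined to
       wrap, so this classic pre-check does what its author intended instead
       of being folded to 0. -ftrapv and -fsanitize=signed-integer-overflow
       are the "tell me loudly at run time" variants. */
    int will_overflow(int x)
    {
        return x + 1 < x;
    }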
There is probably more than a million times as much C code that is not Unix as code that is. We could chuck in all the other OSes, and all the RTOSes besides, without changing the statement.
My favourite is how "x + 1 < x" is optimized away for signed ...