In general I've long been very skeptical of removing optimizations that rely on undefined behavior. People say "I'd happily sacrifice 1% for better theoretical semantics", but theoretical semantics don't pay the bills of compiler writers. Instead, compiler developers are employed by the largest companies, where a 1% win is massive amounts of dollars saved. Any complaint about undefined behavior in C must acknowledge the underlying economics to have relevance to the real world.
As the paper notes, there are plenty of alternative C compilers available to choose from. The reason why GCC and LLVM ended up attaining overwhelming market share is simply that they produce the fastest possible code, because, at the end of the day, that is what users want.
If you want to blame someone, blame the designers of the C language for doing things like making int the natural idiom to iterate over arrays even when size_t would be better. The fact that C programmers continue to write "for (int i = 0; i < n; i++)" to iterate over an array is why signed overflow is undefined, and it is absolutely a critical optimization in practice.
Uh, massive investment? The commit logs reveal how many people work on them, and while that's massive compared to single-pizza teams, I've worked in buildings where all of those people could have had a desk each.
> If you want to blame someone, blame the designers of the C language for doing things like making int the natural idiom to iterate over arrays even when size_t would be better. The fact that C programmers continue to write "for (int i = 0; i < n; i++)" to iterate over an array is why signed overflow is undefined, and it is absolutely a critical optimization in practice.
Well, size_t is unsigned and has defined overflow, so you'd lose the optimization if you switched to it. (Specifically, there are cases where defining overflow means a loop is possibly infinite, which blocks all kinds of optimizations.)
Many languages try to fix this by defaulting to wrap on overflow, but that was a mistake because you rarely actually want that. A better solution is to have a loop iteration statement that doesn't have an explicit "int i" or "i++" written out.
That's a strange example since it doesn't prove their point.
"int count; … for (int i = 0; i < count; ++i) {}" can't overflow so the optimization is always valid.
"int count; … for (int i = 0; i < count; i += 2) {}" is more of a problem.
I don't think "have a 64-bit int type" is the right approach for a new language either… we should be aiming for safety. If a variable's valid values are 0-50 then its type should be "integer between 0 and 50", not "integer between 0 and UINT64_MAX". Storage size should be an implementation detail.
There was no size_t in K&R1. size_t was introduced in the standards process, as was the definition of index variables in the for loop. You may have a complaint with the standard there.
As for the optimization, it is based on a misunderstanding of C semantics. The only place where the sign extend makes a difference is where pointers are longer than "ints" AND where the iterator can overflow, and in that case the sign extend only makes a difference if the loop is incorrectly coded so that the end condition is never true. The code should just provoke a warning and then omit the sign extend (and it almost certainly doesn't make much of a difference, since sign extend is highly optimized and has zero cost in a pipelined processor).
> There was no size_t in K&R1. size_t was introduced in the standards process, as was the definition of index variables in the for loop. You may have a complaint with the standard there.
I certainly do. C and its descendants make you over-specify variables by declaring them all int/size_t/etc individually. It should've had C++11-style "auto" from the start, and there should be a statement like "for i=0..n" that declares "i" as the same type as "n".
We make a distinction between undefined and implementation defined behaviour for a reason. Saying that certain runtime behaviours result in malformed programs while being impractical to generate explicit checks for is not entirely crazy.
That said, I believe the set of undefined behaviours in our current standards is much, much too large - most of these should rightly be filed in the implementation-defined category instead. It is no longer the 70s and the very same modern compilers that perform more and more extreme optimisations year on year really do not need to account for a giant zoo of quirky experimental architectures; over the decades we've basically settled on a consensus on how pointers, integers, floating-point numbers etc ought to work.
Priorities change though; in particular, security is a rising priority. So it's not inevitable that funded compiler development must focus on wringing every last optimization out of software that might perform well enough if it were simpler. Organizations like NLnet and ISRG, for example, could fund work on getting to a usable Unix-like system that's entirely compiled with one of the simpler C compilers that don't exploit undefined behavior so much. They could justify it with the argument that security is best achieved through simplicity at all levels of the stack, including a simple build toolchain.
There is a large population of C "real programmers" who, when they write a C program that unsurprisingly doesn't work, conclude this must be somebody else's fault. After all, as a real programmer they certainly meant for their program to work, and so the fact it doesn't can't very well be their fault.
Such programmers tend to favour very terse styles, because if you don't write much then it can't be said to be your fault when, invariably, it doesn't do what you intended. It must instead be somebody else's fault for misunderstanding. The compiler is wrong, the thing you wanted was the obviously and indeed only correct interpretation and the compiler is willfully misinterpreting your program as written.
Such programmers of course don't want an error diagnostic. Their program doesn't have an error, it's correct. The compilers are wrong. Likewise newer better languages are unsuitable because the terse, meaningless programs won't compile in such languages. New languages often demand specificity, the real programmer is obliged to spell out what they meant, which introduces the possibility that they're capable of mistakes because what they meant was plain wrong.
The fact that a compiler is allowed/encouraged to silently remove whole sections of code because of some obscure factoid is an amazing source of footguns.
At least the warnings are getting a bit better for some of these.
Without these optimizations, you can't write fast scientific code in C. This was realized back in the early 1980s and it's why those rules were added.
In Fortran the aliasing rules are even stricter: given two arrays passed in as arguments the compiler can assume that they do not overlap, for example. I remember messing that up as a student long ago and getting strange results. The Fortran rule was to enable vectorization, which has been done for many decades.
You doubt it because you aren't a compiler developer, so you aren't aware of the history. Try taking some of the classic Fortran scientific programming libraries, such as LAPACK, recoding them in C, and then see what happens when you need to generate code with a compiler that doesn't do any of the optimizations the article complains about and has no undefined behavior. Then you'll figure out why.
My understanding is the reason FORTRAN is faster than C isn't because of stupid stuff like noalias and the like. It's because FORTRAN has arrays and C doesn't.
You're right, sort of. Because C doesn't have proper arrays, you make up for it by passing a pointer to the first element and the range. So for C to do as well as Fortran on matrix operations the compiler developers need help. One source of help is that if you have a pointer to double and a pointer to int, you can assume that writing pdouble[k] doesn't alter anything reachable as pint[m].
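As a rough sketch of that assumption (function and parameter names are mine): because double and int are incompatible types, the compiler may treat the stores through pd as unable to touch whatever pi points at, so *pi can be loaded once and kept in a register for the whole loop.

    void scale(double *pd, const int *pi, int n) {
        for (int k = 0; k < n; k++)
            pd[k] *= *pi;   /* *pi need not be reloaded on every iteration */
    }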
Why would that put C at a disadvantage performance-wise? Array decay is, with hindsight, an unfortunate feature, but optimizing access to contiguous memory is very low-hanging fruit as far as compiler optimizations go.
If you want better safety, runtime and compile time performance the lowest hanging fruit would be to fire the C standards committee and hard fork C++/C into C++ and C only ecosystems.
Cool, make a sci-C with all the optimisations allowed that is trivially interoperable with simple-C. Mangling the latter into the former serves no-one.
How much slower would removing the UB footgun make scientific code? Do you have benchmarks? Because we have endless firsthand accounts that it makes standard C useless.
I don't get how letting the compiler remove or modify code like this was ever thought to be a good idea. I get removing unused functions, but not removing conditions and changing the flow of the code. If there is unreachable code, it's best to issue a warning and let the programmer fix it. The compiler should optimize without changing the semantics of the code, even if it contains undefined/unspecified behavior.
Because of this, it's impossible to write C without using a ton of non-standard attributes and compiler options just to make it do the correct thing.
> The compiler should optimize without changing the semantics of the code, even if it contains undefined/unspecified behavior.
That is what it does. "Undefined behavior" is a lack of semantics, so it is preserving semantics when it leaves those paths out. You can make a "defined C" with `-fsanitize-trap=undefined`, but C was never a high level assembler, and performance is critical for C users too.
C was always a high-level assembler. UB was meant to be "this cannot be defined portably, refer to your CPU's documentation". What do you gain by making C unusable by default? Just make it 5% slower by default, but possible to reason about and give people the option to shoot themselves in the foot. I don't know what more you need than the creator of the language telling you "this is not possible to implement sanely".
> UB was meant to be "this cannot be defined portably, refer to your CPU's documentation"
You're probably thinking of implementation-defined behaviour, which is the term for behaviour that, although not fully specified by the standard, is a valid behaviour for the program.
Undefined behaviour is different. From the article: "Undefined behavior gives the implementor license not to catch certain program errors that are difficult to diagnose."
Many people don't notice the distinction, and their intuition about what compilers should do with their code matches implementation-defined behaviour; and therefore they complain when, for UB, it does not behave this way.
There is a very strong argument for reclassifying many specific UBs as implementation-defined behaviours, but that's a rather different conversation from "the compiler should always do something sensible when encountering UB".
Even assuming the 5% number were correct (depending on how expansive your definition of UB is, it may not be), asking everyone who doesn't adjust their compiler flags to accept a 5% slowdown for some theoretical benefits is at odds with economic reality.
Even assuming the 5% number were correct (depending on how expansive your optimisations with UB assumptions are, it may not be), asking everyone who doesn't adjust their compiler flags to accept their programs being silently miscompiled for some theoretical benefits is at odds with economic reality.
a) There is no evidence that such phenomena - especially on net - are due to unsafe optimizations; b) those companies don't need to shift their costs to other users.
> "Undefined behavior" is a lack of semantics, so it is preserving semantics when it leaves those paths out.
It's not preserving semantics, it's inferring a valid program from an invalid one, but since the programmer entered an invalid program, unilaterally deciding that the valid program was the intended program seems dubious at best. These should really be errors, or warnings at the very least.
Very often, a function inlined can be determined, at the place where it is expanded, to have substantial sections of dead code, based on knowledge of the values of arguments passed to it in that place. Expanded in a different place, different parts are dead. Warnings about the dead parts would lead you to turning off those warnings, so they are never turned on in the first place.
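A toy illustration of that (the function and flag here are invented for the example): once the call is inlined with a constant argument, one branch is provably dead at that call site and gets deleted.

    static inline long convert(long x, int to_bits) {
        if (to_bits)
            return x * 8;   /* dead once inlined below with to_bits == 0 */
        else
            return x / 8;
    }

    long bytes_from_bits(long bits) {
        return convert(bits, 0);   /* only the division survives inlining */
    }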
This gets more complicated when an inlined function calls another inlined function. The whole inner function may be in the middle of dead code, and thus everything it calls, too, and so all be elided. This sort of thing happens all the time. These elisions don't typically make code obviously much faster, cycle-wise, but they do make it smaller, with a smaller cache footprint. Cache footprint has a huge effect on overall performance.
In principle, the compiler could arrange to put the dead code in some other page, with a conditional jump there that never happens, consuming just another branch prediction slot. But branch prediction slots are cache, and the branch, though never taken, nonetheless consumes a slot. A processor ISA could, in principle, have a conditional branch instruction that is assumed by branch prediction always to go one way, and so not need to consume any branch prediction cache. But I don't know of any. RISC-V does not seem to be entertaining plans to specify such a branch.
Several extant chips do allow "hint" prefixes for branches, but I think they are ignored in current chips, according to experience where they were determined generally to be wrong. This is unfortunate, as sometimes the most frequently taken direction is not the one you want to be fastest. E.g., when spinning while watching an atomic flag, you want the looping branch to be assumed not taken, to minimize latency once the flag is clear, even though it most frequently is taken in recent history. (High-frequency trading code often has to resort to trickery to get the desired behavior.)
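A sketch of that spin-wait case using GCC/Clang's __builtin_expect (whether any static hint reaches the hardware varies by CPU; mostly this just influences code layout):

    #include <stdatomic.h>

    void wait_for(const atomic_int *flag) {
        /* Tell the compiler the loop condition is expected to be false,
           i.e. optimise the fall-through (exit) path, even though the
           branch is taken most often while we wait. */
        while (__builtin_expect(!atomic_load_explicit(flag, memory_order_acquire), 0))
            ;
    }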
(There is a famous story about engineers at Digital Equipment Corporation, whence we got PDP-11s and Vaxen, and thus, indirectly, Unix. A comprehensive cycle count showed that a huge proportion of the instructions executed in their OS kernel were just one instruction. They moved heaven and earth to optimize this one instruction, but the kernel with the new instruction was exactly 0% faster. It turned out the instruction was used only in the idle loop that ran while waiting until something useful could be done. This really happened, and the same mistake is made again and again, to this day: e.g., people will sincerely swear that "rotl" and "popcnt" instructions are not important, on about the same basis.)
Intel recently implemented an atomic test-and-loop instruction that just stalls execution until an identified cache line is touched, thus avoiding the branch prediction stall getting out of the loop. I have not heard of anybody using it.
RISC-V, incidentally, suffers under exactly the noted delusion, so all existing chips lack all of the supposedly unimportant instructions. I have seen a claim that "RVA22", its general-purpose computing platform, will have them, but cannot find any evidence for it online.
Absolutely, those optimisations should be opt-in, otherwise it's impossible to reason about the correctness of your code. At work we had to replace some arithmetic by inline assembly, as there was literally no other way of making the compiler generate the correct expression.
I'm very curious to learn more about this as I thought it was easily possible to opt out of the overly aggressive defaults of GCC, Clang, and the like.
It is possible - but not whilst maintaining standards compliance, and there is always another one where you least expect it.
At the root is taking an “assume the programmers know what they are doing” language and turning it into an “assume the programmers are drooling morons” language.
I think it's the opposite: compiler writers under pressure to produce fast code assume that the programmers are smart, not morons, and that they understand the rather complex rules.
For example: the compiler is assuming someone isn't ignorant and knows, for example, if they want to write a variable as one type and read it as another, they need to consider two things: they are now in implementation-dependent territory (big endian vs little endian effects, for example), and they need to use unions, not just casts, if the same data can be interpreted as two different types and neither is array of char. Failure to use unions properly triggers aliasing problems.
Amusingly, unions are also defined so that operations that pun types are undefined. Gcc has private extensions that provide a way to pun types in unions, but those are not portable.
Specifically: if you have a union with members a and b, and you assign into member a, then reading from member b afterward is, by the C and C++ Standards, usually undefined. You are presumed to have some way to know that member a is live, and use that. You can then write into b, and later read that back out.
The C99 and C11 standards say that the behavior is unspecified (in standard-ese, "unspecified" and "undefined" are different), meaning that the standard doesn't say what you get. Of course that's true: the result is different on a big-endian and a little-endian machine. Packing the bytes of a double might generate a signalling NaN so you get a trap when you read it, or not, and the standards don't require that floating point representations match the IEEE standard, either.
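For concreteness, the pun under discussion looks something like this (the function name is mine; as the thread notes, exactly how the standards classify the read-back is contested, and the value you get depends on endianness and representation):

    #include <stdint.h>

    uint32_t float_bits_via_union(float f) {
        union { float f; uint32_t u; } pun;
        pun.f = f;       /* write member f ... */
        return pun.u;    /* ... read member u: what comes back depends on
                            the implementation and the representation     */
    }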
It was many months ago and I no longer remember the full details, plus it's proprietary code, so I can't tell them to you. But the gist was that after optimisation, some floating point arithmetic was done in different order by different compilers. This particular expression really needed to be computed in exact order and, since nothing else worked, we had to just write it in compiler-specific assembly. IIRC some language extensions are coming to better specify float semantics on a per-function basis that might help one day.
That's a compiler bug. Floating-point math shouldn't be reassociated unless you were using -ffast-math, which is indeed a poorly named option that does bad things, but one many people demand.
This is a typical whining about UB article, but removing it won't get what you want, in particular your program still won't behave correctly across architectures. Overflow on shift left may be undefined, but how do you want to define it? If you want a "high level assembler", well, the underlying instructions behave differently on ARM, x86 scalar, and x86 SIMD.
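For example (hypothetical function; the architectural details are the ones I'm aware of, so treat them as illustrative): a 32-bit left shift by 33 gives different answers depending on the instruction used - x86's scalar SHL masks the count to 5 bits, classic ARM's register-count LSL uses the low byte of the count, and x86 SIMD shifts zero the lane when the count exceeds the width.

    #include <stdint.h>

    uint32_t shift(uint32_t x, unsigned n) {
        return x << n;   /* undefined in C when n >= 32; shift(1, 33) could
                            plausibly yield 2 (x86 SHL), 0 (ARM LSL), or
                            0 per lane (x86 SIMD shifts)                   */
    }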
The reason they claim program optimizations aren't important is because you can do it by hand for a specific architecture pretty easily, but you'll still want them when porting to a new one, eg if it wants loop counters to go in the opposite direction.
Yes, your program will have different semantics on different architectures. This is already the case with e.g. big vs little endian. But you will be able to reason about the program's semantics rather than going "I sure hope there's no UB in here" and throwing your hands up.
How much of this is driven by modern C++ style? I always assumed optimizers needed to become much more aggressive because template-heavy code results in convoluted IR with tons of unreachable code. And UB-based reasoning is the most effective tool to prove unreachability.
None of it: we're talking about C here, and every point in the article applies to the C standard as it existed when Linus posted his message to comp.os.minix so long ago.
> template-heavy code results in convoluted IR with tons of unreachable code
Does it? Honest question; my impression was that template-heavy code can tend to produce deep call trees, but not necessarily outright unreachable code unless you count instantiations ruled out by SFINAE/std::enable_if/tag dispatching, for which UB-based analyses are not necessary.
In addition, I thought template (meta)programming relied very heavily on compile-time knowledge, which seems to obviate the need for UB-based analyses in many cases.
I'm not particularly experienced, though, so maybe there's a gaping hole I'm missing.
It is quite common for any heavily inlined code to have lots of dead sections. Template code is very frequently heavily inlined. So, it is common for compilers to prune out the dead branches.
But C code is often heavily inlined, too, and similarly pruned.
Guess I need to pay more attention to the template code I use. I knew that they tended to rely very heavily on inlining for good performance, but I haven't read as much about the presence/absence of dead code, though to be fair maybe the deadness isn't always obvious to the programmer?
I should add that most of the Standard library will not often have many visibly dead code branches to prune. Most of what gets discarded is arguments and calls to no-op functions, all the scaffolding. But there is a lot of that. So, for example, an iterator on std::vector isn't typically a bare pointer, but after optimization the instructions left standing are the same as if it were one.
The actual title of the paper is "How ISO C became unusable for operating systems development". Is there a particular reason why the first and last words have been removed here?
To me, the stupid thing is the abuse of undefined behavior to change the semantics of the code. The fact that a behavior is not defined in the standard doesn't mean that on a particular hardware platform it doesn't have a particular meaning (and most C programs don't need to be portable, since C is mainly used for embedded these days and thus you are targeting a particular microcontroller/SOC).
Leave these optimizations to the C++ folks. C doesn't need all of that; just leave it as the "high level assembler" that it was in the old days, where if I write an instruction I can picture the assembler output in my mind.
Optimizers should not change the code's semantics, in my view. Unfortunately, with gcc it's impossible to rely on optimizations, so the only safe option is to turn them off entirely (-O0).
I'm curious what you would replace it with? I can't think of anything actually suitable for most of the low-level operating systems / embedded level things that use C.
I know people recommend rust for this kind of thing, but Rust really isn't appropriate in a lot of cases, especially when dealing with microcontrollers not supported by llvm (ie PIC, 8051 off the top of my head).
This may be changing, but I was also under the impression that Rust can't easily produce as small binaries as C can.
As an alternative in your use case, there was a pretty decent Modula-2 compiler for the 8051 (Mod51). It's a small, safe language and a reasonable fit for embedded architectures. It is a Wirth-ian BEGIN...END language, which for some people means it has cooties, but I wrote some useful stuff in it a long time ago and it was pretty painless, as such things go. There's even some standardization around Modula-2 in the embedded world (IEC 61131-3 ST). Unfortunately, I don't think Mod51 is sold any more; C won in that world.
I'd nominate Pascal (which has been used for OS dev before) and Ada. Not 100% sure how broad their support is for different hardware, but it also seems like adding new platforms to an existing toolchain is less work than inventing a new language while still benefiting from being not-C.
As long as you write compilers for all of the platforms which currently only have C language production quality compilers. That includes porting the whole development ecosystem for those platforms (like libraries) to whatever NextNewShiny language you deem worthy.
Sometimes I do wonder if all these UB optimisations aren't pushed by people aiming to make C and C++ unusable, so that people will be forced to move to other languages.
The problem is that C is sufficiently primitive that optimizing it is effectively trying to infer structure backwards, because the language doesn't specify it.
For-loops are a great example. Why are we even discussing a "loop index"? Because we need to iterate over an array, string, etc., and "length" isn't something stored with the data structure, so the compiler can't do the loop for you.
The real question is probably more along the lines of "Should the C standard start pinning things down more than it does?"
The answer is probably yes, but it's not straightforward to do. Look at just how much verbiage it took to create a decent memory model. And, still, using atomics in C (as opposed to C++) is brutal and the compiler can't help you very much.
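To give a flavour of the verbosity involved, a bare C11 reference-count release with explicit memory orderings might look like this (names invented for the sketch):

    #include <stdatomic.h>
    #include <stdbool.h>

    static atomic_int refcount = 1;

    bool release(void) {
        /* Decrement with release ordering; only the last owner pays for
           the acquire fence before tearing the object down. */
        if (atomic_fetch_sub_explicit(&refcount, 1, memory_order_release) == 1) {
            atomic_thread_fence(memory_order_acquire);
            return true;   /* caller may now free the object */
        }
        return false;
    }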
Your argument would be more compelling if programming languages with higher level iteration primitives were significantly more optimizable than C. The underlying processor architectures have loop indexes. Just pretending you can do applicative programming doesn't mean you can. C is very adaptable - even to GPUs. Also, I have yet to see that decent memory model.
That is the fault of compilers, not the language. That should raise at most a warning. A compiler should not change the semantics of the code, even if the code relies on undefined/unspecified behavior. I don't get how someone thought that it is a good idea to silently remove apparently unreachable code.
> A compiler should not change the semantics of the code, even if the code relies on undefined/unspecified behavior.
You can argue that there is no semantic of the code if it invokes undefined behavior. What is the correct "semantic" of x+1 if x is the maximum representable value of a signed integer type? The language doesn't specify it, in fact it explicitly calls this undefined behavior; it's absolute nonsense. Should the compiler apply AI to determine the semantic intent?
> I don't get how someone thought that it is a good idea to silently remove apparently unreachable code.
This reminds me of the common misconception of C as a "portable assembly language". The standard is largely machine independent in that it's concerned with the effects of execution rather than how those effects are implemented by the compiler and eventually carried out by a real machine. As a result, you can write several pages of code and have the compiler fold it down into "mov eax, 123 / ret" and nothing that concerns the C language will have been lost.
If you have some expression that correctly folds into "false", none of the effects of the program as far as the C standard is concerned have changed. Yes, some undefined behavior may manifest differently depending on the optimizations made, but the language is not concerned with that.
The problem is very much with the language itself. Overflow in signed integers is undefined behavior. The optimizer correctly won't treat it as a constraint on optimization. The language could as easily have defined signed integer arithmetic as modular. The standard does not have a "compiler should not change the semantics of the code" clause that applies to undefined behavior. As a general rule, the C standard is basically the antithesis of it. The compiler gets free rein over the effects of code where the behavior is undefined by the C standard, which leaves a lot of UB holes.
"UB optimizations" happen as a natural consequence of basing optimizations on constraints posed by the language rather than descriptions of special cases. It's not like someone looked at code with signed overflows and decided "let's make this really stupid" and typed "if (MAY_SIGNED_OVERFLOW(expr)) surprise_me();". You might instead specify the constraints of signed integers according to the language spec and a generalized optimizer performs optimizations based on those constraints.
x + 1 < x for signed x is always false for values of x where the operation is at all defined in the C language. The good optimizer correctly solves for the constraints of the language and folds the operation into a constant false accordingly, in the same manner it would fold other expressions.
That you believe that there is any x for which this would be true is based on assumptions that the C language doesn't make. You likely assume that signed integer arithmetic should be modular as a consequence of its 2's complement bit representation. These are not assumptions that the C language makes. Signed integers don't have to be modular. They don't have to be 2's complement. It might be harder than you think to maintain a set of constraints and assumptions in addition to those specified by the C language. It would be these additions that would be special cases, not the optimizations.
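A minimal sketch of the fold being described - with default flags, GCC and Clang reduce this to a constant; compiling with -fwrapv (which makes signed arithmetic wrap) keeps the real comparison:

    int wraps(int x) {
        return x + 1 < x;   /* always false for any x where x + 1 is defined,
                               so the optimizer may return 0 unconditionally */
    }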
If there's anything you should have a beef with, it's the language itself. Don't use C if you aren't ready to either be caught off guard by the unexpected consequences of undefined behavior at run time or have the patience to very carefully avoid it by learning what invokes it.
I think much of the "undefinedness" of the language comes from the fact that multiple implementations of the "C language" existed long before standardization. The subtle differences in these implementations and their stake in the standardization process meant that a lot of things simply couldn't be defined because it would invalidate existing implementations.
If a warning is issued in that case, I'm tempted to say it's fair game. If you are sure it's safe, the compiler has extremely well established ways of telling it (i.e. assume or unreachable).
I wouldn't call it fair game, not least because nobody reads warnings. My central objection is that it's not what C is meant to be. C was made to write Unix, it was specifically created to be halfway between a portable macro assembler and a high-level language. That is a useful language that fills a particular niche. That is what C was for many years and a lot of code was written with that behaviour in mind. It can be argued that compilers which change semantics from what people intended are downright irresponsible given the foundational role that C has.
All this wouldn't be an issue if C were just some application language, but it's what all of computing is built on. It really should be simple by default and without adding more footguns than what directly programming in assembly gives you.
I wouldn't mind if all the assumptions-based optimisations were made opt-in. They are obviously useful in some way, but they're impossible to allow globally in a large legacy codebase. Which is pretty much every C codebase.
There is probably more than a million times as much C code that is not Unix as code that is. We could chuck in all the other OSes, and all the RTOSes besides, without changing the statement.
correct. In my experience a focus on UB (often actually said U.B.) tends to come from the C++ community, although a lot of it trickles down to the C community as well. I can recommend watching a few talks from C++ conferences. I especially enjoyed ones by Matt Godbolt, and Miro Knejp.
Not correct. Most UB is the same in C and C++. If you disallowed optimizations based on UB, it might have more effect on C++ programs, just because C++ tends to more deeply inlined abstractions that can benefit more from them. But those don't tend to need much attention. The cases people worry about are more common in C code, just because the concept of type is less important in C.
Or are you saying that UB is not from the C++ community? That could certainly be true, I am just describing the situation as I see it where C++ disallows more types of casting or reinterpreting than C does (even if compilers allow it).
One salient example I heard was that the only way to legally read the bits of a float in C++ is to memcpy the float onto an integer type, whereas in C it's legal (I believe by using a union).
Nope! Same. BTW the memcpy is always optimized out, so you might as well; then it's totes portable, C or C++. You can always build up an object (int, float, struct) from char values. Technically they are supposed to have come from another value of the same type, but the compiler can't really tell where the bytes came from, so has to assume they must have come from there.
C does have a thing where a void pointer can be copied into any other pointer, without a cast, and operations with the typed pointer are OK. Technically, the pointer value is supposed to have been copied to the void pointer from the same type, but realloc has to work. Note, realloc does give you alignment guarantees you don't get from pointers to random things.
Nothing else is portable. Unions, other pointer casting, all is UB unless your compiler extends the definition. So, Gcc has an extension to make unions, used in a certain way, well-defined. Probably Clang copies that. You would have to look up exactly what. But the memcpy hack works in all compilers. You won't use it often enough for the extra verbosity to be a problem.
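For reference, the memcpy idiom being described is roughly this (assuming a 32-bit float, which the assertion checks); compilers recognise the pattern and emit no actual copy:

    #include <stdint.h>
    #include <string.h>

    uint32_t float_bits(float f) {
        uint32_t u;
        _Static_assert(sizeof u == sizeof f, "assumes 32-bit float");
        memcpy(&u, &f, sizeof u);
        return u;
    }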
In C you're free to take the address of the float (presuming it's not in a register) and cast that address to a pointer to char, which doesn't copy anything; it just tells the compiler's code generator, from that point on, what assembly code to generate for accesses through that pointer.
Right, I should not have said you have to memcpy. You can poke bytes any way you like, in any order you like. But! memcpy is known to compiler optimizers, so is more reliably optimized away.