> The computer science answer: a compiler is deterministic as a function of its full input state. Engineering answer: most real builds do not control the full input state, so outputs drift.

To me that implies the input isn't deterministic, not the compiler itself


You're not wrong, but I think the point is to differentiate between the computer science "academic" answer and the engineering "pragmatic" answer. The former is concerned with correctly describing all possible behavior of the compiler, whereas the latter is concerned with the actual experience of using the compiler in practice.

You might argue that this is redefining the question in a way that changes the answer, but I'd argue that's also an academic objection; pragmatically, the important thing isn't the exact language but the intent behind the question. For an engineer being asked this question, it's a lot more likely that the person asking has context that goes beyond the literal phrasing of "are compilers deterministic?"


> ... the important thing isn't the exact language but the intent behind the question ...

If we're not going to assume the input state is known, then we definitely can't say what the intent behind the question is. For many engineering applications the compiler is deterministic. Debian has the whole reproducible builds thing going, which has been a triumph of pragmatic engineering on a remarkable scale, and it suggests that, pragmatically, compilers may be deterministic.


It matters a lot. For instance, many compilers will put timestamps in their output. This can break things downstream if your goal is bit-for-bit identical output across multiple environments.

And that's just one low-hanging-fruit example; there are many more, for instance selecting a different optimization path when memory pressure is high.
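
To make the timestamp case concrete, here is a toy sketch in Python. `toy_compile` is a made-up stand-in for a compiler, not any real tool; the only real thing in it is the `SOURCE_DATE_EPOCH` environment variable, which is the Reproducible Builds convention for pinning embedded build times. The `env` parameter is just there so the example doesn't depend on your shell.

```python
import hashlib
import os
import time

def toy_compile(source: str, env=os.environ) -> bytes:
    """Hypothetical 'compiler': emits a build-time header plus the source."""
    # Honor SOURCE_DATE_EPOCH if present; otherwise fall back to the wall clock,
    # which is exactly what makes two otherwise identical builds differ.
    epoch = int(env.get("SOURCE_DATE_EPOCH", time.time()))
    header = f"built-at:{epoch}\n".encode()
    return header + source.encode()

def digest(artifact: bytes) -> str:
    """SHA-256 of the artifact, the usual way to compare two builds."""
    return hashlib.sha256(artifact).hexdigest()

if __name__ == "__main__":
    src = "int main(void) { return 0; }"
    pinned = {"SOURCE_DATE_EPOCH": "1700000000"}
    # Same source, same pinned clock: identical digests.
    print(digest(toy_compile(src, pinned)) == digest(toy_compile(src, pinned)))
    # Same source, different clock: different digests.
    later = {"SOURCE_DATE_EPOCH": "1700000001"}
    print(digest(toy_compile(src, pinned)) == digest(toy_compile(src, later)))
```

The "compiler" here is a pure function of (source, clock). Whether you call that deterministic depends on whether you count the clock as an input, which is more or less the whole argument in this thread.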


Like throwing dice: deterministic in theory, seemingly random in practice except under strictly controlled conditions.

Also, most real build systems build from a clean directory and checkout, so outside of a dev's machine they should be 100% reproducible, because the inputs should be reproducible. If builds aren't 100% reproducible, that's an issue!

> To me that implies the input isn't deterministic, not the compiler itself

or the system upon which the compiler is built (as well as the compiler itself) has made some practical trade-offs.

the source file contents are usually deterministic. the order in which they're read and combined and build-time metadata injections often are not (and can be quite difficult to make so).
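
a rough sketch of the read-order problem, in python. the function names and the two tiny files are made up for illustration; the one real fact it leans on is that `os.listdir` is documented to return entries in arbitrary order, so anything that combines files in listing order inherits that.

```python
import hashlib
import os
import tempfile

def combined_digest(paths):
    """Hash file contents in the order given; a different order gives a different hash."""
    h = hashlib.sha256()
    for path in paths:
        with open(path, "rb") as f:
            h.update(f.read())
    return h.hexdigest()

def sorted_sources(root):
    """List files under root in sorted order instead of trusting os.listdir's order."""
    return [os.path.join(root, name) for name in sorted(os.listdir(root))]

def demo():
    """Write two hypothetical source files and compare digests for both orders."""
    with tempfile.TemporaryDirectory() as root:
        for name, body in [("b.c", b"int b;\n"), ("a.c", b"int a;\n")]:
            with open(os.path.join(root, name), "wb") as f:
                f.write(body)
        paths = sorted_sources(root)
        forward = combined_digest(paths)
        backward = combined_digest(list(reversed(paths)))
        names = [os.path.basename(p) for p in paths]
    return names, forward == backward

if __name__ == "__main__":
    print(demo())
```

sorting fixes this toy case. in a real build the ordering can also come from parallel job completion, hash-map iteration and so on, which is where the "quite difficult" part comes in.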


I mean, if you turn off incremental compilation and build in a container (or some other "clean room" environment), it should turn out the same each time. Local builds are very non-deterministic, but CI/CD shouldn't be.

Either way it's a nitpick though, a compiler hypothetically can be deterministic, an LLM just isn't? I don't think that's even a criticism of LLMs, it's just that comparing the output of a compiler to the output of an LLM is a bad analogy.


> I mean, if you turn off incremental compilation and build in a container (or some other "clean room" environment), it should turn out the same each time. Local builds are very non-deterministic, but CI/CD shouldn't be.

lol, should. i believe you have to control the clock as well and even then non-determinism can still be introduced by scheduler noise. maybe it's better now, but it used to be very painful.

> Either way it's a nitpick though, a compiler hypothetically can be deterministic, an LLM just isn't? I don't think that's even a criticism of LLMs, it's just that comparing the output of a compiler to the output of an LLM is a bad analogy.

llm inference is literally sampling a distribution. the core distinction is real though, llms are stochastic general computation where traditional programming is deterministic in spirit. llm inference can hypothetically be deterministic as well if you use a fixed seed, although, like non-trivial software builds on modern operating systems, squeezing out all the entropy is a non-trivial affair. (some research labs are focused on just that, deterministic llm inference.)
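
a minimal sketch of the "fixed seed" point, in python with only the standard library. this is a toy categorical sampler over a hand-written list of logits, not how any real inference stack is implemented; `sample_tokens` and `greedy_token` are names made up for this example.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_tokens(logits, n, seed=None, temperature=1.0):
    """Draw n token ids from the softmax distribution.

    With seed=None the generator is seeded from OS entropy, so runs differ.
    With a fixed seed the same sequence comes back every time.
    """
    rng = random.Random(seed)
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=n)

def greedy_token(logits):
    """Argmax decoding: no sampling at all, so no seed is needed."""
    return max(range(len(logits)), key=logits.__getitem__)

if __name__ == "__main__":
    logits = [0.1, 2.0, 0.5, 1.2]  # made-up scores for a 4-token vocabulary
    print(sample_tokens(logits, 10, seed=42) == sample_tokens(logits, 10, seed=42))
    print(greedy_token(logits))
```

in this toy the seed is the only source of entropy, so pinning it is enough. the hard part alluded to above is that real inference has other sources the sketch ignores, for example floating-point reduction order and batching effects, which play roughly the role that clocks and schedulers play for builds.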



