It was already true that an attacker could trick a user into copying a malicious link from a file opened in Notepad into their browser; was that also a remote code execution vulnerability?
A US resident consumes 76 MWh per year [0], so 1.52 GWh over 20 years. A single model can be trained once and used by millions. Therefore LLMs are ~10000x more energy efficient than humans.
Your numbers include energy used for transport etc. Sam's numbers were about what the human body itself uses for training, hence why I used caloric consumption.
> The only difference is the impact on health parameters (different will get worse on low fat vs high fat), satiety, and how easy it is for someone to sustain the diet and stay in a deficit.
Have we even agreed on what AGI means? I see people throw it around, and at this point it feels like AGI just means "next-level AI that isn't here yet", or a buzzword Sam Altman loves.
Yeah, the argument here is that once you say this, people will say "you just don't know how to prompt, I pass the PTX docs together with Nsight output and my kernel into my agent, run an evaluation harness, and beat cuBLAS". And then it turns out that they are writing a GEMM for Ampere/Hopper, which is an in-distribution problem for LLMs.
It's the idea/mindset that because you happen to be working on something the tool's training distribution covers well, it must be a skill issue or mindset problem for everyone else who is not getting value from the tool.
Another thing I've never gotten them to generate is any G-code. Maybe that'll come indirectly from the image/3D-generation side, but I was kind of hoping I could generate some motions, since hand-coding coordinates is very tedious. That would be a productivity boost for me. A very, very niche boost, since I rarely need bespoke G-code, but still.
Oh HELL no. :P G-code (at least if you're talking about machining) is the very definition of something you want to generate analytically, using tried and tested algorithms, with full consideration of the specifics of the machine and material involved.
I guess if you just want to use it to wiggle something around using a stepper motor and a spare 3D printer control board, it might be OK though. :)
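For the "wiggle something around" case, the analytic approach really is a few lines of ordinary code. A minimal Python sketch, with made-up feed rates and coordinates, and no tool-radius compensation or material logic (so illustrative only, not machining advice):

```python
import math

def circle_gcode(cx, cy, radius, z, feed, segments=72):
    """Emit G-code for one circular pass as short linear moves.
    Purely illustrative: no tool compensation, no material logic."""
    lines = [
        "G21 ; units: mm",
        "G90 ; absolute positioning",
        f"G0 X{cx + radius:.3f} Y{cy:.3f}",   # rapid to start of circle
        f"G1 Z{z:.3f} F{feed}",               # plunge to cutting depth
    ]
    for i in range(1, segments + 1):
        a = 2 * math.pi * i / segments
        lines.append(f"G1 X{cx + radius * math.cos(a):.3f} "
                     f"Y{cy + radius * math.sin(a):.3f} F{feed}")
    return "\n".join(lines)

print(circle_gcode(0, 0, 5, -1, 300))
```

The whole point of doing it this way is that every coordinate comes out of `cos`/`sin`, not out of a sampler, so you can reason about exactly where the tool will go.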
We do not trust LLMs to reliably emit correct assembly (why would anyone?) the way we trust a compiler: the latter is deterministic, while the former is fundamentally stochastic no matter how you sample it.
There's almost a good point here, but you're misusing concepts in a way that obscures the point you're trying to make. Determinism is about producing the same output given the same input. In this sense, LLMs are fundamentally deterministic: inference produces scores for every token in the vocabulary, and that score map is then sampled according to the temperature to produce the next token. The non-determinism is artificially injected at the sampling step.
But the determinism/non-determinism axis isn't the core issue here. The issue is that LLMs are trained by gradient descent, which produces instability and unpredictability in their output. I can give one a set of rules and a broad collection of examples in its context window, but how often it will correctly apply the supplied rules to the input stream is entirely unpredictable. LLMs are fundamentally unpredictable as a computing paradigm. Their training process is stochastic, though I hesitate to call the models themselves "fundamentally stochastic".
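For reference, the sampling step described above can be sketched in a few lines (NumPy; the logits here are made up):

```python
import numpy as np

def next_token(logits, temperature=1.0, rng=None):
    """Pick the next token from raw model scores (logits).
    temperature == 0 -> greedy argmax: same logits, same token, every time.
    temperature  > 0 -> sample from the softmax; the randomness comes from
    `rng`, not from the network itself."""
    if temperature == 0:
        return int(np.argmax(logits))
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    rng = rng or np.random.default_rng()
    return int(rng.choice(len(logits), p=probs))

logits = np.array([1.0, 3.5, 0.2, 2.9])
assert next_token(logits, 0) == next_token(logits, 0)  # greedy is repeatable
```

The forward pass that produces `logits` is ordinary arithmetic; all the "dice rolling" lives in the `rng.choice` call, which is exactly the sense in which the randomness is injected rather than inherent.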
> Determinism is about producing the same output given the same input. In this sense, LLMs are fundamentally deterministic.
You cannot formally verify the prose or code that LLMs generate, the way you can verify what a compiler does. So even in this sense the claim is completely false.
No one can guarantee that the outputs will match the instructions you give the LLM 100% of the time, which is why you do not trust it. As long as it is made up of artificial neurons predicting the next token, it is fundamentally a stochastic, unpredictable model.
One can maliciously craft an input that messes up the network, getting the LLM to produce a different output or outright garbage.
Compilers have reproducible builds and formal verification of their functionality. Nothing like that exists for LLMs. Thus, comparing LLMs to a compiler and suggesting that LLMs are 'fundamentally deterministic', or even more deterministic than a compiler, is completely absurd.
You're just using words incorrectly. Deterministic means repeatable. That's it. Predictable, verifiable, etc. are tangential to deterministic. Your points are largely correct, but you're not using the right words, which obscures your meaning.
Nope. You have not shown how a large-scale collection of neural networks, irrespective of architecture, is any more deterministic than a 'compiler'. You are only repeating the known misconception that setting the temperature to 0 makes LLMs deterministic; it does not [0] [1] [2], otherwise you would not have this problem in the first place.
Even if you do that, the outputs are useless anyway, so it really does not help your point. Therefore:
> You're just using words incorrectly. Deterministic means repeatable. That's it. Predictable, verifiable, etc are tangential to deterministic.
There is nothing deterministic or predictable about an LLM even when you compare it to a compiler, unless you can guarantee that the individual neurons, through inference, give output predictable enough to serve as a drop-in compiler replacement.
> You have not shown how a large-scale collection of neural networks, irrespective of architecture, is any more deterministic
It's software. Without an external randomness source, it's 100% deterministic, hardware glitches aside. This isn't debatable. You can make it seem non-deterministic by concealing inputs (e.g., when batching multiple requests, any given request looks "non-deterministic" in isolation in many frameworks, because batches share state and aren't isolated), but even then it is still deterministic; you are just looking at an incomplete set of the inputs that determine the output.
> It's software. Without an external randomness source, it's 100% deterministic, hardware glitches aside. This isn't debatable.
I don't think anyone would go as far as to call all deep neural networks, regardless of architecture, "100% deterministic". Not even you can transparently explain why a model works or fails the way it does on a given input (and adversarial inputs can really mess up the model).
But first of all, the entire sentence that you should be quoting for complete context is:
>> You have not shown how a large-scale collection of neural networks, irrespective of architecture, is any more deterministic than a 'compiler'. You are only repeating the known misconception that setting the temperature to 0 makes LLMs deterministic; it does not [0] [1] [2], otherwise you would not have this problem in the first place.
So given this "100% determinism" you just claimed, surely LLMs can replace a traditional compiler, which needs exactly that determinism, and are useful for such a use case in production?
Then, as soon as we test this in practice, it all quickly runs into my secondary point:
>> Even if you do that, the outputs are useless anyway, so it really does not help your point.
Again, there is just no point in repeating the AI-booster myth that deep neural networks like LLMs are "100% deterministic", even with temp=0 tweaks, in any practical sense.
> I don't think anyone would go as far as to call all deep neural networks, regardless of architecture, "100% deterministic"
Any other position is pure magical thinking. Adding more linear algebra never gets you out of 100% determinism. It gets you more complexity, and potentially makes systems chaotic, but chaotic systems are still deterministic, just highly sensitive to small input variations.
Yes, there's some unknown sources of non-determinism when running production LLM architectures at full capacity. But that's completely irrelevant to the point. The core algorithm is deterministic. And you're still conflating deterministic and predictable. It's strange to have such disregard for the meaning of words and their correct usage.
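Chaotic-but-deterministic is easy to demonstrate concretely; the logistic map is the textbook example (plain Python sketch):

```python
def logistic(x, r=3.9, steps=50):
    """Iterate the chaotic logistic map x <- r*x*(1-x) for `steps` steps.
    r=3.9 is in the chaotic regime."""
    for _ in range(steps):
        x = r * x * (1 - x)
    return x

# Deterministic: the same input always gives the exact same output.
assert logistic(0.2) == logistic(0.2)

# Chaotic: a one-part-in-a-billion input change sends the trajectory
# somewhere completely different after 50 steps.
print(logistic(0.2), logistic(0.2 + 1e-9))
```

No randomness anywhere, yet the output is wildly sensitive to the input; sensitivity and non-determinism are different properties.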
> Yes, there's some unknown sources of non-determinism when running production LLM architectures at full capacity. But that's completely irrelevant to the point.
It is directly relevant, and it supports my whole point, which debunks your assertion that LLMs are 'deterministic': at a fundamental level you cannot guarantee that the behaviour, or even the outputs, will be the same.
> The core algorithm is deterministic. And you're still conflating deterministic and predictable.
The entire LLM is still non-deterministic, and it is still considered unpredictable even if you take that into account.
> It's strange to have such disregard for the meaning of words and their correct usage.
Nope. Not only have you shown zero sources proving LLMs are deterministic enough to function as a "compiler", you ultimately conceded by agreeing with the linked papers, which recognise that LLMs do not have deterministic or predictable properties even if you tweak the temperature, parameters, etc.
Therefore, once again, LLMs are NOT compilers, as even feeding them adversarial inputs can mess up the entire network to the point of uselessness.
> Not only have you shown zero sources proving LLMs are deterministic enough to function as a "compiler"
Note that I never defended using LLMs as a compiler. In fact I argued it would be inappropriate. I simply disagreed that the reason is because they are non-deterministic. If you weren't conflating the meaning of deterministic and predictable, you wouldn't keep misreading me.
> As a consequence, the model is no longer deterministic at the sequence-level, but only at the batch-level
therefore they are deterministic when the batch size is 1
Your second source lists a large number of ways to make LLMs deterministic. The title of your third source is "Defeating Nondeterminism in LLM Inference", which also means that they can be made deterministic.
Every single one of your sources proves you wrong, so no more sources need to be cited.
> therefore they are deterministic when the batch size is 1
This is like saying "C++ is 'safe' if you turn off all the default features and know what you are doing": you get to the point where it becomes absolutely useless, and it is still not safe.
The language is still fundamentally memory unsafe, just as LLMs are fundamentally deep neural networks, which comes with downsides: they are unpredictable black boxes whose outputs carry a lot of non-determinism.
> Your second source lists a large number of ways to make LLMs deterministic. The title of your third source is "Defeating Nondeterminism in LLM Inference", which also means that they can be made deterministic.
That is the point: "when", "can be made", "ways to make LLMs deterministic".
That both papers recognise the problem of non-determinism in LLMs makes my whole point even more valid, which is why I linked them.
Those papers have highlighted the fundamental nature of these LLMs right at the start of the paper.
> Every single one of your sources proves you wrong, so no more sources need to be cited.
LLMs are deterministic at minimal temperature. Talking about determinism completely misses the point. The human brain is also non-deterministic and I don't see anybody dismiss human written code based on that. If you remove randomness and choose tokens deterministically, that doesn't magically solve the problems of LLMs.
> The human brain is also non-deterministic and I don't see anybody dismiss human written code based on that.
Humans, in all their non deterministic brain glory, long ago realized they don't want their software to behave like their coworkers after a couple of margaritas.
You seem to be under the impression that I'm promoting LLMs, not sure where you got that idea. The argument is that non-determinism has nothing to do with the issues of LLMs.