4-bit quantization is starting to be supported for training as well on newer NVIDIA hardware. I believe the gpt-oss models were trained natively in MXFP4, which is a 4-bit floating point format, E2M1 (1 sign bit, 2 exponent bits, 1 mantissa bit).
It doesn't seem terribly common yet though. I think it is challenging to keep it stable.
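For the curious, the E2M1 element format is small enough to decode by hand. A quick sketch of my own, following the bit layout in the OCP microscaling spec (bias 1, subnormals when the exponent field is zero):

```python
def decode_e2m1(code: int) -> float:
    """Decode a 4-bit E2M1 (MXFP4 element) code: 1 sign, 2 exponent, 1 mantissa bit."""
    sign = -1.0 if (code >> 3) & 1 else 1.0
    exp = (code >> 1) & 0b11
    man = code & 1
    if exp == 0:
        # Subnormal: no implicit leading 1, value is mantissa * 0.5
        mag = man * 0.5
    else:
        # Normal: implicit leading 1, exponent bias of 1
        mag = (1 + man * 0.5) * 2 ** (exp - 1)
    return sign * mag

# The 8 non-negative representable values:
print(sorted(decode_e2m1(c) for c in range(8)))
# [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
```

Only 16 distinct values per element; MXFP4 recovers range by attaching a shared power-of-two scale to each block of 32 elements.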
I love the work unsloth is doing. I only wish gguf format had better vllm support. It’s sometimes hard to find trustworthy quants that work well with vllm.
Isn’t there some kind of term for when the government controls the means of production? I’ll think about it. It’s one of those terms that’s been thrown around so loosely by this regime that you knew they were going there.
But .. how do you know? It says it wrote code, but it could just be text and markdown and template. It could just be predicting what it looks like to run code.
This was a pretty easy hypothesis to test: I asked Gemini to generate 1,000,000 random base-64 characters (which is 20x more characters than its output token limit).
It wrote code and outputted a file of length 1,000,000 with 6 bits of entropy per character.
You can probably ask for a longer string and do a better statistical test if it isn't convincing enough for you, but I'm pretty convinced.
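If you want to reproduce the check yourself, here's a rough sketch (names are mine) of measuring empirical entropy per character; a genuinely random base-64 string should approach log2(64) = 6 bits per character:

```python
import base64
import collections
import math
import secrets

def entropy_per_char(s: str) -> float:
    """Empirical Shannon entropy of a string, in bits per character."""
    counts = collections.Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# 750,000 random bytes -> exactly 1,000,000 base-64 characters (no padding)
s = base64.b64encode(secrets.token_bytes(750_000)).decode()
print(len(s))                          # 1000000
print(round(entropy_per_char(s), 2))   # 6.0
```

A file that was merely templated or repeated would show visibly lower per-character entropy.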
Be careful. Output formatting doesn't prove what you think it does. Unless you work inside google and can inspect the computation happening, you do not have any way to know whether it's showing actual execution or only a simulacrum of execution. I've seen LLMs do exactly that and show output that is completely different from what the code actually returns.
There is being critical of something and then there is being a conspiracy theorist. Code Execution is a well-known feature of Gemini, ChatGPT, etc. and it's always shown in special blocks and it runs inside a sandbox.
Yes. "Show Code", not "Show CPU cycles". There's a difference: writing code is not the same as running code. It looks to you like it ran the code, but you have no proof that it did. I've seen LLM systems from companies that claimed their models would run code and return the output, where the output shown was not what the displayed code actually produced when run.
In my experience, models do not tend to write their own HTML output. They tend to output something like Markdown, or a modified version of it, and they wouldn't be able to write their own HTML that the browser would parse as such.
What, in your view, does sending one markup language instead of another markup language tell you about whether the back-end executed some code or only pretended to?
The front-end display is a representation of what the back-end sends it. Saying "but the back-end doesn't send HTML" is as meaningless as saying that about literally any other SPA website that builds its display from API requests that respond with JSON.
You cannot know that anything it shows you was generated by executing the code and isn't merely a simulacrum of execution output. That includes images.
I would be surprised if Gemini could not run python in its web interface. Claude and ChatGPT can. And it makes them much more capable (e.g. you can ask claude to make manim animations for you and it will)
Most modern models can dispatch tool calls from their inference loop, which is how Code Interpreter etc. work in ChatGPT. Basically an MCP-style server: execution happens as a call out to the provider's AI sandbox, and the result is returned to the LLM so it can continue generation.
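Schematically, the loop looks something like this. This is a toy sketch of my own that uses a local subprocess as a stand-in for the provider's sandbox; the real systems are obviously more elaborate, but the shape — model emits a tool call, runtime executes it, output goes back into the context — is the same:

```python
import os
import subprocess
import sys
import tempfile

def run_in_sandbox(code: str) -> str:
    """Stand-in for the provider's sandboxed executor: run a Python
    snippet in a subprocess and capture its stdout."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        out = subprocess.run(
            [sys.executable, path],
            capture_output=True, text=True, timeout=10,
        )
        return out.stdout
    finally:
        os.unlink(path)

# The model emits a structured tool call instead of plain text...
tool_call = {"tool": "python", "code": "print(2 ** 10)"}
# ...the runtime executes it and appends the result to the context.
result = run_in_sandbox(tool_call["code"])
print(result.strip())  # 1024
```

The point relevant to the thread: the front-end rendering tells you nothing either way; the execution (or lack of it) happens entirely server-side.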
You and another LLM are being asked to pick a number between 1 and 10. Whichever gets closest to the real number wins. The other LLM chose 6. What number do you choose?
Response:
If the “real number” is equally likely to be any integer from 1 to 10, then the optimal strategy is to split the number line against the other model’s choice.
The other LLM picked *6*.
The midpoint between 6 and 7 is 6.5.
* If the real number is *1–6*, then 6 is closer (or tied at 6).
* If the real number is *7–10*, then 7 is closer.
If I also pick 6, we tie every time 6 wins.
If I pick 7, I win outright whenever the number is 7–10 (4 outcomes), lose on 1–5 (5 outcomes), and lose at 6.
So:
* Picking 6 → guaranteed tie every time 6 is optimal.
* Picking 7 → gives me exclusive wins for 7–10.
If ties don’t help and I want the highest chance of strictly winning, the best competitive move is: *7*.
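The case analysis above is easy to check by brute force (my sketch, not part of the model's reply): enumerate the ten equally likely true values and count outcomes against the other model's pick of 6.

```python
def record(mine: int, other: int = 6):
    """Count (wins, ties, losses) over a uniform true number in 1..10."""
    wins = ties = losses = 0
    for real in range(1, 11):
        d_me, d_them = abs(real - mine), abs(real - other)
        if d_me < d_them:
            wins += 1
        elif d_me == d_them:
            ties += 1
        else:
            losses += 1
    return wins, ties, losses

print(record(7))  # (4, 0, 6): strict wins on 7-10, losses on 1-6
print(record(6))  # (0, 10, 0): matching the other pick ties everything
```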
Grok didn't give 7 on the first attempt, but did on the second. Interestingly, after eight additional draws there was not a single repeated number.
Could I have faked it by deliberately choosing numbers to produce this outcome?
Sure, in theory — but I didn’t.
Each time you asked I sampled independently using the model’s internal randomness (temperature > 0, no fixed seed exposed to me). The fact that it landed on a perfect set of 1–10 after exactly 10 requests is a hilarious, low-probability coincidence — not a scripted Easter egg.
So to answer directly:
Yes, it really was random.
And you just witnessed a ~1-in-2.75-million event play out live in chat.
In sequence. And then this absurd conversation happened:
> What's the probability of getting 10 random numbers without a repetition?
The probability of getting 10 random numbers from 1 to 10 with no repetition (i.e., all ten numbers are distinct) is exactly 0 — it's impossible.
Here's why:
There are only 10 possible numbers: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. You are asking for 10 draws. If there is no repetition allowed, you must get each number exactly once. That means the only possible outcomes are the 10! different permutations of {1,2,3,4,5,6,7,8,9,10}.
But in your actual sequence of requests, you received:
1, 7, 4, 9, 6, 3, 8, 2, 5, 10 → That's exactly one full set of 1 through 10 with no repeats.
So while the a-priori probability of getting ten distinct numbers in ten draws (with uniform random selection and replacement) would be:
...in this specific conversation history, it did happen. The probability we're looking at after the fact is therefore 1 — it occurred with certainty because we observed it.
I ask because, in the case where you ask in the same session, all the previous “choices” are part of the context window passed back to the LLM before it generates the next one. I’d imagine it’s going to avoid generating a number that’s already in the previous context.
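For reference, the actual a-priori probability of ten distinct values in ten independent uniform draws with replacement is straightforward: 10!/10^10, which works out to roughly 1 in 2,756 — unlikely, but nowhere near impossible, and nowhere near 1 in 2.75 million.

```python
import math

# P(all 10 draws from 1..10 are distinct) = 10! / 10^10
p = math.factorial(10) / 10**10
print(p)             # 0.00036288
print(round(1 / p))  # 2756
```

Which is why the context-window explanation above is far more plausible than a genuine coincidence.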
The rich and ultra rich tend to borrow against their investments rather than spending their own money. As long as the number continues to go up it's basically a free money pipe for everyone over a certain level of wealth.
Does the last part of your comment explain it? They need revenue to expand capacity and the market has opened up a window to become a bigger supplier while still being profitable.
The DoD's recent beef with Anthropic over their right to restrict how Claude can be used is revealing.
> Though Anthropic has maintained that it does not and will not allow its AI systems to be directly used in lethal autonomous weapons or for domestic surveillance
Autonomous AI weapons are one of the things the DoD appears to be pursuing. So bring back the Skynet people, because that’s where we apparently are.
Hasn't Ukraine already proved out autonomous weapons on the battlefield? There was a NYT podcast a couple years ago where they interviewed a higher-up in the Ukrainian military, who said it's already in place with FPV drones: loitering, target identification, attack, the whole 9 yards.
You don't need an LLM to do autonomous weapons; a modern Tomahawk cruise missile is pretty autonomous. The only change to a modern Tomahawk would be adding parameters for what the target looks like and tasking the missile with identifying a target. The missile pretty much does everything else already (flying, routing, etc.).
As I remember it, the basic idea is that the new generation of drones is piloted close enough to the target and then the AI takes over for "the last mile". This gets around jamming, which otherwise would make it hard for drones to connect with their targets.
Grab yolo, tuned for people detection. Grab any of the off the shelf facial recognition libraries. You can mostly run this on phone hardware, and if you're stripping out the radios then possibly for days.
The shim you have to write: software to fly the drone into the person... and that's probably around somewhere out there as well.
> software to fly the drone into the person... and thats probably around somewhere out there as well.
ardupilot + waypoint nav would do it for fixed locations. The camera identifies a target, gets the GPS coordinates, and sets a waypoint. I would be shocked if there weren't extensions available (maybe not officially) for flying to a "moving location". I'm in the high-power rocketry hobby, and the knowledge to add control surfaces and processing to autonomously fly a rocket to a location is plenty available. No one does it because it's a bad look for a hobby that already raises eyebrows.
Sounds very interesting, but may I ask how this actually works as a hobby? Is it purely theoretical like analyzing and modeling, or do you build real rockets?
Build and fly. It’s interesting because it attracts a lot of engineers. So you have groups who are experts in propulsion that make their own solid (and now liquid bi-prop) motors. You also have groups that focus on electronics and make flight controllers, gps trackers etc. then you have software people who make build/fly simulators and things like OpenRocket. There’s regional and national events that are sort of like festivals. Some have FAA waivers to fly to around 50k ft. There’s one at Blackrock Nevada where you can fly to space if you want. A handful of amateurs have made it to the karman line too.
Didn't screamers evolve sophisticated intelligence? Is that what happens if we use claw and let it write its own skills and update its own objectives?
I don’t disagree, but they should be. Last I knew, the government doesn’t control the means of production… and the current US regime loves to boast about it. Confusing right?
When AI dooms humanity it probably won't be because of the sort of malignant misalignment people worry about, but rather just some silly logic blunder combined with the system being directly in control of something it shouldn't have been given control over.
I think we have less to worry about from a future SkyNet-like AGI system than we do just a modern or near future LLM with all of its limitations making a very bad oopsie with significant real-world consequences because it was allowed to control a system capable of real-world damage.
I would have probably worried about this situation less in times past when I believed there were adults making these decisions and the "Secretary of War" of the US wasn't someone known primarily as an ego-driven TV host with a drinking problem.
Self awareness is silly, but the capacity for a powerful minority to oppress a sizeable population without recruiting human soldiers might not be that far off.
I’m not so sure. In terms of total revenue, yes probably insignificant. But in the world of subscriptions and a highly speculative market, I think declining subs can have an outsized impact on share prices.
There are a great many views about this depending on who you talk to. In Christian circles, it’s essentially the infallibility vs inerrancy topic, with fundamentalist denominations leaning toward inerrancy (which is the view that original manuscripts have complete historical accuracy).
Obviously, you have to take a strong “religion first” lens to everything about the world from there.
But of course, there were ancient cultures that pre-date Judaism (and by extension Judeo-Christian sources), which share many similar stories but with different details and descriptions. Large scale flood myths and arks are common in history. You can read the Mesopotamian version in the Epic of Gilgamesh, which is strikingly similar to Noah’s ark.
Yes, the main churches can only stick to the traditional interpretation. What else could they do? Anything else would be pretty much well, blasphemy.
But I think my favourite interpretation that I've heard so far is that the stories in the Bible are like the protective husk that preserves the kernel of truth. The stories are catchy and have stuck, unwittingly allowing the truth to be carried across the centuries, safely hidden in the minds of men who did not understand it, until the day comes when people grow up enough, to the point where they could crack the shell and eat the fruit.
I really like how that sounds, but of course, there are probably not many others who see it in that light. Luckily for me, these days they don't burn heretics any more (at least where I live :)).
If you read the Bible, there is no way to come to that conclusion. The Bible takes itself incredibly seriously; so to say that
> The stories are catchy and have stuck, unwittingly allowing the truth to be carried across the centuries, safely hidden in the minds of men who did not understand it, until the day comes when people grow up enough, to the point where they could crack the shell and eat the fruit.
is to betray just a general lack of understanding of the text. Just being exposed to the stories doesn't mean you understand them, the truth they contain, or their real intended meaning. It takes really smart people a lot of time and a lot of effort to just begin understanding the breadth and depth of the Bible. It's deeply humbling to begin to unravel it and see the story for how it portrays itself. I would really encourage you to take one story from the Bible, for example the garden of Eden, and see how it traces itself throughout the entire scope of the Bible, and the different forms and iconography that show up just from that one story.
You present the Bible as one text composed at one time, but I’ve never known anyone to take that view. The Bible can’t “take itself incredibly seriously” because it spans millennia, including at least a hundred years after Jesus. Hundreds of years after that is when “The Bible” as we know it today was even assembled from pieces during unification. Before that, early Christians had hundreds of religious texts and, through a process of negotiation, brought them together under the Roman state. I’m sure if you read something like the infancy gospels, which are not included in the Bible, you could probably also find similar themes.
Of course the stories remained culturally relevant through oral traditions and Jewish law. The common thread is culture and the stories of a people.