You know, it sure adds some perspective to the original Anthropic marketing materia... ahem, I mean article, to learn that the CCC-compiled runtime for SQLite could potentially run up to 158,000 times slower than a GCC-compiled one...
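To put that figure in perspective, a quick back-of-envelope (the 1 ms baseline is my hypothetical; the 158,000x figure is just taken at face value):

    # Back-of-envelope: what a 158,000x slowdown means in wall-clock terms.
    gcc_query_ms = 1.0                      # hypothetical 1 ms query under a GCC build
    slowdown = 158_000                      # figure quoted above, taken at face value
    ccc_query_s = gcc_query_ms * slowdown / 1000
    print(f"~{ccc_query_s:.0f} s (~{ccc_query_s / 60:.1f} min) per query")  # ~158 s, ~2.6 min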
Nevertheless, the victories keep landing closer to home.
Indeed, and you could in essence achieve the same difference yourself with a different system prompt on 4o. What exactly is 4.5 contributing here in terms of more nuanced intelligence?
The new RLHF direction (heavily amplified by scaling synthetic training tokens) seems to clobber whatever minor gains the improved base internet-prediction model might've added.
Forgive me if I'm missing your existing realization (I did a quick check of your HN, Reddit, Twitter, and LW), but I think the big deal with Sohu (wrt Etched) is that they have pivoted from "all model parameters hard-etched onto the chip" to "only transformer ops (matmul etc.) etched onto the chip".
Sohu does not have the LLaMA 70b weights directly lithographed onto the silicon, as you seem to be implying by your attachment to that 6-month-old post.
Seems like a sensible pivot; I'd imagine they're rather up to date on the pulse of dynamically updated nets potentially being a major feature in upcoming frontier models, as you've recently been commenting on. However, I'm not deep enough in it to be sure how much this erodes their differentiation vs other AI accelerator startups.
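For anyone skimming, a toy sketch of the distinction as I understand it (the names, shapes, and values here are mine for illustration, not Etched's actual design):

    import numpy as np

    # Old pitch: model weights baked into the silicon, one model per chip.
    ETCHED_W = np.array([[0.12, -0.40],
                         [0.07,  0.90]])    # fixed at fabrication (hypothetical values)

    def old_sohu_forward(x):
        return x @ ETCHED_W                 # can only ever run this one model

    # New pitch: only the transformer ops (matmul, softmax, ...) are hardwired;
    # weights stream in from memory like on any other accelerator.
    def new_sohu_forward(x, weights):
        return x @ weights                  # same fixed op, any transformer's weights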
Luckily, Eliezer has written hundreds of approachable essays on the development of his epistemic processes over at lesswrong.com, so you too can learn rationality and derive the killeveryonism conclusion yourself.
You are conflating Ilya's belief that the transformer architecture (with tweaks/compute optimizations) is sufficient for AGI with a belief that LLMs are sufficient to express human-like intelligence. Multi-modality (and the swath of new training data it unlocks) is clearly a key component of creating AGI, judging by Sutskever's interviews from the past year.
Yes, I read "Attention Is All You Need", and I understand that the paper, and the multi-head generative pre-trained models built on it, talk about "tokens" rather than language specifically. So in this case, I'm using "LLM" as shorthand for what OpenAI is doing with GPTs. I'll try to be more precise in the future.
That still leaves disagreement between Altman and Sutskever over whether or not the current technology will lead to AGI or "superintelligence", with Altman clearly turning towards skepticism.
The issue, as pointed out above, is primarily bandwidth (at inference), not addressable memory. Put simply, the best bandwidth stack we currently have is on-package HBM -> NVLink -> Mellanox InfiniBand, and for inference speed you really can't leave NVLink bandwidth (read: an 8x DGX pod) for >100b parameters. And stacking HBM dies is much harder (read: more expensive) than GDDR dies, which is harder than DDR, etc.
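A crude back-of-envelope of why, assuming fp16 weights and that decoding is memory-bandwidth-bound; the per-tier bandwidths are my ballparks, not vendor specs:

    # Crude tokens/sec ceiling for memory-bound decoding of a 100b-param model:
    # each generated token has to stream all the weights through whichever tier
    # they sit behind. Single-tier model, ballpark bandwidths only.
    params = 100e9
    model_bytes = params * 2                      # fp16 weights, ~200 GB

    tiers_gbps = {"on-package HBM": 3000,         # ~3 TB/s per GPU (H100-class)
                  "NVLink 4": 900,                # per-GPU NVLink bandwidth
                  "InfiniBand NDR": 50}           # a 400 Gb/s link

    for name, gbps in tiers_gbps.items():
        print(f"{name:>15}: ~{gbps * 1e9 / model_bytes:.2f} tok/s ceiling")

Each step down the stack costs you roughly an order of magnitude in the ceiling, which is the whole game at inference time.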
Cost aside, HBM dies themselves aren't getting significantly denser anytime soon, and there simply isn't enough package space with current manufacturing methods to pack a significantly larger number of dies onto the GPU.
So I suspect the major hardware jumps will continue to come from NVLink/NVSwitch. NVLink 4 + NVSwitch 3 already allow for up to 256 GPUs https://resources.nvidia.com/en-us-grace-cpu/nvidia-grace-ho... ; increased numbers of links will let ever larger numbers of GPUs pool with sufficient bandwidth for inference on larger models.
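For scale, a rough sketch of what a 256-GPU NVLink domain buys you (GH200-class figures, illustrative only):

    # What a 256-GPU NVLink domain buys you, with illustrative figures.
    gpus = 256
    hbm_per_gpu_gb = 96                  # GH200-class HBM per GPU (illustrative)
    pooled_hbm_tb = gpus * hbm_per_gpu_gb / 1000
    print(f"pooled HBM: ~{pooled_hbm_tb:.1f} TB")   # ~24.6 TB, all at NVLink bandwidth
    # i.e. room for ~12 trillion fp16 parameters before spilling to InfiniBand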
As already mentioned, see this HN post about the GH200 https://news.ycombinator.com/item?id=36133226, which has some further discussion about the cutting edge of bandwidth for Nvidia DGX and Google TPU pods.