You know, it sure adds some perspective to the original Anthropic marketing materia... ahem, I mean article, to learn that the CCC-compiled runtime for SQLite could potentially run up to 158,000 times slower than a GCC-compiled one...
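To put that figure in perspective, a quick back-of-envelope (the 1 ms baseline is my hypothetical; the 158,000x figure is just taken at face value):

    # Back-of-envelope: what a 158,000x slowdown means in wall-clock terms.
    gcc_query_ms = 1.0                      # hypothetical 1 ms query under a GCC build
    slowdown = 158_000                      # figure quoted above, taken at face value
    ccc_query_s = gcc_query_ms * slowdown / 1000
    print(f"~{ccc_query_s:.0f} s (~{ccc_query_s / 60:.1f} min) per query")  # ~158 s, ~2.6 min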
Nevertheless, the victories keep landing closer to home.
Indeed, and you could in essence achieve the same difference yourself with a different system prompt on 4o. What exactly is 4.5 contributing here in terms of more nuanced intelligence?
The new RLHF direction (heavily amplified by scaling synthetic training tokens) seems to clobber whatever minor gains the improved base internet-prediction model might've added.
Forgive me if I'm missing your existing realization (I did a quick check of your HN, Reddit, Twitter, and LW), but I think the big deal with Sohu (wrt Etched) is that they have pivoted from "all model parameters hard-etched onto the chip" to "only transformer ops (matmul etc.) etched onto the chip".
Sohu does not have the LLaMA 70b weights directly lithographed onto the silicon, as you seem to be implying by your attachment to that 6-month-old post.
Seems like a sensible pivot; I'd imagine they're rather up to date on the pulse of dynamically updated nets potentially being a major feature in upcoming frontier models, as you've recently been commenting on. However, I'm not deep enough in it to be sure how much this erodes their differentiation vs other AI accelerator startups.
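For anyone skimming, a toy sketch of the distinction as I understand it (the names, shapes, and values here are mine for illustration, not Etched's actual design):

    import numpy as np

    # Old pitch: model weights baked into the silicon, one model per chip.
    ETCHED_W = np.array([[0.12, -0.40],
                         [0.07,  0.90]])    # fixed at fabrication (hypothetical values)

    def old_sohu_forward(x):
        return x @ ETCHED_W                 # can only ever run this one model

    # New pitch: only the transformer ops (matmul, softmax, ...) are hardwired;
    # weights stream in from memory like on any other accelerator.
    def new_sohu_forward(x, weights):
        return x @ weights                  # same fixed op, any transformer's weights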
Luckily, Eliezer has written hundreds of approachable essays on the development of his epistemic processes over at lesswrong.com, so you too can learn rationality and derive the killeveryonism conclusion yourself.
You are conflating Ilya's belief that the transformer architecture (with tweaks/compute optimizations) is sufficient for AGI with a belief that LLMs are sufficient to express human-like intelligence. Multi-modality (and the swath of new training data it unlocks) is clearly a key component of creating AGI, judging by Sutskever's interviews from the past year.
Yes, I read "Attention Is All You Need", and I understand that the paper, and the multi-head generative pre-trained models built on it, talk about "tokens" rather than language specifically. So in this case, I'm using "LLM" as shorthand for what OpenAI is doing with GPTs. I'll try to be more precise in the future.
That still leaves disagreement between Altman and Sutskever over whether or not the current technology will lead to AGI or "superintelligence", with Altman clearly turning towards skepticism.
The issue, as pointed out above, is primarily bandwidth (at inference), not addressable memory. Put simply, the best bandwidth stack we currently have is on-package HBM -> NVLink -> Mellanox InfiniBand, and for inference speed you really can't leave NVLink bandwidth (read: an 8x DGX pod) for >100b parameters. And stacking HBM dies is much harder (read: more expensive) than GDDR dies, which is harder than DDR, etc.
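A crude back-of-envelope of why, assuming fp16 weights and that decoding is memory-bandwidth-bound; the per-tier bandwidths are my ballparks, not vendor specs:

    # Crude tokens/sec ceiling for memory-bound decoding of a 100b-param model:
    # each generated token has to stream all the weights through whichever tier
    # they sit behind. Single-tier model, ballpark bandwidths only.
    params = 100e9
    model_bytes = params * 2                      # fp16 weights, ~200 GB

    tiers_gbps = {"on-package HBM": 3000,         # ~3 TB/s per GPU (H100-class)
                  "NVLink 4": 900,                # per-GPU NVLink bandwidth
                  "InfiniBand NDR": 50}           # a 400 Gb/s link

    for name, gbps in tiers_gbps.items():
        print(f"{name:>15}: ~{gbps * 1e9 / model_bytes:.2f} tok/s ceiling")

Each step down the stack costs you roughly an order of magnitude in the ceiling, which is the whole game at inference time.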
Cost aside, HBM dies themselves aren't getting significantly denser anytime soon, and there simply isn't enough package space with current manufacturing methods to pack a significantly larger number of dies onto the GPU.
So I suspect the major hardware jumps will continue to come from NVLink/NVSwitch. NVLink 4 + NVSwitch 3 already allow for up to 256 GPUs https://resources.nvidia.com/en-us-grace-cpu/nvidia-grace-ho... ; increased numbers of links will let ever larger numbers of GPUs pool with sufficient bandwidth for inference on larger models.
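For scale, a rough sketch of what a 256-GPU NVLink domain buys you (GH200-class figures, illustrative only):

    # What a 256-GPU NVLink domain buys you, with illustrative figures.
    gpus = 256
    hbm_per_gpu_gb = 96                  # GH200-class HBM per GPU (illustrative)
    pooled_hbm_tb = gpus * hbm_per_gpu_gb / 1000
    print(f"pooled HBM: ~{pooled_hbm_tb:.1f} TB")   # ~24.6 TB, all at NVLink bandwidth
    # i.e. room for ~12 trillion fp16 parameters before spilling to InfiniBand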
As already mentioned, see this HN post about the GH200 https://news.ycombinator.com/item?id=36133226, which has some further discussion about the cutting edge of bandwidth for Nvidia DGX and Google TPU pods.