> L3 cache is not built for mass throughput in the same way that DRAM is, and so it has roughly identical mass throughput despite its much closer distance to the computation.
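One way to sanity-check that claim on a given CPU is to stream over a working set that fits in L3 versus one that spills to DRAM and compare effective throughput. A minimal NumPy sketch (not from the linked sources; the sizes are assumptions, adjust to your cache capacity):

    # Rough sketch: effective read bandwidth for an ~L3-resident array vs.
    # a DRAM-resident array, by timing repeated streaming sums.
    import time
    import numpy as np

    def effective_bandwidth_gbs(n_bytes, repeats=50):
        a = np.ones(n_bytes // 8, dtype=np.float64)
        a.sum()                      # warm up / fault in pages
        t0 = time.perf_counter()
        for _ in range(repeats):
            a.sum()                  # streams the whole array once per pass
        dt = time.perf_counter() - t0
        return n_bytes * repeats / dt / 1e9

    for label, size in [("~L3-resident (8 MiB)", 8 << 20),
                        ("DRAM-resident (1 GiB)", 1 << 30)]:
        print(f"{label}: {effective_bandwidth_gbs(size):.1f} GB/s")

If the claim holds, the two numbers land in the same ballpark; on most desktop parts the cache-resident case is noticeably faster.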
"The von Neumann bottleneck is impeding AI computing?" (2025) https://news.ycombinator.com/item?id=45398473 :
> How does Cerebras WSE-3 with 44GB of 'L2' on-chip SRAM compare to Google's TPUs, Tesla's TPUs, NorthPole, Groq LPU, Tenstorrent's, and AMD's NPU designs?
From https://news.ycombinator.com/item?id=42875728 :
> WSE-3: 21 PB/s
From https://hackernoon.com/nvidias-mega-machine-crushes-all-of-2... :
> At Computex 2025, Nvidia’s Jensen Huang dropped a bombshell: the NVLink Spine, a compute beast pumping 130 terabytes per second, eclipsing the internet’s 2024 peak of 112.5 TB/s.
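For scale, putting the two quoted aggregate figures side by side (pure unit conversion of the numbers above, nothing more):

    # WSE-3 on-wafer SRAM bandwidth (21 PB/s) vs. NVLink Spine aggregate
    # (130 TB/s) and the cited 2024 internet peak (112.5 TB/s).
    wse3_sram = 21_000.0          # TB/s  (21 PB/s)
    nvlink_spine = 130.0          # TB/s
    internet_2024_peak = 112.5    # TB/s

    print(f"WSE-3 SRAM / NVLink Spine: {wse3_sram / nvlink_spine:.0f}x")           # ~162x
    print(f"NVLink Spine / 2024 internet peak: {nvlink_spine / internet_2024_peak:.2f}x")  # ~1.16x

So the on-wafer SRAM figure is roughly two orders of magnitude above the NVLink Spine aggregate, though one is on-chip memory bandwidth and the other is an interconnect fabric, so they are not directly interchangeable.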
"A Comparison of the Cerebras Wafer-Scale Integration Technology with Nvidia GPU-based Systems for Artificial Intelligence" (2025-03) https://arxiv.org/abs/2503.11698v1