> L3 cache is not built for mass throughput in the same way that DRAM is, and so it has roughly identical mass throughput despite its much closer distance to the computation.
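One way to sanity-check that claim on a given CPU is to stream over a working set that fits in L3 versus one that spills to DRAM and compare effective throughput. A minimal NumPy sketch (not from the linked sources; the sizes are assumptions, adjust to your cache capacity):

    # Rough sketch: effective read bandwidth for an ~L3-resident array vs.
    # a DRAM-resident array, by timing repeated streaming sums.
    import time
    import numpy as np

    def effective_bandwidth_gbs(n_bytes, repeats=50):
        a = np.ones(n_bytes // 8, dtype=np.float64)
        a.sum()                      # warm up / fault in pages
        t0 = time.perf_counter()
        for _ in range(repeats):
            a.sum()                  # streams the whole array once per pass
        dt = time.perf_counter() - t0
        return n_bytes * repeats / dt / 1e9

    for label, size in [("~L3-resident (8 MiB)", 8 << 20),
                        ("DRAM-resident (1 GiB)", 1 << 30)]:
        print(f"{label}: {effective_bandwidth_gbs(size):.1f} GB/s")

If the claim holds, the two numbers land in the same ballpark; on most desktop parts the cache-resident case is noticeably faster.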
"The von Neumann bottleneck is impeding AI computing?" (2025) https://news.ycombinator.com/item?id=45398473 :
> How does Cerebras WSE-3 with 44GB of 'L2' on-chip SRAM compare to Google's TPUs, Tesla's TPUs, NorthPole, Groq LPU, Tenstorrent's, and AMD's NPU designs?
From https://news.ycombinator.com/item?id=42875728 :
> WSE-3: 21 PB/s
From https://hackernoon.com/nvidias-mega-machine-crushes-all-of-2... :
> At Computex 2025, Nvidia’s Jensen Huang dropped a bombshell: the NVLink Spine, a compute beast pumping 130 terabytes per second, eclipsing the internet’s 2024 peak of 112.5 TB/s.
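For scale, putting the two quoted aggregate figures side by side (pure unit conversion of the numbers above, nothing more):

    # WSE-3 on-wafer SRAM bandwidth (21 PB/s) vs. NVLink Spine aggregate
    # (130 TB/s) and the cited 2024 internet peak (112.5 TB/s).
    wse3_sram = 21_000.0          # TB/s  (21 PB/s)
    nvlink_spine = 130.0          # TB/s
    internet_2024_peak = 112.5    # TB/s

    print(f"WSE-3 SRAM / NVLink Spine: {wse3_sram / nvlink_spine:.0f}x")           # ~162x
    print(f"NVLink Spine / 2024 internet peak: {nvlink_spine / internet_2024_peak:.2f}x")  # ~1.16x

So the on-wafer SRAM figure is roughly two orders of magnitude above the NVLink Spine aggregate, though one is on-chip memory bandwidth and the other is an interconnect fabric, so they are not directly interchangeable.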
"A Comparison of the Cerebras Wafer-Scale Integration Technology with Nvidia GPU-based Systems for Artificial Intelligence" (2025-03) https://arxiv.org/abs/2503.11698v1