
Cerebras' Wafer Scale Engine is the opposite of small and cheap.

https://cerebras.ai/product-chip/



That's totally nuts. How do they deal with the silicon warping around disabled cores or dark silicon? How much hard running does it take before the chip is fatally damaged and needs to be replaced in their system? Word on the street is that H100s fail surprisingly often; this can't be better.


https://www.youtube.com/watch?v=7GV_OdqzmIU&t=1104s

This video from Cerebras explains nicely how they solve the interconnect problem, and why their approach greatly reduces the risk of Blackwell-type hardware design challenges.


Super informative video!


The way it's mounted, there's unlikely to be warping: https://web.archive.org/web/20230812020202/https://www.youtu...

The cooling is significantly better than what you'd see on a typical server platform, with water-cooling channels going to each row of the wafer.


They have a way of bypassing bad cores, and they over-provision both logic and memory by 1.5% to account for them. They get 100% yield this way.

https://www.anandtech.com/show/16626/cerebras-unveils-wafer-...
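For intuition on why ~1.5% of spares is enough for effectively perfect yield, here's a back-of-the-envelope binomial model in Python. The core count and per-core defect rate are made-up illustrative numbers, not Cerebras' actual figures:

    # Toy yield model: the wafer "yields" if the number of defective cores
    # does not exceed the spares that can be routed around. Computed in log
    # space because the raw binomial terms over/underflow ordinary floats.
    from math import lgamma, log, exp

    def wafer_yield(n_cores, spare_fraction, p_defect):
        """P(defective cores <= spares) under an independent-defect model."""
        spares = int(n_cores * spare_fraction)
        def log_pmf(k):
            return (lgamma(n_cores + 1) - lgamma(k + 1) - lgamma(n_cores - k + 1)
                    + k * log(p_defect) + (n_cores - k) * log(1 - p_defect))
        return sum(exp(log_pmf(k)) for k in range(spares + 1))

    # Hypothetical: 900k cores, 1.5% spares, 1% of cores defective on average.
    print(wafer_yield(900_000, 0.015, 0.01))   # ~1.0 -- spares absorb defects
    # With zero spares, a defect-free wafer is essentially impossible:
    print(0.99 ** 900_000)                     # underflows to 0.0

The intuition: the expected defect count (~9,000 here) sits dozens of standard deviations below the ~13,500 spares, so the survival probability rounds to 1.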


Yes, that's exactly what would cause the problems I'm suggesting.


edit: I should have just checked their website instead of guessing. Apparently WSE has significant fabrication challenges, which makes what Cerebras has accomplished all the more impressive. But it is still surprising that no one else has attempted this in the HPC field.

I had guessed that Cerebras had made some process trade-offs in order to make it work at scale, but then again, they aren't actually building these devices at scale (yet).


They're using TSMC 5-nm for WSE-3: https://spectrum.ieee.org/cerebras-chip-cs3


Cerebras is known to use TSMC, so your speculation about a boutique fab is incorrect.

https://cerebras.ai/press-release/cerebras-systems-smashes-t...


It shouldn't be surprising. It's hard as fuck, and we don't know if it's worth it yet. (There's something to be said for "if your compute dies, you send your remote hands to swap out a high-four-figure component" rather than "decommission a high-six-figure node".)
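To make the blast-radius argument concrete, a quick Python sketch with entirely made-up prices and failure rates:

    # Illustrative failure-economics math; every number here is a guess.
    afr = 0.05                    # assumed annualized failure rate per unit

    gpu_price = 9_000             # "high four figures" per accelerator
    gpus_per_cluster = 64         # hypothetical cluster matching one wafer node
    wafer_node_price = 900_000    # "high six figures" per wafer-scale node

    # Expected yearly replacement spend, assuming independent failures:
    gpu_spend = gpus_per_cluster * afr * gpu_price   # swap one card at a time
    wafer_spend = afr * wafer_node_price             # lose the whole node

    print(f"GPU cluster:      ${gpu_spend:,.0f}/yr")    # $28,800/yr
    print(f"Wafer-scale node: ${wafer_spend:,.0f}/yr")  # $45,000/yr

The real comparison is muddier (on-wafer redundancy should lower the wafer's effective failure rate, and compute-per-dollar differs), but it shows why the granularity of the replaceable unit matters.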


I hope I didn't make it sound like it was easy; at least I don't think I said that anywhere. It doesn't really matter how hard something is to do (short of it being trivially proven impossible); what matters is whether there's a good enough chance that the payoff exceeds the cost.

And actually there have been attempts to do it. I mentioned in an earlier version of my comment that Gene Amdahl attempted to make wafer-scale integration work back in the 1980s with Trilogy Systems, without success - but also without the clear profitability story of AI to attract the same mountains of cash being thrown around today.

What's surprising is not that it is hard, or that it's hard as fuck, but that given the potentially stratospheric rewards for success there have not been more attempts in this direction.


What makes you think you couldn't do as well on something less radical?

IIRC Cerebras' design was originally for HPC workloads, so even it may not be optimized for LLMs.


Is this all mostly a heat spreader efficiency requirement?


I suppose the proper term would be the heat spreader's thermal resistance limit, dictated by the sum of the thermal stresses, or whatever the right term is.
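For a rough sense of scale, the usual lumped model is T_junction = T_coolant + P * R_th. Plugging in guessed numbers (assumptions, not Cerebras specs) shows how tight the thermal budget gets at wafer power levels:

    # Toy junction-temperature budget; all values are assumptions for scale.
    power_w = 15_000        # hypothetical whole-wafer power draw
    t_coolant_c = 25.0      # assumed inlet water temperature
    t_max_c = 105.0         # typical silicon junction limit

    # Largest junction-to-coolant thermal resistance the wafer can tolerate:
    r_th_max = (t_max_c - t_coolant_c) / power_w    # K/W
    print(f"R_th must stay under {r_th_max * 1000:.2f} mK/W")   # ~5.33 mK/W

For comparison, a decent consumer CPU cooler is roughly on the order of 100 mK/W, which is why per-row water channels across the wafer matter.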


I updated my comment to be clearer. I meant smaller and/or cheaper per token. In this case it's cheaper per token.


Maybe the real trend is that huge parameter counts are curiosities.



