Hacker News | ElijahLynn's comments

Wow, I wish we could post pictures to HN. That chip is HUGE!!!!

The WSE-3 is the largest AI chip ever built, measuring 46,255 mm² and containing 4 trillion transistors. It delivers 125 petaflops of AI compute through 900,000 AI-optimized cores — 19× more transistors and 28× more compute than the NVIDIA B200.

From https://www.cerebras.ai/chip:

https://cdn.sanity.io/images/e4qjo92p/production/78c94c67be9...

https://cdn.sanity.io/images/e4qjo92p/production/f552d23b565...


  > 46,255 mm²
To be clear: that's the thousandths separator, not the Nordic decimal. It's the size of a cat, not the size of a thumbnail.

*thousands, not thousandths, right?

The correct number is forty-six thousand, two hundred and fifty-five square mm.


Whoa, I just realized that the comma is the thousands separator when writing out the number in English words.

or about 0.23 furlongs

More like one thousandth of a percent of an acre.

(wow, that is not much?)


This is why space is the only acceptable thousands/grouping separator (a non-breaking space when possible). Avoids any confusion.

Space is also confusing! Then it looks like two separate numbers.

Underscore (_) is already used as a digit group separator in programming languages, and mathematics should just adopt it, IMO.
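
For what it's worth, a quick sketch of what that underscore grouping already looks like in Python (the literal syntax and format specifiers below are real; the numbers are just the ones from this thread):

    # Python 3.6+ accepts underscores as digit group separators in literals
    area_mm2 = 46_255                  # same value as 46255
    transistors = 4_000_000_000_000    # 4 trillion, per the spec above

    # The format mini-language can also group output with "," or "_"
    print(f"{area_mm2:,}")   # -> 46,255
    print(f"{area_mm2:_}")   # -> 46_255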


A competing format to simplify things?

A competing format that is understandable to probably everybody.

An ISO 8601 date is also comprehensible to anybody even if they've never seen it before and have to figure it out themselves.



I said date. Not time, not timestamp, not period, not week, not range, not ordinal date.

Just date.
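
For anyone who hasn't seen one, a plain ISO 8601 calendar date is just YYYY-MM-DD; a quick Python sketch:

    from datetime import date

    d = date(2024, 3, 14)
    print(d.isoformat())                      # 2024-03-14 (ISO 8601 calendar date)
    print(date.fromisoformat("2024-03-14"))   # parses back to 2024-03-14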



You mean thousands separator, yes? Agreed, it’s annoying when languages don’t have this feature.

A thin non-breaking space also clears up the confusion without the visual clutter of the underscore. It's just inconvenient to use.

46_255

The problem is our primitive text representation online. The formatting should be localized, but there's no number type I can easily insert inline in a text box.
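
A minimal sketch of what localized number rendering could look like, assuming the third-party Babel library (the choice of library is mine, not the commenter's): the same stored value is grouped differently per reader's locale.

    # pip install Babel -- third-party locale-aware formatting (an assumption here)
    from babel.numbers import format_decimal

    n = 46255
    print(format_decimal(n, locale="en_US"))  # 46,255
    print(format_decimal(n, locale="de_DE"))  # 46.255
    print(format_decimal(n, locale="fr_FR"))  # 46 255 (space-grouped)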

Thanks, I was actually wondering how someone would even manage to make that big a chip.

It's a whole wafer. Basically all chips are made on wafers that big, but normally a wafer holds a lot of different chips: you cut it up into small chips and throw the bad ones away.

Cerebras instead has ways of mapping the defects and routing around them so they don't affect things.


Wow, I'm staggered, thanks for sharing

I was under the impression that top-of-the-line chips often fail to come out perfectly to spec, and those with, say, a core that's a bit under spec or missing entirely get down-clocked or otherwise binned and sold as the next chip down the line.

Is that not a thing anymore? Or would a chip like this maybe be so specialized that you'd use, say, a generation-earlier transistor width and thus have more certainty of a successful fab?

Or does a chip this size just naturally land around 900,000 cores, and that's not always the exact count?

20 kW! Wow! 900,000 cores. 125 petaflops of compute. Very neat


Designing to tolerate the defects is well-trodden territory. You just expect some rate of defects and have a way of disabling failing blocks.

So you shoot for 10% more cores and disable failing cores?

More or less, yes. Of course, defects are not evenly distributed, so you get a lot of chips with different grades of brokenness. Normally the more broken chips get sold off as lower-tier products. A six-core CPU is probably an eight-core with two broken cores.

Though in this case, it seems [1] that Cerebras just has so many small cores that they can expect a fairly consistent level of broken cores and route around them.

[1]: https://www.cerebras.ai/blog/100x-defect-tolerance-how-cereb...


Well, it's more like they have 900,000 cores on a WSE and disable whichever ones don't work.

Seriously, that's literally just what they do.


In their blog post linked in the sibling comment it says the raw number is 970k and they enable 900k (table at the end).

IIRC, a lot of design went into making it so that you can disable parts of this chip selectively.
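
A rough back-of-the-envelope sketch of why tiny cores plus spares work out. The 970k/900k split is from the linked blog post; the defect density is a made-up illustrative number:

    # Illustrative only: core-level redundancy on a wafer-scale part
    wafer_area_mm2 = 46_255      # WSE-3 area from the spec above
    raw_cores      = 970_000     # physical cores (blog post table)
    enabled_cores  = 900_000     # cores actually enabled
    defect_density = 0.001       # defects per mm^2 -- hypothetical guess

    expected_defects = wafer_area_mm2 * defect_density    # ~46 defects per wafer
    spare_budget     = raw_cores - enabled_cores          # 70,000 spare cores

    # If each defect kills at most one tiny core, you expect to lose ~46 cores
    # out of 970,000 -- far inside the 70,000-core spare budget.
    print(expected_defects, spare_budget)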

Why is the CEO some shady guy, though? https://daloopa.com/blog/analyst-pov/cerebras-ipo-red-flags-...

"AI" always has some sleazy person behind it for some reason


You need the sleazy person because you need a shit-ton of money.

I have "shady CEO that doesn't give back to his country or community" fatigue

There have been discussions about this chip here in the past. Maybe not that particular one, but previous versions of it. The whole server, if I remember correctly, eats some 20 kW of power.

A first-gen Oxide Computer rack puts out max 15 kW of power, and they manage to do that with air cooling. The liquid-cooled AI racks being used today for training and inference workloads almost certainly have far higher power output than that.

(Bringing liquid cooling to the racks likely has to be one of the biggest challenges with this whole new HPC/AI datacenter infrastructure, so the fact that an air-cooled rack can just sit in mostly any ordinary facility is a non-trivial advantage.)


> The liquid-cooled AI racks being used today for training and inference workloads almost certainly have far higher power output than that.

75kW is a sane "default baseline" and you can find plenty of deployments at 130kW.

There's talk of pushing to 240kW and beyond...


> Bringing liquid cooling to the racks likely has to be one of the biggest challenges with this whole new HPC/AI

Are you sure about that? HPC has had full rack liquid cooling for a long time now.

The primary challenge with the current generation is the unusual increase in power density per rack. This necessitates capacity upgrades; notably, getting 10-20 kW of heat away from a few Us is generally tough, but if done it can increase density.


HPC is also not a normal data center, but it usually doesn't have the scale of hyperscaler AI data centers either.

Well for some. Google has been using liquid cooling to racks for decades.

That’s wild. That’s like running 15 indoor heaters at the same time.

20KW? Wow. That's a lot of power. Is that figure per hour?

What do you mean by "per hour"?

Watt is a measure of power, that is, a rate: joules per second [energy/time].

> The watt (symbol: W) is the unit of power or radiant flux in the International System of Units (SI), equal to 1 joule per second or 1 kg⋅m²⋅s⁻³. It is used to quantify the rate of energy transfer.

https://en.wikipedia.org/wiki/Watt


If you run it for an hour, yes.

Ah yes, like those EV chargers that are rated at X kWh/hour.

You would hope that an EV reporting X kWh/hour considers the charge curve when charging for an hour. Then it makes sense to report that instead of the peak kW rating. But the reality is that they just report the peak kW rating as the "kWh/hour" :-(

I asked because that's the average power consumption of an average household in the US per day. So, if that figure is per hour, that's equivalent to one household worth of power consumption per hour...which is a lot.

Others clarified the kW versus kWh, but to re-visit the comparison to a household:

One household uses about 30 kWh per day.

20 kW * 24 = 480 kWh per day for the server.

So you're looking at one server (if the parent's 20 kW number is accurate - I see other sources saying even 25 kW) consuming 16 households' worth of energy.

For comparison, a hair dryer draws around 1.5 kW of power, which is just below the rating of most US home electrical circuits. This is something like 13 hair dryers going at full blast.
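
The same arithmetic as a tiny sketch, in case the kW (rate) vs kWh (amount) distinction is easier to see in code; the 30 kWh/day household figure is the rough average used above:

    server_power_kw   = 20      # power draw: a rate, like litres per second
    hours_per_day     = 24
    server_kwh_day    = server_power_kw * hours_per_day   # 480 kWh/day: an amount

    household_kwh_day = 30      # rough US household average, per the comment above
    print(server_kwh_day / household_kwh_day)   # -> 16.0 households

    hair_dryer_kw = 1.5
    print(server_power_kw / hair_dryer_kw)      # -> ~13.3 hair dryers at full blast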


At least with GPT-5.3-Codex-Spark, I gather most of the AI inference isn't rendering cat videos but mostly useful work... so I don't feel too bad about 16 households' worth of energy.

To be fair, this is 16 households' worth of electrical energy. The average household uses about as much energy in the form of natural gas (or butane or fuel oil, depending on what they use) as it uses electrical energy. And then roughly as much again as gasoline. So really more like 5 households' worth of energy. And that's just direct energy use, not accounting for all the products, including food, consumed in the average household.

Which honestly doesn't sound that bad given how many users one server is able to serve.

Consumption of a house per day is measured in kilowatt-hours (an amount of energy, like litres of water), not kilowatts (a rate of energy flow, like 1 litre per second of water).

1 Watt = 1 Joule per second.


Thanks!

I think you are confusing KW (kilowatt) with KWH (kilowatt hour).

A KW is a unit of power while a KWH is a unit of energy. Power is a measure of energy transferred in an amount of time, which is why you rate an electronic device’s energy usage using power; it consumes energy over time.

In terms of paying for electricity, you care about the total energy consumed, which is why your electric bill is denominated in KWH, which is the amount of energy used if you use one kilowatt of power for one hour.


You're right, I absolutely was mixing them both. Thanks for clarifying!

Acktshually "kW" and "kWh" to be precise

It’s 20kW for as long as you can afford the power bill

20 kWh per hour

Maybe I'm silly, but why is this relevant to GPT-5.3-Codex-Spark?

It’s the chip they’re apparently running the model on.

> Codex-Spark runs on Cerebras’ Wafer Scale Engine 3—a purpose-built AI accelerator for high-speed inference giving Codex a latency-first serving tier. We partnered with Cerebras to add this low-latency path to the same production serving stack as the rest of our fleet, so it works seamlessly across Codex and sets us up to support future models.

https://www.cerebras.ai/chip


That's what it's running on. It's optimized for very high throughput using Cerebras' hardware which is uniquely capable of running LLMs at very, very high speeds.

For Cerebras, can we even call them chips? You're no longer breaking up the wafer; we should call them slabs.

Macrochips

They're still slices of a silicon ingot.

Just like potato chips are slices from a potato.


Is this actually more beneficial than, say, having a bunch of smaller ones communicating on a bus? Apart from space constraints, that is.

It's a single wafer, not a single compute core. A familiar equivalent might be putting 192 cores in a single Epyc CPU (or, to be more technically accurate, the group of cores in a single CCD) rather than trying to interconnect 192 separate single-core CPUs externally with each other.

Yes, bandwidth within a chip is much higher than on a bus.

Is all of it one chip? Seems like a wafer with several, at least?

Those are scribe lines, where you would usually cut out the individual chips, which is why it resembles multiple chips. However, they work with TSMC to etch across them.

Bigger != Better

>Wow, I wish we could post pictures to HN. That chip is HUGE!!!!

Using a wafer-sized chip doesn't sound great from a cost perspective when compared to using many smaller chips for inference. Yield will be much lower and prices higher.

Nevertheless, the actual price might not be very high if Cerebras doesn't apply an Nvidia-level tax.
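
To put a number on "yield will be much lower", here's the textbook Poisson yield model with a purely illustrative defect density (not Cerebras's real figures). It shows why a defect-free wafer-scale die is essentially impossible and why the design has to tolerate defects instead:

    # Textbook Poisson yield model: Y = exp(-D * A); D is hypothetical here
    from math import exp

    D = 0.001                  # defects per mm^2 (illustrative)
    gpu_die_mm2   = 800        # roughly a reticle-sized GPU die
    wafer_die_mm2 = 46_255     # the WSE-3 area quoted above

    print(exp(-D * gpu_die_mm2))     # ~0.45: ~45% of reticle-sized dies come out defect-free
    print(exp(-D * wafer_die_mm2))   # ~1e-20: a defect-free wafer-scale die basically never happens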


> Yield will be much lower and prices higher.

That's an intentional trade-off in the name of latency. We're going to see a further bifurcation in inference use-cases in the next 12 months. I'm expecting this distinction to become prominent:

(A) Massively parallel (optimize for token/$)

(B) Serial low latency (optimize for token/s).

Users will switch between A and B depending on need.

Examples of (A):

- "Search this 1M line codebase for DRY violations subject to $spec."

An example of (B):

- "Diagnose this one specific bug."

- "Apply this diff".

(B) is used in funnels to unblock (A). (A) is optimized for cost and bandwidth, (B) is optimized for latency.


As I understand it, the chip consists of a huge number of processing units with a mesh network between them, so to speak, and they can tolerate disabling a number of units by routing around them.

Speed will suffer, but it's not like a stuck pixel on an 8k display rendering the whole panel useless (to consumers).


Cerebras addresses this in a blog post: https://www.cerebras.ai/blog/100x-defect-tolerance-how-cereb...

Basically they use very small cores compared to competitors, so faults only affect small areas.


Wooshka.

I hope they've got good heat sinks... and I hope they've plugged into renewable energy feeds...


Fresh water and gas turbines, I'm afraid...

Nope! It's gas turbines

For now. And also largely because it's easier to get that up and running than the alternative.

Eventually, as we ramp up domestic solar production (and maybe if we get rid of solar tariffs, even for a short period?), the numbers will make them switch to renewable energy.


I can imagine how terribly bad their yield must be. One little mistake and the whole "chip" is a goner.

They have a blog post called "100x Defect Tolerance: How Cerebras Solved the Yield Problem":

https://www.cerebras.ai/blog/100x-defect-tolerance-how-cereb...


One thing is really bothering me: they show these tiny cores, but in the wafer comparison image, they chose to show their full chip as a square bounded by the circle of the wafer. Even the prior GPU arch tiled into the arcs of the circle. What gives?

How should advertising work in an AI product? Asad Awan, one of the ad leads at OpenAI, walks through how the company is approaching this decision and why it’s testing ads in ChatGPT at all. He explains how ads are built to stay separate from the model response, keep conversations with ChatGPT private from advertisers, and give people control over their experience.

I read a quote from somebody in the industry recently that stuck. I don't remember who it was.

"Writing software is easy, changing it is hard."


Absolutely true. Especially so with poorly abstracted software design.

This is why so many new teams' first order of business is invariably a suggestion to "rewrite everything".

They're not going to do a better job or get a better product, it's just the only way they're going to get a software stack that does what they want.


I've stopped using Twitter too. I don't even browse it.


Yup, Time.

I'm confident I can do anything with enough time. But I only have so much.

AI is going to enable so many more ideas to come to fruition and a better world because of it!


Wow, I saw them mentioned in the article, as I was skimming it. But didn't realize that's what was going on!

That's a pretty great model!


RIP SETI@home!!!

I remember setting up SETI on my first computer on dial-up when I was 17 years old! It was such an exciting thing to participate in!


Login wall



The Cybertruck also does the tightest turns because it has front- and rear-wheel steering. I could imagine that being useful on job sites.


The kinds of people buying cybertrucks aren't going to be caught dead on a job site.


That's not true. Boss likes being flashy. You won't see them being used for actual work, but that's a different proposition.


2002 GMC Sierras did this; it was called Quadrasteer.


There is a certain subset of Tesla owners who have this belief that features in certain Tesla vehicles are completely novel to Teslas and that other auto manufacturers haven't even considered them. They can often be identified by how they refer to other manufacturers as "dinosaurs".

Adjustable ride height? Miraculous. Meanwhile my car is mapping the road surface, actively leaning into corners and following road camber, actively avoiding potholes, and adjusting the suspension, including ride height, constantly.

Traffic sign recognition, including recognizing school zones and whether they're currently active.

Adaptive blind spot - so nice. Speed differential low, or you're going faster? Will not activate, or only activate last moment. But if someone is blowing by you in the HOV lane, it will warn of them when they're still several hundred feet back.

Laser headlights. Matrix headlights. Night vision with thermal imaging.

Predictive active suspension - The car actively scans the road ahead with sensors and it will adjust suspension for poorer road conditions.

The car can not only stop but will actively swerve, if safe, around obstructions to avoid a collision, even a parked car opening a door into traffic.


As did some models of Honda Prelude starting in '87.


whoa I was not aware of this, super cool


A full-size Ford Transit - which is much larger than a Cybertruck, and much more useful - turns in about an 11-metre kerb-to-kerb circle.

That's fully a metre and a half tighter than the Cybertruck.


In my opinion it isn't useful at all, because if the only thing you can get into a spot is a vehicle with 4-wheel steering, you have already fucked up your site planning. You aren't going to be delivering materials with that thing: bulk materials are too heavy and light materials are too large. Maybe tools, but it isn't large enough to be a tool truck and it's too expensive for small handyman-type work.


There are many situations that are not proper job sites. All sorts of rural situations that require turning.


No it doesn't. A regular Suburban without 4 wheel steering still has a tighter turning radius. A fucking Suburban!


Not really; sites are pretty much always spaced out. Ironically, it's best for city and daily driving - it's a pure luxury feature.


It would be amazing in the city if it weren't two lanes wide.


It's the same width as an F150


Tbf F150s also suck in the city lol


It's not amazing in the city. The turning radius on the Cybertruck is atrocious. Go look it up and quit believing the marketing bullshit.



Thanks for posting that. I watched a couple of minutes of it, and it suggests that the Cybertruck has a really good turning radius, and it was able to drive on a go-kart track.


It's great, maybe stop looking it up and go drive one?


A Suburban without 4 wheel steering has a tighter turning radius. It's pathetic that something with 4 wheel steering can't outdo a Suburban.

And why would I want to drive one? I have literally no reason to drive that waste of batteries.


You should want to drive one so you don't have such a bad world view. Where are you getting your data for the cybertruck turning radius?

