Hacker News

They seem to have another FAQ here that gives a real answer (273GB/s): https://www.asus.com/us/support/faq/1056142/


Now we can see why they avoided giving a straight answer.

File this one in the blue folder like the DGX


Noob here. Why is that number bad?


LLM performance depends on doing a lot of math over a lot of different numbers. During generation, every parameter has to be streamed from memory for each token, so memory bandwidth caps throughput. For example, if your model has 8 billion parameters and each parameter is one byte, then at 256 GB/s you can't do better than 32 tokens per second. And if you try to load a model that's 80 GB, you only get 3.2 tokens per second, which is pretty bad for something that costs $3-4k.
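The arithmetic above is just bandwidth divided by the bytes the model has to stream per token. A quick sketch (numbers from this thread; the function name is mine, and it ignores compute, KV cache, and other overheads):

```python
def max_tokens_per_sec(bandwidth_gb_per_s: float, model_size_gb: float) -> float:
    """Upper bound on decode throughput: every parameter byte must be
    read from memory once per generated token, so bandwidth divided by
    model size caps tokens per second."""
    return bandwidth_gb_per_s / model_size_gb

# 8B params at 1 byte each, over 256 GB/s:
print(max_tokens_per_sec(256, 8))    # 32.0 tokens/s
# An 80 GB model over the same memory bus:
print(max_tokens_per_sec(256, 80))   # 3.2 tokens/s
```

This is an upper bound, not a prediction; real throughput is lower once attention, KV-cache reads, and batching effects are counted.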

There are newer models called "Mixture of Experts" (MoE) that have, say, 120B parameters total but only use about 5B parameters per token (the specific parameters are chosen by a much smaller routing network). That is the kind of model that excels on this machine. Unfortunately, those models also work really well with hybrid inference, where the GPU handles the small-but-computationally-complex fully connected layers while the CPU handles the large-but-computationally-easy expert layers.
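To see why MoE helps under a bandwidth cap: only the active parameters are streamed per token, so the bound from the previous comment applies to the ~5B active parameters rather than the full 120B. A rough sketch, assuming 1 byte per parameter as above:

```python
def moe_tokens_per_sec(bandwidth_gb_per_s: float,
                       active_params_billions: float,
                       bytes_per_param: float = 1.0) -> float:
    """Bandwidth-bound decode rate for an MoE model: only the parameters
    activated per token (router + selected experts) must be read, not
    the full parameter count."""
    active_gb = active_params_billions * bytes_per_param
    return bandwidth_gb_per_s / active_gb

# 120B-total / 5B-active model at 256 GB/s:
print(moe_tokens_per_sec(256, 5))     # ~51 tokens/s
# Compare a dense 120B model on the same hardware:
print(moe_tokens_per_sec(256, 120))   # ~2 tokens/s
```

The catch is that you still need enough memory to hold all 120B parameters, which is exactly what these big-unified-memory boxes provide.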

This product doesn't really have a niche for inference. Training and prototyping are another story, but I'm a noob on those topics.


My Mac laptop has 400 GB/s of memory bandwidth. LLMs are bandwidth bound.


Running LLMs will be slow and training them is basically out of the question. You can get a Framework Desktop with similar bandwidth for less than a third of the price of this thing (though that isn't NVIDIA).


> Running LLMs will be slow and training them is basically out of the question

I think it's the reverse: the use case for these boxes is basically training and fine-tuning, not inference.


The use case for these boxes is a local NVIDIA development platform before you do your actual training run on your A100 cluster.


Refurbished M1 MacBooks at $1,500 have more bandwidth with less latency.



