Tenstorrent Unveils High-End Wormhole AI Processors, Featuring RISC-V (wccftech.com)
74 points by rbanffy on July 20, 2024 | 50 comments


That price is very good. Even better is the fact that you can actually buy these right now (ships in 2 weeks, apparently) by going to their website! So many AI startups are price-on-application, B2B only, etc.

I did five seconds of digging into their code, and based on their GitHub it seems they took Berkeley's BOOM CPU (written in Chisel), added support for the RISC-V Vector 1.0 extension, and called the result Ocelot. Seems like the core is open source, at least.

https://github.com/tenstorrent/riscv-ocelot/blob/bobcat/READ...

Not sure about the driver/software situation, though. Do you actually get to run whatever RISC-V code you want on these? Could be amazing if so.


Ocelot/Bobcat has nothing to do with the ML cards. It was their first go at implementing the RISC-V vector extension as a proof of concept, and they open-sourced it. It's not very optimized, and the Ascalon architecture will be a lot different.


Yeah, this is top-notch pricing and availability, and the chips look very capable. Tenstorrent is probably going to 10x, and fast.


Jim Keller, the Silicon Ronin, does it again.


There are some performance numbers here: https://github.com/tenstorrent/tt-metal


These numbers are…not great.


Relative to what? According to this, a single card is capable of >300 t/s on Mistral-7B, and the workstations with 4 cards are doing nearly 500 t/s on Mixtral 8x7B.

Yeah, the H100 and MI300 nearly 10x those numbers [1] at the same batch sizes, but those cards are unobtainium server-class hardware priced way outside the prosumer range, while these cards cost less than an RTX 4090 and only use ~300W.

What other options exist for individuals or small companies looking to run/train locally at that kind of speed?

1: https://blog.runpod.io/amd-mi300x-vs-nvidia-h100-sxm-perform...


> Yeah, the H100 and MI300 nearly 10x those numbers

Don't they cost 10x as much, or more?


No, from their benchmark setup, it's more like 3x as much. So yeah, it's just really not competitive.


I've only ever seen rumors and reports of them being >$10k USD and only available to enterprise customers.

Where are you seeing pricing for the H100 and MI300?


Yes, but the benchmarks listed above are mostly done on an 8x setup at $1,400 each, so ca. $12k, and the performance achieved is a fraction of what a $30k H100 will do.


The benchmarks on GitHub use the N300, which has 2 chips per board with 4 boards in the system -- that's the "2x4" they refer to -- with each board being 1400 USD. So that's only $5.6k to match the system they sell, versus the H100, which is north of $30k. Well, OK, at an equivalent bs=32 it's only 4x or 5x worse than the H100 according to the benchmarks in this thread (500 t/s vs ~2000 t/s), but as you note that's not what people practically use for batch sizes, and the power usage is a factor of 4x worse overall too. So, at an impractical batch size, it only uses 4x more power for about 1/4th the total tokens/second. Given the pricing you could in theory buy 4x as many cards while still being cheaper than an H100, but that totally ignores operational and other scaling costs. Apparently Tenstorrent is still on 12nm for Wormhole, too.
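As a sanity check, here's that cost/throughput math as a quick script (all figures are the rough numbers quoted in this thread, not measurements):

  # Rough figures quoted in this thread; treat as approximations.
  n300_price_usd = 1400             # per board
  boards_per_system = 4             # the "2x4" setup: 2 chips x 4 boards
  h100_price_usd = 30_000           # "north of $30k"
  tt_tps, h100_tps = 500, 2000      # t/s at bs=32, per this thread

  tt_system_usd = n300_price_usd * boards_per_system   # $5,600
  print(tt_system_usd / tt_tps)     # ~$11 per t/s
  print(h100_price_usd / h100_tps)  # ~$15 per t/s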

Anyway, I overall agree with you that it's not as amazing as people here might make it sound. I think people here are viewing it through rose-tinted glasses because, practically speaking, nobody else actually sells B2C accelerator hardware at a reasonable cost with actual availability, which is what they want. They look at Tenstorrent and see a "Buy now" form as a way to spend $5k USD and get 96GB of GDDR6 and a toolchain that's both open-source and not-Nvidia, or whatever. This forum is going to be particularly sensitive to things like that.

The actual hardware I think still has a ways to go, but hopefully they can scale it up in a bunch of ways and people can at least buy functionally usable cards with a software stack that works on them all. So, they're doing better than a lot of competitors in those ways, I guess...


> the N300, which has 2 chips per board with 4 boards in the system

Ah, my mistake, thanks for that.

But also, an H100 will definitely do more than 2000 t/s at bs=32 in fp16.


That is at bs=32, which is wholly unimpressive. Last-gen consumer cards can do better than this at a similar power envelope. Current gen, even better.


Do you have any benchmarks to share? The best I've seen for a 4090 are all in the mid-20s, never more than 30 t/s.


Where have you seen this? Look at what vLLM or TensorRT will do on a 4090 at those batch sizes.
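For reference, here's a minimal sketch of how one might measure batched throughput with vLLM (the model name and prompts are placeholders; actual numbers will depend on your hardware and settings):

  import time
  from vllm import LLM, SamplingParams

  # Hypothetical setup: one 24GB card (e.g. a 4090) with fp16 weights.
  llm = LLM(model="mistralai/Mistral-7B-v0.1", dtype="float16")
  params = SamplingParams(temperature=0.0, max_tokens=128)

  prompts = ["Explain KV caching in one paragraph."] * 32  # bs=32
  t0 = time.time()
  outputs = llm.generate(prompts, params)
  dt = time.time() - t0

  # Aggregate decode throughput and the per-user (t/s/u) figure.
  toks = sum(len(o.outputs[0].token_ids) for o in outputs)
  print(f"{toks / dt:.0f} t/s aggregate, {toks / dt / 32:.1f} t/s/u")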


For a personal workstation? Those are great.


They aren’t. You can throw a 3090 in a workstation and do much better with lower outlay and similar power envelope.

An N300 costs $1400 and does 350 t/s on Mistral 7B at bs=32. For $1000 you can buy a 3090 that will do 10x that.


I agree. Their 70B-parameter benchmarks are better than their 7B ones. Something is wrong with their software and they need to fix it. They get 4000 tokens/s on Falcon 7B, which is what you should expect from processing a batch of prompts. The per-user performance is atrocious, and their hardware should be at least twice as fast as a moderately overclocked DDR5 system.
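A back-of-envelope check on that last claim, assuming decode is memory-bandwidth-bound and using the ~288 GB/s figure cited elsewhere in this thread for these cards:

  # Decode streams the full weight set from memory once per token, so
  # per-user tokens/s is roughly bounded by bandwidth / model size.
  card_bw = 288             # GB/s, the GDDR6 figure cited in this thread
  ddr5_bw = 2 * 8 * 6.4     # GB/s, dual-channel DDR5-6400 ~= 102 GB/s
  print(card_bw / ddr5_bw)  # ~= 2.8x -- "at least twice as fast" checks out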


You can do 4k t/s at a large batch size on last-gen consumer gear. You're right that their 70B numbers are more competitive, but still very much off the mark. If these cards cost $500 instead of $1,400, then maybe they would be more compelling.


What's t/s/u? Tokens per second per user?


per unit


The LoudBox costs $6,000 and contains 4x n300 cards at $1,400 each, so on paper you only pay $400 for the box with 2 Xeons, a 3.8TB NVMe SSD, and 512GB of DDR4-3200 ECC RAM. Some buyers of the LoudBox might be tempted to sell the four n300 cards separately and keep the box; I'd say it's worth at least $2,500 or so.

How fast are the n300s compared to RTX 4090 or RTX 3090 cards?

At these prices it doesn't seem worthwhile to stick them into your own workstation instead of going for theirs.


If you click through to buy, it becomes apparent that unfortunately there is a typo: the QuietBox is actually $15,000, with a $1,500 deposit.


The last time I priced out an Epyc workstation like that, it was well over $12k CAD.

Seems like it's still quite a good deal even at $15k USD.


I'm talking about the $6,000 LoudBox, not the QuietBox. By the way, an extra $9,000 just for water cooling seems... high?


The LoudBox is $12k USD; you can see the pricing right above the specs/description: https://tenstorrent.com/hardware/tt-loudbox


It's $12k. The $6,000 is the deposit.


Oh. Bummer.


Don't beat yourself up about it. George Hotz thought the NVIDIA Inception price for Ada RTX 6000s was $1,500, instead of a $1,500 discount, and launched a whole company based on the misunderstanding.


> According to Tenstorrent, each Tensix core features "five RISC-V baby cores," which allows scalability along with multi-chip development much more effectively.

A neat feather in RISC-V's cap. Though to be fair, RISC-V microcontrollers on PCIe boards are nothing new; Nvidia has shipped its RISC-V-based GSP (Falcon's successor) for years, and a lot of recent storage media rely on RISC-V microcontrollers.


288GB/s is nowhere near enough in 2024. That compute will be mostly idle outside the prompt-processing phase.
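To put a rough number on that, a quick roofline estimate for single-stream decode of a 7B model at fp16 (ignoring KV-cache traffic):

  bw_gbs = 288                # GB/s of memory bandwidth
  weights_gb = 7e9 * 2 / 1e9  # ~14 GB of fp16 weights for a 7B model
  print(bw_gbs / weights_gb)  # ~= 20 t/s ceiling per user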


Tenstorrent is a RISC-V psyop confirmed.


I’m all for RISC-V regardless, psyop or not.


Tenstorrent’s website has different prices for the AI workstations: LoudBox says $12,000, not $6,000; QuietBox says $15,000, not $1,500. Example below.

https://tenstorrent.com/hardware/tt-loudbox

The real comparison for me is against a box using NVIDIA RTXs. That's what most use in the sub-$10,000 space for open-source models. They usually use $2,000-$5,000 worth of consumer GPUs with maybe 24GB VRAM each. So, how does this compare on inference or training to 1-2 RTXs with 24GB? And what's the performance for both small and large models?



I find it a bit strange that they recommend using Ubuntu 20.04 [0] when 24.04 was recently launched... I would have expected at least 22.04 support.

[0] https://tenstorrent.com/hardware/tt-quietbox


They are moving to 22.04 and there's a push for 24.04. See https://github.com/tenstorrent/tt-metal/issues/10469


Thanks, this is a good sign. I had a bad experience with the Jetson Nano, only to realize a few years later that it would be stuck on Ubuntu 20.04 forever. I don't want to repeat the experience, especially with a more expensive device.


That's got to be bad pricing info.

If a single N300 is $1,399 and a QuietBox has four, then the QuietBox can't be $1,500. Maybe the $1,500 is the price of the box minus the cards?


QuietBox is $15,000 not $1,500: https://tenstorrent.com/hardware/tt-quietbox


Yeah, you can see on their website that $1,500 is a deposit, and the actual price is $15k. Still very competitive. https://tenstorrent.com/hardware/tt-quietbox


So it seems like a 4090 has 4x the memory bandwidth and 4x the FP8 throughput, for about 2x the price?

I'm probably daft, but I'm not understanding the value prop here.


The point is for companies to buy these to start trying it out and developing on it.


OK, makes sense: planning for higher-scale densities TBA, and banking on the continuing scarcity of Nvidia inference engines.


I would really like to explore a workstation with one of these and nothing else.

We need more diversity on the hardware side, to see what we can build on it.


We're not far from an explosion of GPU-like processors from hardware startups trying to dethrone Nvidia.

Massive parallelization is in right now.


I often wonder what an Amiga for today would look like, something different from the computers we usually have. One with just a GPU-like processor doing everything, exploiting massive software parallelism instead of dedicated chips, would fit that description.


I think you’d still need a CPU.

GPUs and CPUs work so differently. I'd be surprised if you could have a usable computer with just GPUs without turning it into a CPU or having it effectively be single-threaded.


Kind of. Remember that Raspberry Pis are brought up by their GPU before the ARM cores become active.



