
The performance on a MacBook with M1 Pro is said to be 20 tokens/s

https://twitter.com/ggerganov/status/1634282694208114690



A MacBook Pro M1 would have the base M1 CPU, while he was referring to the M1 Pro CPU in something like a MacBook Pro w/ M1 Pro. Apple's naming is confusing.


right, fixed it


This is faster than running it on an RTX 4090, I think.


I get 32 tokens/sec on a 4090 using GPTQ 4bit with streaming off, with the model 5x larger than that.

So nowhere close to the 4090, but plenty fast anyway.


Nope, a 4090 can do the 30b-4bit model at 20 tokens/s



