embedding-shape | 47 days ago | on: Kimi Linear: An Expressive, Efficient Attention Ar...
How many tok/s are you getting (with any runtime) with either the Kimi-Linear-Instruct or Kimi-Linear-Base on your RTX 3070?
samus | 47 days ago
With Qwen3-30B-A3B (Q8) I'm getting 10-20 t/s on KoboldAI (i.e., llama.cpp-based). That's faster than I can read, so good enough for hobby use. I expect this model to be significantly faster, but llama.cpp-based software probably doesn't support it yet.
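(For reference, the t/s figure quoted here is just generated tokens divided by wall-clock decode time; a minimal sketch, with hypothetical token counts and timings:)

```python
def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    # Throughput = tokens generated / wall-clock seconds of decoding.
    # Runtimes like llama.cpp report this same ratio after each generation.
    return n_tokens / elapsed_s

# Hypothetical run: 300 tokens generated in 20 s -> 15 t/s,
# i.e. within the 10-20 t/s range reported above.
print(tokens_per_second(300, 20.0))  # → 15.0
```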