Can't wait for the Artificial Analysis benchmarks. I'm still waiting on them to add Qwen...

huey77 · 2025-11-08T06:27:01 1762583221

The analysis is up! Impressive: https://artificialanalysis.ai/models/kimi-k2-thinking

Alifatisk · 2025-11-08T08:53:21 1762592001

Wow, these numbers are insane! I tried it yesterday and it worked beautifully. It also responded the way I wanted every time. I didn't have to spend time prompting it on how to respond properly (unlike Grok 4 expert, which tends to yap a lot), it just knew.

Today's models have gotten so good that at this point, whatever I run just works and helps me with whatever I need. Maybe I should start noting down prompts that some models fail at.

osti · 2025-11-06T17:01:08 1762448468

Qwen 3 Max has been getting rather bad reviews around the web (both on Reddit and Chinese social media), and that matches my own experience with it. So I wouldn't expect this to be worse.

SamDc73 · 2025-11-06T17:08:44 1762448924

Also, my experience with it wasn't that good, but it was looking good on benchmarks...

It looks like benchmark maxing. Is that what you do when you're out of tricks?

Alifatisk · 2025-11-06T17:11:00 1762449060

Ohhh, so Qwen3 235B-A22B-2507 is still better?

osti · 2025-11-06T19:39:52 1762457992

I wouldn't say that, just that Qwen 3 Max Thinking definitely underperforms relative to its size.

htrp · 2025-11-06T17:40:37 1762450837

Did the Artificial Analysis team get bored or something? What makes a model worthy of benchmark inclusion?