
Can't wait for the Artificial Analysis benchmarks. I'm still waiting on them to add Qwen3-Max Thinking; it will be interesting to see how these two compare to each other.



Wow, these numbers are insane! I tried it yesterday and it worked beautifully. It also responded the way I wanted every time. I didn't have to spend time prompting it on how to respond properly (unlike Grok 4 Expert, which tends to yap a lot); it just knew.

Today's models have gotten so good that at this point whatever I run just works and helps me with whatever I need. Maybe I should start noting down prompts that some models fail at.


Qwen 3 Max has been getting rather bad reviews around the web (on both Reddit and Chinese social media), and my own experience with it has been poor too. So I wouldn't expect this to be worse.


My experience with it wasn't that good either, even though it looked good on benchmarks.

It seems like benchmark maxing, which is what you do when you're out of tricks.


Ohhh, so Qwen3 235B-A22B-2507 is still better?


I wouldn't say that, just that Qwen 3 Max Thinking definitely underperforms relative to its size.


Did the Artificial Analysis team get bored or something? What makes a model worthy of benchmark inclusion?



