
How are you going to release an LLM eval paper in mid-2025 using ChatGPT 3.5?

Yes, if you are wondering why they don't clarify the model, it's because all this was done back in early 2023 (the chat logs are dated). Back then 3.5 was the only option, and 4 had just been released.

Advancement in this space has been so rapid that this is almost like releasing a paper today titled "Video Streaming on Mobile Devices" that tests only on a 3G connection from 2013.

The authors should have held back a few more months and turned the paper into a 3.5 to O3 or any other 2025 SOTA improvement analysis.



The paper was originally released in April 2023; it just got version-bumped a couple of months ago :-)


>The authors should have held back a few more months and turned the paper into a 3.5 to O3 or any other 2025 SOTA improvement analysis.

If they had done that, you would then be complaining about them not using Claude or whatever.


I don't see the logic in your comment.


The paper was published in April 2023 (not 2025), but your point about using outdated models stands: evaluating with ChatGPT 3.5 when we now have Claude 3, GPT-4o, and other SOTA models significantly limits the paper's relevance.



