How are you going to release an LLM eval paper in mid-2025 using ChatGPT 3.5?
Yes, if you are wondering why they don't clarify the model, it's because all this was done back in early 2023 (the chat logs are dated). Back then there was only 3.5, and 4 had just been released.
Advancement in this space has been so rapid that this is almost like releasing a paper today titled "Video streaming on Mobile Devices" and only using a 3G connection in 2013.
The authors should have held back a few more months and turned the paper into an analysis of the improvement from 3.5 to o3 or any other 2025 SOTA model.
The paper was published in April 2023 (not 2025), but your point about using outdated models stands - evaluating with ChatGPT 3.5 when we now have Claude 3, GPT-4o, and other SOTA models significantly limits the paper's relevance.