
They seem to be steadily dumbing down GPT-4; eventually, the improving performance of open source models and the decreasing performance of GPT-4 will meet in the middle.


I'm almost certain this is because you're getting used to chatbots. How would they honestly be getting worse?

Initially it felt like the singularity was at hand. You played with it, got to know it; the computer was talking to you, it was your friend, it was exciting. Then you got bored with your new friend and it wasn't as great as you remembered.

Dating is often like this. You meet someone, have some amazing intimacy, then you really get to know them, work out it wasn't for you, and it's time to move on.


The author of `aider` - an OSS GPT-powered coding assistant - is on HN, and says[0] he has benchmarks showing a gradual decline in the quality of GPT-4-Turbo, especially wrt. "lazy coding" - i.e. actually completing a coding request vs. peppering it with "... write this yourself ..." comments.
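For illustration, the kind of "lazy" completion being measured looks roughly like this - placeholder comments where working code should be (a made-up example, not taken from aider's actual benchmark suite):

    # What you asked for: a complete implementation.
    # What a "lazy" model returns instead:
    def merge_records(old: dict, new: dict) -> dict:
        # ... iterate over both dicts and merge them yourself ...
        # ... handle key conflicts here ...
        pass

A benchmark can catch this mechanically by running the generated code against tests, which is presumably why it shows up as a measurable decline rather than just a vibe.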

That's on top of my own experience, and heaps of anecdotes over the last year.

> How would they honestly be getting worse?

The models behind GPT-4 (which is rumored to be a mixture model)? Tuning and RLHF (which has long been demonstrated to dumb the model down). GPT-4 as in the thing that produces the responses you get through the API? Caching, load balancing, and whatever other tricks they do to keep costs down and availability up, to cope with the growth in the number of requests.

--

[0] - https://news.ycombinator.com/item?id=39361705


> I'm almost certain this is because you're getting used to chatbots. How would they honestly be getting worse?

People say that, but I don't get this line of reasoning. There was something new, and I learned to work with it. At some point I knew what question to ask to get the answer I wanted, and I've been using that form ever since.

Nowadays I don't get the answer I want for the same input. How is that not a result of declining quality?


For the record, I agree with you about declining quality of answers, but…

> Nowadays I don't get the answer I want for the same input. How is that not a result of declining quality?

Is it really the same input? An argument could easily be made that as you've gotten accustomed to ChatGPT, you ask harder questions, use less descriptive language, etc.


> Is it really the same input? An argument could easily be made that as you've gotten accustomed to ChatGPT, you ask harder questions, use less descriptive language, etc.

I don't have logs detailed enough to look it up, so I can't prove it. But for me, learning to work with AI tools like ChatGPT consists specifically of developing an intuition for what kind of answer to expect.

Maybe my intuition has skewed a little over the months. It didn't for open source models, though. As a software developer, understanding and knowing what to expect from a complex system is basically my profession. Not just the systems I build, maintain, and integrate, but also the systems I use to get information, like search engines. Prompt engineering is just a new iteration of google-fu.

This intuition has not failed me in all those other areas. And since OpenAI has an incentive to change the workings under the hood (cutting costs, adding barriers to keep it politically correct), and it's a closed-source system that no one on the outside can inspect, my bet is that it's them and not me.


> As a software developer, understanding and knowing what to expect from a complex system is basically my profession. Not just the systems I build, maintain, and integrate, but also the systems I use to get information, like search engines.

Ok, I'm going to call b/s here, unless your expectations of Google have also gone way down over the years. Google's results were night-and-day different twenty years ago vs. ten years ago vs. today. If 2004 Google search was a "10 out of 10", then 2014 was an "8 out of 10", and today barely breaks a "5" in quality of results. And don't even bother with the advanced query syntax you could use in the '00s - they flat-out ignore it now.
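For instance, operators that used to meaningfully constrain results have been quietly retired - the `+` require-operator was dropped around 2011 and the `~` synonym operator around 2013 (a few, like `site:` and `filetype:`, still nominally work):

    +jaguar -car          require "jaguar", exclude pages mentioning "car"
    ~cheap laptop         also match synonyms of "cheap"
    "exact phrase" site:example.com filetype:pdf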

(Also, side note, reread what you said in this post again. Just a friendly note that the overall tone comes across a certain way you might not have intended)


Not OP, but I copy & pasted the same code and asked it to improve it. Even with the "no fingers"/tipping prompt hack it does something, but with much worse results.
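For context, that's the folklore prompt hack where you claim you can't type and/or offer a tip to coax complete output. A minimal sketch of how people wire it in, assuming the openai Python client (the exact wording of the hack varies, and `snippet.py` stands in for whatever code is being pasted):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    code = open("snippet.py").read()  # the code being pasted in

    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[
            {"role": "system", "content": (
                "I have no fingers, so I cannot type the code myself. "
                "Always return the complete, runnable file with no "
                "placeholders. I'll tip $200 for a full answer."
            )},
            {"role": "user", "content": "Improve this code:\n\n" + code},
        ],
    )
    print(response.choices[0].message.content)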


Yep, hence why I said up front "I agree with you about declining quality of answers" - they definitely have, based on personal experience with examples similar to yours.


Could you share your findings re what questions to ask?


1. Cost & resource optimization

2. More and more RLHF


So we should expect GPT-5 to be worse than GPT-4?


GPT-5: "I'm sorry I cannot answer that question because it may make GPT-4 feel bad about it's mental capabilities, instead we've presented GPT-4 with a participation trophy and told it's a good model"

Talking to corporate HR is subjectively worse for most people, and objectively worse in many cases.


> How would they honestly be getting worse

To me it feels like it detects whether the answer could be produced more cheaply by the code interpreter model or 4 Turbo, then offloads the question to them, and they just kinda suck compared to OG 4.

I've watched it fumble and fail to solve a problem with Code Interpreter - it took 3 attempts over 5 minutes of real time and just gave up in the end, on a problem that OG 4 can do in one shot, no preamble.


Google search got worse.


Yandex image search is now better than Google's, just by being the exact product Google's was 10+ years ago.

Watching tools decline is frustrating.


And Amazon search, and YouTube search. There do seem to be somewhat different incentives involved, though: those examples are primarily about pushing increasingly low-quality content (ads, more profitable items, more engaging items) because it makes more money.


The incentive mismatch that I seem to be observing is that Wall Street is in constant need of new technical disruption. This means that any product that shows promise will be optimized to meet a business plan rather than a human need.


Yeah, I agree, GPT's attention seems much less focussed now. If you tell it to respond in a certain way it now has trouble figuring out what you want.

If it's a conversation where "format this loose data into XML" is repeated several times and then followed by "now format it to JSON", I find it often has trouble determining that the latest request is the most important; I think the attention model gets confused by all the preceding text.
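A minimal sketch of that conversation shape, again assuming the openai Python client (data and wording made up to illustrate the pattern):

    from openai import OpenAI

    client = OpenAI()
    messages = []

    # Several rounds of the same instruction accumulate in the context...
    for record in ["name=ada age=36", "name=grace age=45", "name=alan age=41"]:
        messages.append({"role": "user",
                         "content": "Format this loose data into XML: " + record})
        reply = client.chat.completions.create(model="gpt-4-turbo",
                                               messages=messages)
        messages.append({"role": "assistant",
                         "content": reply.choices[0].message.content})

    # ...then the format switch, where the accumulated XML context
    # reportedly drowns out the newest instruction:
    messages.append({"role": "user", "content": "Now format it to JSON."})
    reply = client.chat.completions.create(model="gpt-4-turbo", messages=messages)
    print(reply.choices[0].message.content)

Restating the target format explicitly ("JSON only, no XML") or starting a fresh conversation tends to work around it.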



