You are spot on. I use temperature 0, but even with it, ChatGPT can be unpredictable.
Sometimes I'm very frustrated by the news that gets to the top. When I try to debug it, I get a completely different score.
I considered using ranges instead of a discrete score, but dropped the idea: it makes it too hard to find the 1-5 articles that should make it into the newsletter (there are 71 articles in this range right now), and it's hard to display that idea clearly in the UI.
I guess my position right now is: it's not perfect, there are obvious errors (like the one you found above), and improvements are definitely possible.
But I hope that some people will find it "good enough" even with these inconsistencies. I also hope that ChatGPT or another LLM will soon make enough progress to solve this problem automatically.
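For what it's worth, here is a minimal sketch of the range idea, assuming each article is scored several times and the repeated scores are collapsed into a (low, median, high) triple. The function name and the sample numbers are made up for illustration; this is not how the site currently works.

```python
from statistics import median


def summarize_scores(scores):
    """Collapse repeated scores for one article into (low, median, high).

    `scores` is a hypothetical list of numbers from scoring the same
    article several times; the median can serve as the single displayed
    value, while low/high show how much the runs disagree.
    """
    if not scores:
        raise ValueError("need at least one score")
    return min(scores), median(scores), max(scores)


# Illustrative values only: three runs over the same article.
low, mid, high = summarize_scores([7, 9, 8])
print(f"score {mid} (range {low}-{high})")
```

Sorting by the median would still give one number for picking newsletter candidates, and the spread could be used to flag articles whose score is unstable.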