You are spot on. I use temperature 0, but even with it, ChatGPT can be unpredictable.

Sometimes I get very frustrated by the news items that make it to the top. When I try to debug one of them, ChatGPT gives me a completely different score.

I considered using score ranges instead of a single discrete score, but dropped the idea: it makes it too hard to pick the 1-5 articles that should go into the newsletter (there are 71 articles in this range right now), and it's hard to present that idea clearly in the UI.
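To illustrate the trade-off: with a single score per article, picking the top few is just a sort. With ranges, every article whose range overlaps the best one effectively ties for the top spot, so there is no clean cutoff. Here is a minimal Python sketch of that idea; `Article`, `top_by_score` and `top_range_bucket` are hypothetical names for illustration, not the actual code behind the site.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Article:
    # Hypothetical article record, for illustration only.
    title: str
    score: float                      # single discrete score from the LLM
    score_range: Tuple[float, float]  # (low, high) range from the LLM


def top_by_score(articles: List[Article], n: int) -> List[Article]:
    """Discrete scores give a total order, so picking the top n is a sort."""
    return sorted(articles, key=lambda a: a.score, reverse=True)[:n]


def top_range_bucket(articles: List[Article]) -> List[Article]:
    """With ranges, any article whose upper bound reaches the best lower bound
    cannot be ranked below the leader, so all of them tie for the top spot."""
    best_low = max(a.score_range[0] for a in articles)
    return [a for a in articles if a.score_range[1] >= best_low]


articles = [
    Article("a", 7.2, (6.0, 8.0)),
    Article("b", 6.9, (6.0, 8.0)),
    Article("c", 5.1, (4.0, 6.0)),
    Article("d", 3.0, (2.0, 4.0)),
]
print([a.title for a in top_by_score(articles, 2)])  # ['a', 'b']
print([a.title for a in top_range_bucket(articles)])  # ['a', 'b', 'c']
```

With only four articles the top bucket already holds three candidates; with a few hundred articles it can easily grow to dozens, which is why a discrete score is simpler to cut at "top 1-5".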

I guess my position right now is this: it's not perfect, there are obvious errors (like the one you found above), and there is definitely room for improvement.

But I hope that some people will find it "good enough" despite these inconsistencies. I also hope that ChatGPT or another LLM will make big progress soon and solve this problem automatically.