
This is so hard! I don't yet have a great solution for this myself, but I've been collecting notes about this on my "evals" tag for a while: https://simonwillison.net/tags/evals/

The best writing I've seen about this is from Hamel Husain - https://hamel.dev/blog/posts/llm-judge/ and https://hamel.dev/blog/posts/evals-faq/ are both excellent.


