Hacker News

You are right; the advances in DeepSeek-R1 relied on RL almost entirely because of the chain-of-thought sequences they were generating and training the model on.

