hodapp 10 months ago, on: Why LLMs still have problems with OCR

You are right; the advances in DeepSeek-R1 used RL almost solely because of the chain-of-thought sequences they were generating and training it on.