Except that we've seen that bigger models don't scale well in accuracy or intelligence; just look at GPT-4.5. Intelligence scales roughly logarithmically with parameter count, and the extra parameters are mostly good for baking in more knowledge so you don't need to RAG everything.
Additionally, you can pair a reasoning model's thinking with a non-reasoning model to improve output, so I wouldn't be surprised if the common pattern became routing hard queries to a reasoning model to solve at a high level, then handing the solution plan to a smaller on-device model for faster inference.
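That routing pattern can be sketched roughly like this. Everything here is hypothetical: the difficulty heuristic, the function names, and the model calls (which are stubbed out) are all stand-ins, not any real API.

```python
# Hypothetical sketch of the routing pattern: a hard query goes to a large
# reasoning model for a high-level plan, then the plan is executed by a
# small on-device model. Both model calls are stubs.

def looks_hard(query: str) -> bool:
    # Toy stand-in for a real difficulty classifier or router model.
    return len(query.split()) > 12 or "prove" in query.lower()

def reasoning_model_plan(query: str) -> str:
    # Stub for a call to a large hosted reasoning model.
    return f"PLAN: break '{query}' into steps and solve each."

def on_device_model(prompt: str) -> str:
    # Stub for fast local inference with a small model.
    return f"ANSWER (local): {prompt}"

def route(query: str) -> str:
    if looks_hard(query):
        plan = reasoning_model_plan(query)  # slow, high-level reasoning
        return on_device_model(plan)        # fast execution of the plan
    return on_device_model(query)           # easy queries stay local

print(route("what time is it"))
print(route("prove that the sum of two even numbers is even"))
```

The point of the split is latency: the expensive reasoning call happens once per hard query, and everything downstream runs at small-model speed.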