Perhaps GPT-like models are already capable enough to do math, but they would need to store what we call mathematical reasoning as one of many distinct processing pathways and tap into it whenever the context is appropriate.
Easy to say, obviously, but there's some promising work in this direction [1].
[1] Tracr: Compiled Transformers as a Laboratory for Interpretability, https://arxiv.org/pdf/2301.05062.pdf