I had just finished watching the Physics of Language Models[1] talk, where they show how GPT2 models could learn non-trivial context-free grammars and, to an extent, effectively do dynamic programming, so I thought it would be interesting to see how they would perform on the spectral fine-graining task.
[1]: https://physics.allen-zhu.com/home