There are at least three major built-in biases in GPT-O1:
- specific reasoning heuristics hard-coded into the RL decision making
- the architectural split between the pre-trained LLM and what appears to be a symbolic agent calling it
- the reliance on one-time SGD-driven learning (common to all these pre-trained transformers)
IMO, search (reasoning) should be an emergent behavior of a predictive architecture capable of continual learning: chained what-if prediction.
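A minimal sketch of what "chained what-if prediction" could mean as a control flow, under loose assumptions: a learned predictive model proposes actions, imagines their consequences, and recurses on the imagined states, so search falls out of chaining predictions rather than from hand-coded heuristics. `PredictiveModel` and its methods are hypothetical placeholders, not any real API, and the continual-learning part is not shown.

```python
import random
from dataclasses import dataclass


@dataclass
class PredictiveModel:
    """Stand-in for a learned world/sequence model (hypothetical)."""

    def propose(self, state, k=3):
        # Predict k plausible next actions for the current state.
        return [f"{state}->a{i}" for i in range(k)]

    def predict(self, state, action):
        # Predict the state that would follow the action ("what if?").
        return f"{action}|next"

    def value(self, state):
        # Predicted desirability of a state (random here, for illustration).
        return random.random()


def what_if_search(model, state, depth=3):
    """Chain what-if predictions: imagine, recurse, keep the best branch."""
    if depth == 0:
        return model.value(state), [state]
    best_score, best_path = float("-inf"), [state]
    for action in model.propose(state):
        imagined = model.predict(state, action)      # what if I did this?
        score, path = what_if_search(model, imagined, depth - 1)
        if score > best_score:
            best_score, best_path = score, [state] + path
    return best_score, best_path


if __name__ == "__main__":
    score, path = what_if_search(PredictiveModel(), "s0")
    print(score, path)
```

The point of the sketch is that nothing in `what_if_search` is task-specific: if the predictive model keeps learning online, the same rollout loop keeps working, with no separate symbolic agent bolted on top.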