@ ma𝕏pool
2025-03-06 10:48:40
Self-Improving Reasoners.
Both expert human problem solvers and successful language models employ four key cognitive behaviors:
1. verification (systematic error-checking),
2. backtracking (abandoning failing approaches),
3. subgoal setting (decomposing problems into manageable steps), and
4. backward chaining (reasoning from desired outcomes to initial inputs).
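To make the four behaviors concrete, here is a minimal sketch on a toy arithmetic puzzle: combine all given numbers with +, -, * to hit a target. The puzzle, the function names `solve` and `verify`, and the restriction to left-nested expressions are illustrative assumptions, not the paper's method or code.

```python
from typing import List, Optional


def solve(numbers: List[int], target: int) -> Optional[str]:
    """Return a left-nested expression using all `numbers` that equals `target`, or None."""
    if len(numbers) == 1:
        return str(numbers[0]) if numbers[0] == target else None

    for i, n in enumerate(numbers):
        rest = numbers[:i] + numbers[i + 1:]
        # Backward chaining: start from the desired outcome and derive what
        # the remaining numbers would have to produce if n is applied last.
        subgoals = [("+", target - n), ("-", target + n)]
        if n != 0 and target % n == 0:
            subgoals.append(("*", target // n))
        for op, subgoal in subgoals:
            # Subgoal setting: a smaller problem (fewer numbers, new target).
            sub = solve(rest, subgoal)
            if sub is not None:
                return f"({sub} {op} {n})"
        # Backtracking: this choice of n failed, so abandon it and try the next.
    return None


def verify(expr: str, target: int) -> bool:
    """Verification: independently re-evaluate the candidate answer.

    eval is acceptable here only because `expr` was built by `solve`
    from integers and the operators +, -, *.
    """
    return eval(expr) == target


if __name__ == "__main__":
    expr = solve([3, 5, 7], 22)
    if expr is not None:
        print(expr, "verified:", verify(expr, 22))
```

In a language model these behaviors show up as text in the reasoning trace rather than as control flow, so the sketch is only an analogy for what each habit does.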
Some language models naturally exhibit these reasoning behaviors and show substantial gains under RL, while others don't and quickly plateau.
The presence of reasoning behaviors, not the correctness of answers, is the critical factor. Models trained on incorrect solutions that contain proper reasoning patterns achieve performance comparable to those trained on correct solutions.
It seems that the presence of cognitive behaviors enables self-improvement through RL.
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs
https://arxiv.org/abs/2503.01307
#reinforcementlearning #RL
#AI #DL #LLM