Papers
Learning from Your Own Mistakes: Constructing Learnable Micro-Reflective Trajectories for Self-Distillation • Paper • 2606.18844 • Published 17 days ago • 18
APPO: Agentic Procedural Policy Optimization • Paper • 2606.12384 • Published 24 days ago • 79