arxiv:2601.05167
Langlin Huang
shrango
AI & ML interests
LLM Reasoning, Machine Translation
Recent Activity
upvoted a paper about 23 hours ago
Small RL Controller, Large Language Model: RL-Guided Adaptive Sampling for Test-Time Scaling
upvoted a paper 14 days ago
You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories
upvoted a paper 15 days ago
Process Rewards with Learned Reliability