Pengyu Cheng

Linear95

Recent Activity

upvoted a paper about 1 hour ago

Skill Self-Play: Pushing the Frontier of LLM Capability with Co-Evolving Skills

Paper • 2607.22529 • Published 3 days ago • 19

upvoted a paper about 1 month ago

GD^2PO: Mitigating Multi-Reward Conflicts via Group-Dynamic reward-Decoupled Policy Optimization

Paper • 2606.16771 • Published Jun 15 • 13

upvoted 4 papers about 2 months ago

MARCH: Multi-Agent Reinforced Self-Check for LLM Hallucination

Paper • 2603.24579 • Published Mar 25 • 1

CLIPO: Contrastive Learning in Policy Optimization Generalizes RLVR

Paper • 2603.10101 • Published Mar 10 • 6

Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill

Paper • 2606.03980 • Published Jun 2 • 13

Search Self-play: Pushing the Frontier of Agent Capability without Supervision

Paper • 2510.18821 • Published Oct 21, 2025 • 19

upvoted a paper 4 months ago

Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills

Paper • 2603.25158 • Published Mar 26 • 56

upvoted 2 papers 9 months ago

HSCodeComp: A Realistic and Expert-level Benchmark for Deep Search Agents in Hierarchical Rule Application

Paper • 2510.19631 • Published Oct 22, 2025 • 28

DeepWideSearch: Benchmarking Depth and Width in Agentic Information Seeking

Paper • 2510.20168 • Published Oct 23, 2025 • 28