Pengyu Cheng

Linear95

AI & ML interests

None yet

Recent Activity

upvoted a paper about 1 month ago

GD^2PO: Mitigating Multi-Reward Conflicts via Group-Dynamic reward-Decoupled Policy Optimization

Paper • 2606.16771 • Published Jun 15 • 13

upvoted 2 papers about 2 months ago

MARCH: Multi-Agent Reinforced Self-Check for LLM Hallucination

Paper • 2603.24579 • Published Mar 25 • 1

CLIPO: Contrastive Learning in Policy Optimization Generalizes RLVR

Paper • 2603.10101 • Published Mar 10 • 6

authored 7 papers about 2 months ago

upvoted 2 papers about 2 months ago

Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill

Paper • 2606.03980 • Published Jun 2 • 13

Search Self-play: Pushing the Frontier of Agent Capability without Supervision

Paper • 2510.18821 • Published Oct 21, 2025 • 19

upvoted a paper 4 months ago

Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills

Paper • 2603.25158 • Published Mar 26 • 56

liked a model 6 months ago

microsoft/VibeVoice-ASR

Automatic Speech Recognition • 9B • Updated Jan 27 • 686k • 1.24k

liked a dataset 7 months ago

Quark-LLM/SSP

Preview • Updated Dec 31, 2025 • 573 • 1

New activity in Quark-LLM/SSP 7 months ago

Add task category and improve dataset card

#3 opened 7 months ago by nielsr

docs: update readme

#2 opened 7 months ago by Necolizer

New activity in Quark-LLM/SSP 9 months ago

feat: upload training and evaluation data

#1 opened 9 months ago by Necolizer

published a dataset 9 months ago

Quark-LLM/SSP

Preview • Updated Dec 31, 2025 • 573 • 1

upvoted a paper 9 months ago

HSCodeComp: A Realistic and Expert-level Benchmark for Deep Search Agents in Hierarchical Rule Application

Paper • 2510.19631 • Published Oct 22, 2025 • 28