Wenboz/SACD-Qwen2.5-3B-ALFWorld-k1-tau0.5-beta1.0-plain-pipeline Reinforcement Learning • 3B • Updated 1 day ago
Wenboz/SACD-Qwen2.5-3B-ALFWorld-k1-tau0.75-beta1.0-plain-pipeline Reinforcement Learning • 3B • Updated 1 day ago • 1
Wenboz/ultrafeedback_rationale_Qwen2.5-3B-Instruct_ultra_filter_2e-5_thre-0.8_packing_42_cot Updated Mar 3, 2025 • 3
Wenboz/ultrafeedback_rationale_Qwen2.5-3B-Instruct_ultra_sft_2e-5_thre-0.7_packing_42_cot Viewer • Updated Mar 1, 2025 • 63.1k • 10