davidanugraha/Qwen3-4B-Instruct-2507-UserSim-Factored-SimPO-Sample-EqWeightSpan • 4B • Updated 3 days ago • 29
davidanugraha/Qwen3-4B-Instruct-2507-UserSim-Factored-SimPO-Sample-InvertedSpan • 4B • Updated 3 days ago • 31
davidanugraha/Qwen3-4B-Instruct-2507-UserSim-Factored-SimPO-Sample-NoSpan • 4B • Updated 3 days ago • 38
davidanugraha/Qwen3-4B-Instruct-2507-UserSim-SFT-Baseline • Text Generation • 4B • Updated 3 days ago • 27
davidanugraha/Qwen3-4B-Instruct-2507-UserSim-SFT-Factored • Text Generation • 4B • Updated 3 days ago • 4
davidanugraha/DeepSeek-R1-Distill-Qwen-7B-Overthinking-SFT • Text Generation • 8B • Updated Dec 28, 2025 • 3
davidanugraha/DeepSeek-R1-Distill-Qwen-1.5B-Overthinking-SFT • Text Generation • 2B • Updated Dec 28, 2025 • 2
davidanugraha/Qwen2.5-Coder-3B-Instruct-ReinfPP-Reflection-16k-20test-passrate • 3B • Updated Dec 13, 2025 • 3
davidanugraha/Qwen2.5-Coder-3B-Instruct-ReinfPP-Reflection-16k-20test-binary • 3B • Updated Dec 13, 2025 • 2
davidanugraha/Qwen2.5-Coder-3B-Instruct-ReinfPP-Reflection-8k-20test-binary • 3B • Updated Dec 13, 2025 • 1
davidanugraha/Qwen2.5-Coder-3B-Instruct-ReinfPP-Reflection-4k-20test-passrate • 3B • Updated Dec 13, 2025 • 1
davidanugraha/Qwen2.5-Coder-3B-Instruct-ReinfPP-Reflection-4k-20test-binary • 3B • Updated Dec 13, 2025 • 1