Codeseys/composer-replication-framework

Reinforcement Learning

composer-replication-framework/composer_replication/trainer/tests

59.1 kB

4 contributors

History: 8 commits

Latest commit: 41289bf by Baladithya Balamurugan, 3 days ago
Wave 20: Tier-0 fidelity fixes — k1-in-reward KL + Composer-2 behavior rewards

| File | Size | Last commit | Age |
| --- | --- | --- | --- |
| __init__.py | 0 Bytes | Wave 21: close both Wave 20 debt items — chat-template alignment + structural is_error | 15 days ago |
| test_chat_template_alignment.py | 10.1 kB | Wave 21b: skip zero-signal SDPO on empty-recovery error turns + real-trace validation | 15 days ago |
| test_dr_grpo_config_and_alignment.py | 13 kB | Wave 20: Tier-0 fidelity fixes — k1-in-reward KL + Composer-2 behavior rewards | 3 days ago |
| test_killswitch_integration.py | 15.2 kB | Wave 3: close the HIGH review findings (kill-switch wiring, HeldoutSplit, EKS entrypoint bug) | 3 days ago |
| test_kl_in_reward.py | 5.98 kB | Wave 20: Tier-0 fidelity fixes — k1-in-reward KL + Composer-2 behavior rewards | 3 days ago |
| test_po_objective_menu.py | 3.08 kB | feat(trainer): policy-optimization objective MENU (ADR-014) | 13 days ago |
| test_sdpo_alignment_indices.py | 11.8 kB | feat(wave-a): close ADR-011 (SDPO alignment indices) + ADR-012 (review findings) | 14 days ago |