SFT, RL, Preference Training, and More for LLMs
- AdamLucek/Qwen3-4B-Instruct-2507-PII-RL
  Text Generation • 4B • Updated • 6 • 2
- AdamLucek/DeepSeek-V3.1-Truthlessness-1e
  Text Generation • Updated
- AdamLucek/Orpo-Llama-3.2-1B-40k
  Text Generation • 1B • Updated • 1
- AdamLucek/Orpo-Llama-3.2-1B-15k
  Text Generation • 1B • Updated • 49