gregfrank/Mistral-Large-Instruct-2411-ULRE-abliterated Text Generation • 123B • Updated 9 days ago • 1.28k
Detection Is Cheap, Routing Is Learned: Why Refusal-Based Alignment Evaluation Fails Paper • 2603.18280 • Published Mar 18 • 1
How Alignment Routes: Localizing, Scaling, and Controlling Policy Circuits in Language Models Paper • 2604.04385 • Published Apr 13 • 1