Inversion Lab for AI Safety

https://ainversion.github.io

AI & ML interests

Interpretability of Language Models and Multi-Agent Safety

Recent Activity

lgalke authored a paper about 1 month ago

The Arbiter Agent: Continually Monitoring Multi-Agent Conversations to Detect Emergent Misalignment

lgalke authored a paper about 1 month ago

BrainSurgery: Reproducible and Reliable Declarative Weight Manipulations for Model Editing and Upcycling

lgalke authored a paper about 2 months ago

DeToNATION: Decoupled Torch Network-Aware Training on Interlinked Online Nodes

ainversion's datasets

None public yet