Sparsely gated tiny linear experts 🐥: A compute-efficient and interpretable transformer FFN layer
Discovering modular solutions that generalize compositionally Paper • 2312.15001 • Published Dec 22, 2023