·
AI & ML interests
None yet
Recent Activity
posted an update 2 days ago Built OpenRouter's Fusion on our own LiteLLM gateway, then benchmarked whether it earned its cost.
The detail that decides the design: in OpenRouter's own numbers, fusing a model with itself still gained ~6.7 points. So the engine is the judge synthesizing over diverse samples, not the mix of models. Self-MoA ("Rethinking Mixture-of-Agents", arXiv 2502.00674) backs it — aggregating samples from one strong model beats mixing in weaker ones, which usually dilutes quality.
That maps cleanly onto local inference. A multi-model panel means holding N models resident, a non-starter on one shared card. Judged self-consistency needs only one, and ours already runs as two load-balanced replicas, so the samples spread across both GPUs for free.
~360-line CustomLLM provider, every sub-call looped back through the gateway so it keeps routing, fallbacks, and cost tracking, and a 29-prompt blind-ranked benchmark with an explicit ship rule. All MIT.
Breakdown: https://protolabs.studio/blog/fusion-on-your-own-litellm-gateway
Code: https://github.com/protoLabsAI/fusion-gateway View all activity Organizations