arxiv:2606.14711

SWARM-LLM: Collaborative Inference for Edge-based Small Language Models

Published on Apr 22

Authors:

Abstract

SWARM-LLM enables efficient edge deployment of language models by dynamically routing queries between local SLMs and a cloud FM based on uncertainty estimates and safety signals.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Large language models (LLMs) provide strong performance across a wide range of tasks but are typically hosted on centralised cloud infrastructure, incurring significant bandwidth, latency, and privacy costs. In contrast, small language models (SLMs) can run on edge devices but have limited capability and robustness. This paper introduces SWARM-LLM, a routing and collaboration layer that coordinates a small swarm of edge-hosted SLMs with an optional cloud foundation model (FM). SWARM-LLM decides, for each query, whether to answer locally, collaborate with peer SLMs, or "summon" a cloud FM, using lightweight uncertainty estimates and safety signals. We implement a working prototype on commodity hardware with three heterogeneous SLMs and a 70B-parameter cloud FM accessed via API, and evaluate it on a controlled study workload of easy, hard, and safety-oriented queries. Our results show that SWARM-LLM substantially improves performance on hard questions compared to an edge-only deployment, while limiting cloud usage to roughly one quarter of queries, illustrating a practical trade-off between accuracy, latency, and cost for privacy-conscious edge deployments. The implementation code is available at the GitHub repository https://github.com/mdahshan/swarm_llm.
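The three-way decision described in the abstract (answer locally, collaborate with peer SLMs, or summon the cloud FM) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Signals`, `Route`, and `route_query`, the two threshold values, and the rule that a safety flag always escalates to the cloud are all assumptions made for illustration. The abstract states only that routing uses lightweight uncertainty estimates and safety signals.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    LOCAL = "local"   # answer with the local SLM
    PEERS = "peers"   # collaborate with peer SLMs in the swarm
    CLOUD = "cloud"   # "summon" the cloud FM


@dataclass
class Signals:
    # Hypothetical inputs; the paper's actual signal definitions may differ.
    uncertainty: float  # assumed to lie in [0, 1], higher means less confident
    safety_flag: bool   # assumed True when the query looks safety-sensitive


def route_query(
    signals: Signals,
    peer_threshold: float = 0.4,   # assumed value, not taken from the paper
    cloud_threshold: float = 0.7,  # assumed value, not taken from the paper
) -> Route:
    """Pick a route from uncertainty and safety signals (illustrative policy)."""
    # Assumption: safety-sensitive or highly uncertain queries go to the cloud FM.
    if signals.safety_flag or signals.uncertainty >= cloud_threshold:
        return Route.CLOUD
    # Assumption: moderately uncertain queries are shared with peer SLMs.
    if signals.uncertainty >= peer_threshold:
        return Route.PEERS
    # Otherwise the local SLM answers on its own.
    return Route.LOCAL


if __name__ == "__main__":
    for s in (Signals(0.1, False), Signals(0.5, False), Signals(0.9, False)):
        print(s, "->", route_query(s).value)
```

In a policy of this shape, raising `cloud_threshold` would send fewer queries to the cloud FM at the risk of weaker answers on hard questions, which is the accuracy, latency, and cost trade-off the abstract describes.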


