Leo PRO
leideng
AI & ML interests
Efficient AI, Sparse Attention
Recent Activity
updated a bucket about 2 hours ago
leideng/KVPacket
published a bucket about 3 hours ago
leideng/KVPacket
updated a collection about 3 hours ago
Non-prefix KV Reuse
Organizations
None yet
DiT
Efficient AI
Pretrain
Reasoning
Optimizer
RL
- DAPO: An Open-Source LLM Reinforcement Learning System at Scale
  Paper • 2503.14476 • Published • 146
- Proximal Policy Optimization Algorithms
  Paper • 1707.06347 • Published • 11
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 66
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
  Paper • 2402.03300 • Published • 147
Tokenization
SFT
- On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
  Paper • 2508.05629 • Published • 190
- Training language models to follow instructions with human feedback
  Paper • 2203.02155 • Published • 24
- LIMA: Less Is More for Alignment
  Paper • 2305.11206 • Published • 27
- Preserving Diversity in Supervised Fine-Tuning of Large Language Models
  Paper • 2408.16673 • Published
Non-prefix KV Reuse
- KV Packet: Recomputation-Free Context-Independent KV Caching for LLMs
  Paper • 2604.13226 • Published • 11
- KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse
  Paper • 2502.16002 • Published
- ProphetKV: User-Query-Driven Selective Recomputation for Efficient KV Cache Reuse in Retrieval-Augmented Generation
  Paper • 2602.02579 • Published
- From Prefix Cache to Fusion RAG Cache: Accelerating LLM Inference in Retrieval-Augmented Generation
  Paper • 2601.12904 • Published