kvcached: Elastic KV cache for dynamic GPU sharing and efficient multi-LLM inference.

87 stars · 15 forks · 87 watchers · Python · Apache License 2.0
gpu-mutiplexing gpu-sharing inference-engine kvcache kvcached llm sglang vllm
1 Open Issue Need Help · Last updated: Sep 14, 2025

