Open Issues Need Help
Question About kvcached Ability to Dynamically Recognize and Utilize Kubernetes Elastic Scaled GPU Memory Resources (about 1 month ago)
good first issue
kvcached: Elastic KV cache for dynamic GPU sharing and efficient multi-LLM inference.
Python
#gpu-mutiplexing #gpu-sharing #inference-engine #kvcache #kvcached #llm #sglang #vllm