LLM model quantization (compression) toolkit with hardware acceleration support for NVIDIA CUDA, AMD ROCm, Intel XPU, and Intel/AMD/Apple CPUs via Hugging Face (HF) Transformers, vLLM, and SGLang.

778 stars · 111 forks · 778 watchers · Python · Apache License 2.0
Topics: gptq, optimum, peft, quantization, sglang, transformers, vllm
1 open issue needs help · Last updated: Sep 13, 2025

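As a concrete illustration of the quantize-then-serve workflow the description refers to, here is a minimal sketch assuming a GPTQModel-style Python API (`QuantizeConfig`, `GPTQModel.load`, `model.quantize`, `model.save`). The model ID, calibration dataset, and parameter values are illustrative assumptions, not taken from this page:

```python
# Minimal GPTQ quantization sketch. Assumes the `gptqmodel` package and a
# GPTQModel-style API; all names and values below are illustrative assumptions.
from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig

model_id = "meta-llama/Llama-3.2-1B-Instruct"   # hypothetical example model
quant_path = "Llama-3.2-1B-Instruct-gptq-4bit"  # where to save the result

# A small calibration set of raw text; GPTQ uses it to estimate activation
# statistics while quantizing weights layer by layer.
calibration_dataset = load_dataset(
    "allenai/c4",
    data_files="en/c4-train.00001-of-01024.json.gz",
    split="train",
).select(range(1024))["text"]

# 4-bit weights, quantized in groups of 128 columns (common GPTQ settings).
quant_config = QuantizeConfig(bits=4, group_size=128)

model = GPTQModel.load(model_id, quant_config)
model.quantize(calibration_dataset)
model.save(quant_path)
```

The saved checkpoint would then be loadable for inference through the backends the description lists (HF Transformers, vLLM, or SGLang), subject to each backend's GPTQ support.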