LLM model quantization (compression) toolkit with hardware acceleration support for NVIDIA CUDA, AMD ROCm, Intel XPU, and Intel/AMD/Apple CPUs via Hugging Face (HF) Transformers, vLLM, and SGLang.

778 stars · 111 forks · 778 watchers · Python · Apache License 2.0
Topics: gptq, optimum, peft, quantization, sglang, transformers, vllm
1 open issue needs help · Last updated: Sep 13, 2025

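As a concrete illustration of the quantize-then-serve workflow the description refers to, here is a minimal sketch assuming a GPTQModel-style Python API (`QuantizeConfig`, `GPTQModel.load`, `model.quantize`, `model.save`). The model ID, calibration dataset, and parameter values are illustrative assumptions, not taken from this page:

```python
# Minimal GPTQ quantization sketch. Assumes the `gptqmodel` package and a
# GPTQModel-style API; all names and values below are illustrative assumptions.
from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig

model_id = "meta-llama/Llama-3.2-1B-Instruct"   # hypothetical example model
quant_path = "Llama-3.2-1B-Instruct-gptq-4bit"  # where to save the result

# A small calibration set of raw text; GPTQ uses it to estimate activation
# statistics while quantizing weights layer by layer.
calibration_dataset = load_dataset(
    "allenai/c4",
    data_files="en/c4-train.00001-of-01024.json.gz",
    split="train",
).select(range(1024))["text"]

# 4-bit weights, quantized in groups of 128 columns (common GPTQ settings).
quant_config = QuantizeConfig(bits=4, group_size=128)

model = GPTQModel.load(model_id, quant_config)
model.quantize(calibration_dataset)
model.save(quant_path)
```

The saved checkpoint would then be loadable for inference through the backends the description lists (HF Transformers, vLLM, or SGLang), subject to each backend's GPTQ support.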