Half-Quadratic Quantization for LLMs without calibration data. Use when quantizing models to 4/3/2-bit precision without needing calibration datasets, for fast…
HQQ - Half-Quadratic Quantization

Fast, calibration-free weight quantization supporting 8/4/3/2/1-bit precision with multiple optimized backends.

When to use HQQ

Use HQQ when:
- Quantizing models without calibration data (no dataset needed)
- Needing fast quantization (minutes vs. hours for GPTQ/AWQ)
- Deploying with vLLM or HuggingFace Transformers
- Fine-tuning quantized models with LoRA/PEFT
- Experimenting with extreme quantization (2-bit, 1-bit)

Key advantages:
- No calibration: Quantize any model directly, with no sample data
- Multiple backends: PyTorch, ATEN, TorchAO, Marlin, BitBlas for optimized inference
- Flexible precision: 8/4/3/2/1-bit with configurable group sizes
- Framework integration: Native HuggingFace and vLLM support
- PEFT compatible: Fine-tune quantized models with LoRA
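To make "flexible precision with configurable group sizes" concrete, here is a minimal pure-Python sketch of group-wise affine quantization using plain round-to-nearest. It is not HQQ's implementation and omits HQQ's half-quadratic optimization step entirely; the function names (`quantize`, `dequantize`, `bits_per_weight`) and the 16-bit per-group scale and zero-point are illustrative assumptions, not part of the HQQ API.

```python
from typing import List, Tuple

Group = Tuple[List[int], float, float]  # (integer codes, scale, zero-point)


def quantize_group(weights: List[float], nbits: int) -> Group:
    """Round-to-nearest affine quantization of one group of weights."""
    qmax = (1 << nbits) - 1  # largest code, e.g. 15 for 4-bit
    w_min, w_max = min(weights), max(weights)
    # Guard against a constant group, where the range would be zero.
    scale = (w_max - w_min) / qmax if w_max > w_min else 1.0
    zero = w_min
    codes = [min(qmax, max(0, round((w - zero) / scale))) for w in weights]
    return codes, scale, zero


def quantize(weights: List[float], nbits: int, group_size: int) -> List[Group]:
    """Split the weights into groups; each group gets its own scale and zero."""
    return [
        quantize_group(weights[start:start + group_size], nbits)
        for start in range(0, len(weights), group_size)
    ]


def dequantize(groups: List[Group]) -> List[float]:
    """Map the integer codes back to approximate floating-point weights."""
    out: List[float] = []
    for codes, scale, zero in groups:
        out.extend(c * scale + zero for c in codes)
    return out


def bits_per_weight(nbits: int, group_size: int, meta_bits: int = 16) -> float:
    """Effective storage cost per weight.

    Assumes one scale and one zero-point per group, each stored in
    `meta_bits` bits (an assumption made for this sketch).
    """
    return nbits + 2 * meta_bits / group_size


if __name__ == "__main__":
    weights = [-1.0, -0.5, 0.0, 0.25, 0.5, 1.0, 2.0, 3.0]
    groups = quantize(weights, nbits=4, group_size=4)
    print(dequantize(groups))
    print(bits_per_weight(4, 64))
```

Smaller groups track the local weight range more closely but add metadata: under the stated assumption, 4-bit codes with a group size of 64 cost 4.5 bits per weight, while 2-bit codes with a group size of 16 cost 4.0.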