2-bit VPTQ: 6.5x Smaller LLMs While Preserving 95% Accuracy

Highly accurate 2-bit quantization for running 70B LLMs on a 24 GB GPU

Generated with ChatGPT…