The process of reducing the numerical precision of neural network parameters (weights) to decrease memory requirements and increase inference speed. Quantization has a theoretical floor set by information theory: below a certain bit-width, information loss becomes unrecoverable. TurboQuant is reported to have approached this floor.