Quantization has spent too long being marketed as a free lunch with a smaller datatype.
NVIDIA’s latest Model Optimizer walkthrough is useful because it does the opposite. It shows FP8 post-training quantization as a workflow with calibration data, fake quantization, layer exceptions, benchmark evaluation, and an export path to