Inference tuning has officially entered the “you need a simulator before you touch production” phase. That is not because NVIDIA invented simulation this week. It is because modern LLM serving has accumulated enough interacting controls — tensor parallelism, prefill/decode split, routing, KV cache placement, autoscaling, cold starts, worker counts, backend