Your Coding Agent's Logs Are Useless for Debugging — This Iterative Framework Fixes That

Logging is supposed to be the signal that tells you what went wrong. For LLM-based coding agents, it's largely broken — and a new paper from Xin Wang and colleagues explains precisely why. Current automatic logging systems are trained to produce logs that look like developer-written logs, optimized for similarity to human gold standards. The problem is that developer logs are written for human readers, not for agents doing downstream debugging. The result is logs that are stylistically correct but functionally useless for the agent loop.

The fix, introduced as ReLog, is to stop treating logging as an imitation task and start treating it as an optimization problem. ReLog runs a generate → execute → evaluate → refine loop: an LLM proposes log statements, the instrumented code is executed, downstream debugging performance is measured as a reward signal, and the logs are iteratively improved. Evaluated on the Defects4J benchmark in both source-available and source-unavailable settings, ReLog reaches an F1 of 0.520 and repairs 97 defects, outperforming all single-pass baselines, including LLM-generated logs without iterative refinement. Ablations confirm that the iterative loop and the compilation-repair sub-step each contribute independently to performance.
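The loop structure is simple enough to sketch. The following is a minimal toy illustration of the generate → execute → evaluate → refine pattern, not the paper's implementation: all function names, the stubbed "LLM" proposer, and the toy reward function are hypothetical stand-ins.

```python
# Sketch of a generate -> execute -> evaluate -> refine loop in the
# spirit of ReLog. Everything here (propose_logs, execute_and_evaluate,
# the toy reward) is a hypothetical stand-in, not the paper's code.

from dataclasses import dataclass, field


@dataclass
class LogCandidate:
    statements: list = field(default_factory=list)
    reward: float = 0.0


def propose_logs(feedback):
    """Stand-in for the LLM call that proposes log statements.
    Simulates refinement by adding detail when feedback exists."""
    stmts = ["log.info('entering parse()')"]
    if feedback is not None:
        stmts.append("log.debug('refined after feedback: %s')" % feedback)
    return stmts


def execute_and_evaluate(statements):
    """Stand-in for running the instrumented code and scoring the logs
    by downstream debugging success (e.g. fault-localization F1)."""
    return min(1.0, 0.3 * len(statements))  # toy reward: more signal, higher score


def relog_loop(max_iters=3):
    best = LogCandidate()
    feedback = None
    for _ in range(max_iters):
        stmts = propose_logs(feedback)           # generate
        reward = execute_and_evaluate(stmts)     # execute + evaluate
        if reward > best.reward:                 # keep best candidate so far
            best = LogCandidate(stmts, reward)
        feedback = "reward=%.2f" % reward        # refine: feed the score back
    return best


best = relog_loop()
print(round(best.reward, 2))  # toy reward of the best candidate found
```

The key design point this sketch captures is that the reward comes from executing the logs and measuring downstream utility, not from comparing them to a human-written reference.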

The conceptual shift here is worth sitting with. Logs are not a byproduct of code — they are a first-class artifact that should be optimized for whatever the agent needs to do next. As coding agents take on longer-horizon work with less human supervision, the quality of their self-generated debugging signals becomes a critical bottleneck. The ReLog loop pattern is directly extractable into any agent scaffold that uses logs as a debugging or decision signal, and the results suggest the investment pays off cleanly.

Read the full paper on arXiv →