There is a running observation that long-running, self-correcting LLM loops driven by feedback from external systems yield diminishing returns. Recently, Stripe’s blog post about their coding agent Minions noted that if a coding task runs for too long, trying to correct each and every mistake the CI pipeline discovers, the marginal gains shrink. So there is a trade-off in how many times you want to run the self-correcting loop.
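As a rough sketch, a capped loop makes that trade-off explicit: iterate until CI is green or a budget is exhausted. The `run_ci` and `apply_fix` callables here are hypothetical stand-ins for the external feedback and the agent’s correction pass, not Stripe’s actual setup.

```python
from typing import Callable, List

def self_correct(
    task: str,
    run_ci: Callable[[str], List[str]],         # returns CI failure messages
    apply_fix: Callable[[str, List[str]], str], # one agent correction pass
    max_iterations: int = 3,                    # the trade-off knob
) -> str:
    """Run the fix loop until CI passes or the iteration cap is hit."""
    for _ in range(max_iterations):
        failures = run_ci(task)
        if not failures:
            break  # green pipeline: more loops add token cost, not value
        task = apply_fix(task, failures)
    return task  # possibly still imperfect; hand off rather than loop forever
```

The interesting question is where to set `max_iterations`, since most of the value tends to land in the first few passes.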
Why would that be? Well, it depends on the CI pipeline. For example, if the pipeline includes an agentic code reviewer, I’ve found that the AI reviewer always has some comment to make about the code, even if it’s a nitpick. Resolving each and every one of those would be a waste of tokens.
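One way to keep the loop from chasing nitpicks forever is to filter reviewer comments by severity before feeding them back to the agent. This is a hypothetical sketch; the `ReviewComment` shape and the severity labels are assumptions, not any particular reviewer’s output format.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ReviewComment:
    severity: str  # e.g. "blocking", "suggestion", "nitpick"
    message: str

def comments_worth_fixing(comments: List[ReviewComment]) -> List[ReviewComment]:
    """Keep only blocking comments so the loop can actually terminate."""
    return [c for c in comments if c.severity == "blocking"]

# If nothing blocking remains, the loop stops even though the reviewer
# still has opinions:
remaining = comments_worth_fixing([
    ReviewComment("nitpick", "Prefer f-strings here."),
    ReviewComment("blocking", "This branch dereferences a null handle."),
])
```

With a filter like this, the stopping condition becomes “no blocking comments left” rather than “the reviewer has nothing to say,” which it never will.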