Dev.to
6/5/2026
BAGEN: LLM Agents Waste 44% of Tokens on Tasks They'll Fail

Short summary

New research (BAGEN) from Northwestern, Stanford, and Cornell shows that frontier LLM agents waste 28–64% of their tokens by continuing trajectories that end in failure, because they cannot predict their own budget needs. The models score well on task completion but estimate their own resource usage poorly (r=0.35). This reflects a training signal mismatch: agents are optimized for success, not metacognitive awareness. Early-stop mechanisms can recover most of that cost, though interval estimation remains genuinely hard even after fine-tuning.

  • Frontier agents can't predict when they'll fail or accurately estimate token requirements
  • Early stopping could recover most of the 28–64% of tokens spent on trajectories that fail
  • Training signal mismatch: models optimize for task completion, not resource awareness
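
To make the early-stopping idea concrete, here is a minimal Python sketch. It is not the BAGEN method: the `Step` type, the per-step `p_success` estimate, and the `min_p` and `patience` thresholds are all hypothetical, and the summary itself notes that producing such estimates reliably is the hard part. The sketch only shows the control flow: stop spending tokens once the estimated chance of success stays low for several steps, or once the budget would be exceeded.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


@dataclass
class Step:
    tokens: int       # tokens consumed by this agent step
    p_success: float  # hypothetical estimate of eventual success, in [0, 1]


def run_with_early_stop(
    steps: Iterable[Step],
    budget: int,
    min_p: float = 0.2,  # assumed threshold below which success looks unlikely
    patience: int = 2,   # assumed number of consecutive low estimates tolerated
) -> tuple[int, str]:
    """Return (tokens_spent, reason) for one trajectory."""
    spent = 0
    low_streak = 0
    for step in steps:
        # Refuse to take a step that would exceed the token budget.
        if spent + step.tokens > budget:
            return spent, "budget_exhausted"
        spent += step.tokens
        # Count consecutive steps whose success estimate is below the threshold.
        low_streak = low_streak + 1 if step.p_success < min_p else 0
        if low_streak >= patience:
            return spent, "early_stop"
    return spent, "completed"


# A made-up failing trajectory: five steps of 100 tokens each.
failing = [Step(100, 0.5), Step(100, 0.15), Step(100, 0.1),
           Step(100, 0.05), Step(100, 0.05)]
print(run_with_early_stop(failing, budget=1000))  # (300, 'early_stop')
```

In this made-up trajectory the run stops after 300 of the 500 tokens it would otherwise have spent. The `patience` parameter is a design choice: it trades a few extra tokens for robustness against a single noisy estimate.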

Generated with AI, which can make mistakes.
