Dev.to
6/5/2026

BAGEN: LLM Agents Waste 44% of Tokens on Tasks They'll Fail
Short summary
New research (BAGEN) from Northwestern, Stanford, and Cornell shows that frontier LLM agents waste 28–64% of their tokens continuing trajectories that will ultimately fail, and that they cannot predict their own budget needs. Although the models score well on task completion, they are poor at estimating their own resource usage (r = 0.35). This reflects a training-signal mismatch: agents are optimized for task success, not for metacognitive awareness of what a task will cost. Early-stop mechanisms can recover most of that wasted cost, though interval estimation remains genuinely hard even after fine-tuning.
- Frontier agents can't predict when they'll fail, and they can't accurately estimate their token requirements
- Early stopping could save 28–64% of inference costs on failed tasks
- Training-signal mismatch: models are optimized for task completion, not resource awareness
Generated with AI, which can make mistakes.