Google Research Blog
4/2/2026
New ways to balance cost and reliability in the Gemini API

TL;DR

Google launched two new inference tiers for the Gemini API: Flex, for cost-optimized workloads, and Priority, for low-latency applications. Developers can now choose the cost-versus-reliability tradeoff explicitly instead of accepting a single default. This addresses a core tension in API design, between keeping costs low and keeping service reliable, and lets teams match performance and spend to what each workload needs.

  • Two new inference tiers: Flex (cost-optimized) and Priority (low-latency)
  • Gives developers explicit control over cost-reliability tradeoff
  • Enables workload-specific tier selection
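
As a rough illustration of workload-specific tier selection, the sketch below maps a workload to one of the two tiers. It is a minimal example under stated assumptions: the `Workload` type, the `choose_tier` helper, and the lowercase tier strings `"flex"` and `"priority"` are all hypothetical names introduced here. The summary does not say how a tier is passed to the Gemini API, so the sketch stops short of making any request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of a job that will call the API."""
    name: str
    latency_sensitive: bool  # True for user-facing, interactive traffic


def choose_tier(workload: Workload) -> str:
    """Pick a tier label for a workload.

    The returned strings are illustrative labels only, not confirmed
    API parameter values.
    """
    # Latency-sensitive work goes to the low-latency tier;
    # everything else goes to the cost-optimized tier.
    return "priority" if workload.latency_sensitive else "flex"


if __name__ == "__main__":
    jobs = [
        Workload("nightly-batch-summaries", latency_sensitive=False),
        Workload("live-chat", latency_sensitive=True),
    ]
    for job in jobs:
        print(f"{job.name}: {choose_tier(job)}")
```

Keeping the decision in one small function means the routing rule can change (for example, to account for budget limits) without touching the code that issues requests.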

Generated with AI, which can make mistakes.
