GO
Google Research Blog
4/2/2026

New ways to balance cost and reliability in the Gemini API
TL;DR
Google launched two new Gemini API inference tiers—Flex for cost-optimized workloads and Priority for low-latency applications. Developers can now explicitly choose their cost-versus-reliability tradeoff. This addresses a core tension in API design and gives teams fine-grained control over their performance and budget alignment.
- •Two new inference tiers: Flex (cost-optimized) and Priority (low-latency)
- •Gives developers explicit control over cost-reliability tradeoff
- •Enables workload-specific tier selection
Generated with AI, which can make mistakes.
Is this a good recommendation for you?

