Google has launched Gemini 3.1 Flash-Lite, a new efficiency-focused model that lands as the cheapest frontier-class model on the market. Pricing: $0.25 per million input tokens. Speed: 2.5× faster response time and 45% faster output generation than its predecessor.
The Headline Specs
- Response speed: 2.5× faster
- Output speed: 45% faster generation
- Input price: $0.25 / million tokens — undercutting the prior tier
- Context window and tool-use parity with the Gemini 3.0 family
- Available immediately via Vertex AI and the Gemini API
Why It Matters
The most expensive line on most enterprise AI bills is high-volume, low-complexity inference — RAG retrieval, classification, summarisation, agent-loop intermediate calls. Flash-Lite is engineered to take that workload at a price that makes it economic to default to it. The sub-$0.25 price floor pressures OpenAI, Anthropic and DeepSeek to respond on their respective Lite tiers.
The Wider Race
Frontier-class models have bifurcated: ultra-capable flagships (Gemini 3 Ultra, GPT-5, Claude 4) versus high-throughput, low-cost variants (Flash-Lite, Haiku, GPT-mini). The economics of agentic systems — which can fire dozens of model calls per user task — are increasingly determined by the cheap-tier price floor.
What It Signals
Google is doubling down on the “affordable frontier” wedge — the model class that captures workloads where smarts-per-dollar matters more than absolute capability. With Gemini Flash-Lite at $0.25 input, the next price-cycle floor is in sight.
Follow Vibes Uncut Media for continuing AI infrastructure coverage.














Leave a Reply