Google Ships Gemini 3.1 Flash-Lite — 2.5× Faster Response, $0.25 Per Million Input Tokens

Google has launched Gemini 3.1 Flash-Lite, a new efficiency-focused model that lands as the cheapest frontier-class model on the market. Pricing: $0.25 per million input tokens. Speed: 2.5× faster response time and 45% faster output generation than its predecessor.

The Headline Specs

  • Response speed: 2.5× faster
  • Output speed: 45% faster generation
  • Input price: $0.25 / million tokens — undercutting the prior tier
  • Context window and tool-use parity with the Gemini 3.0 family
  • Available immediately via Vertex AI and the Gemini API
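
To make the two speed figures concrete, here is a rough Python sketch of how they might combine for a single request. It assumes "2.5× faster response" means the wait before the first output shrinks to 1/2.5 of the baseline, and that "45% faster output" means tokens per second rise by a factor of 1.45. The announcement figures quoted here do not define either metric, and the function name and baseline numbers are hypothetical, not measurements.

```python
def estimated_latency_s(
    baseline_first_response_s: float,
    baseline_tokens_per_s: float,
    output_tokens: int,
    response_speedup: float = 2.5,   # assumed reading of "2.5x faster response"
    output_speedup: float = 1.45,    # assumed reading of "45% faster output"
) -> float:
    """Estimate end-to-end seconds for one request under the stated assumptions."""
    first_response = baseline_first_response_s / response_speedup
    generation = output_tokens / (baseline_tokens_per_s * output_speedup)
    return first_response + generation


# Hypothetical baseline: 1.0 s to first response, 100 tokens/s, 290 output tokens.
baseline = 1.0 + 290 / 100.0
improved = estimated_latency_s(1.0, 100.0, 290)
print(f"baseline {baseline:.2f} s, improved {improved:.2f} s")
```

Under these made-up baseline numbers the request drops from about 3.9 seconds to about 2.4 seconds. Real gains depend on how the two metrics are actually measured.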

Why It Matters

The most expensive line on most enterprise AI bills is high-volume, low-complexity inference: RAG lookups, classification, summarisation and intermediate agent-loop calls. Flash-Lite is engineered to take that workload at a price that makes it economical to use as the default. The $0.25 input price pressures OpenAI, Anthropic and DeepSeek to respond on their own low-cost tiers.

The Wider Race

Frontier-class models have bifurcated: ultra-capable flagships (Gemini 3 Ultra, GPT-5, Claude 4) versus high-throughput, low-cost variants (Flash-Lite, Haiku, GPT-mini). The economics of agentic systems — which can fire dozens of model calls per user task — are increasingly determined by the cheap-tier price floor.
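
The arithmetic behind that claim can be sketched in a few lines of Python. Only the $0.25 per million input tokens figure comes from the announcement; the function name, the 30 calls per task and the 2,000 input tokens per call are hypothetical values chosen for illustration, and output-token charges are left out because no output price is quoted here.

```python
INPUT_PRICE_PER_MILLION_USD = 0.25  # input price quoted in the article


def input_cost_usd(calls_per_task: int, input_tokens_per_call: int, tasks: int = 1) -> float:
    """Input-token cost in USD for `tasks` agent tasks (output tokens not included)."""
    total_tokens = calls_per_task * input_tokens_per_call * tasks
    return total_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION_USD


# Hypothetical agent workload: 30 model calls per task, 2,000 input tokens per call.
per_task = input_cost_usd(30, 2_000)
per_million_tasks = input_cost_usd(30, 2_000, tasks=1_000_000)
print(f"per task: ${per_task:.4f}, per million tasks: ${per_million_tasks:,.0f}")
```

With these assumed numbers a single task costs about $0.015 in input tokens, or roughly $15,000 per million tasks. Because the total scales linearly with the per-token price, any change to the cheap-tier price moves an agentic system's bill by the same proportion.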

What It Signals

Google is doubling down on the “affordable frontier” wedge: the model class that captures workloads where smarts-per-dollar matters more than absolute capability. With Gemini 3.1 Flash-Lite at $0.25 per million input tokens, the floor for the next pricing cycle is in sight.

