Google has released Gemini 3.1 Flash-Lite, a new efficiency-focused model priced at just $0.25 per million input tokens and $1.00 per million output tokens. Google claims the model delivers 2.5 times faster response times and 45 per cent faster output generation compared to the previous Gemini 2.5 Flash generation, while matching or exceeding quality benchmarks on standard evaluation suites.
The Pricing Shock
At $0.25 per million input tokens, Flash-Lite is the cheapest frontier-capable model from any of the top three AI labs. OpenAI’s cheapest comparable tier — gpt-5o-mini — sits at $0.40 per million input tokens. Anthropic’s Haiku 4.5 is at $0.50. Google’s move is being read as an aggressive play to lock in the high-volume, latency-sensitive enterprise workloads where pennies-per-request matter more than benchmark scores.
Benchmark Parity, Not Dominance
Independent testers are reporting that Flash-Lite matches its predecessor on MMLU, GSM8K and HumanEval while showing a small regression on long-context reasoning benchmarks like BIG-Bench Hard. The tradeoff Google made is clear: shave the parameters, speed up inference, accept minor quality dips on hard reasoning tasks. For the vast majority of enterprise use cases — classification, summarisation, content moderation, customer support routing — the math is compelling.
The Competitive Shape of 2026
With Flash-Lite, the price of high-quality AI inference has dropped roughly 40 per cent in 12 months across the industry. That shift is changing which kinds of AI products make economic sense. Agentic workflows that require 50+ model calls per task are now profitable where they weren’t a year ago. For the big three labs, the game has moved from “who has the best model” to “who wins the margin war on billion-token workloads.”














Leave a Reply