Gemini 3.1 Flash-Lite Preview Hits the Gemini API and Vertex AI

Google has dropped Gemini 3.1 Flash-Lite into preview on both the Gemini API and Vertex AI, targeting the class of production workloads where latency and cost are the primary constraints. The model sits below Flash in the performance hierarchy but is purpose-built for high-frequency inference — the kind of API calls that happen thousands of times per minute in production pipelines where every millisecond and every fraction of a cent adds up. It's available now through AI Studio and Vertex AI with evolving pricing and regional availability as the preview matures.

Alongside Flash-Lite, Google has also begun rolling out Workspace AI beta features to Google AI Ultra and Pro subscribers. Docs, Sheets, Slides, and Drive are all getting Gemini-powered enhancements, pushing AI assistance deeper into the productivity tools that millions of enterprise users already live in every day. The two releases, one developer-focused and one end-user-focused, illustrate how Google is simultaneously building up from the API layer and down from the consumer surface.

For developers building on Vertex AI or the Gemini API, Flash-Lite is worth watching closely. A dedicated low-latency, low-cost tier purpose-built for production throughput addresses one of the most common objections to running Gemini at scale — and early preview access means you can start benchmarking it against your workloads before it reaches general availability.
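In practice, that benchmarking mostly comes down to measuring tail latency under your real request mix. Below is a minimal sketch of such a harness using only the Python standard library; `call_model` is a hypothetical stand-in for however you invoke the model (for example via the Gemini API client of your choice), stubbed out here so the harness runs offline:

```python
import statistics
import time
from typing import Callable, Dict, List


def call_model(prompt: str) -> str:
    """Placeholder for a real Gemini API call.

    In a real benchmark you would replace this body with a request to the
    Flash-Lite preview model; the sleep simulates inference latency so the
    harness can run without network access or credentials.
    """
    time.sleep(0.001)
    return f"echo: {prompt}"


def benchmark(call: Callable[[str], str], prompts: List[str]) -> Dict[str, float]:
    """Run each prompt once and report p50/p95 latency in milliseconds."""
    latencies = []
    for prompt in prompts:
        start = time.perf_counter()
        call(prompt)
        latencies.append((time.perf_counter() - start) * 1000.0)
    latencies.sort()
    return {
        "n": len(latencies),
        "p50_ms": statistics.median(latencies),
        "p95_ms": latencies[int(0.95 * (len(latencies) - 1))],
    }


if __name__ == "__main__":
    stats = benchmark(call_model, [f"request {i}" for i in range(100)])
    print(f"n={stats['n']:.0f}  p50={stats['p50_ms']:.1f}ms  p95={stats['p95_ms']:.1f}ms")
```

Running the same prompt set against both Flash and Flash-Lite gives a like-for-like latency comparison; swap the stub for real calls and use your production prompts rather than synthetic ones, since latency varies with input and output length.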

Read the full article at Apify Blog →