Google launches ‘implicit caching’ to make accessing its latest AI models cheaper

By GT · May 9, 2025 · TechCrunch · 3 min read


Google is rolling out a feature in its Gemini API that the company claims will make its latest AI models cheaper for third-party developers.

Google calls the feature “implicit caching” and says it can deliver 75% savings on “repetitive context” passed to models via the Gemini API. It supports Google’s Gemini 2.5 Pro and 2.5 Flash models.

That’s likely to be welcome news to developers as the cost of using frontier models continues to grow.

We just shipped implicit caching in the Gemini API, automatically enabling a 75% cost savings with the Gemini 2.5 models when your request hits a cache 🚢

We also lowered the min token required to hit caches to 1K on 2.5 Flash and 2K on 2.5 Pro!

— Logan Kilpatrick (@OfficialLoganK) May 8, 2025

Caching, a widely adopted practice in the AI industry, reuses frequently accessed or pre-computed data from models to cut down on computing requirements and cost. For example, caches can store answers to questions users often ask of a model, eliminating the need for the model to re-create answers to the same request.
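To illustrate the general idea (not Google's implementation), the toy sketch below memoizes model responses in a lookup table so an identical request never has to be recomputed; `call_model` is a hypothetical stand-in for an expensive API call.

```python
# Toy illustration of response caching in general -- not Google's implementation.
from functools import lru_cache

def call_model(prompt: str) -> str:
    """Hypothetical stand-in for an expensive model API call."""
    print(f"(expensive model call for: {prompt!r})")
    return f"answer to: {prompt}"

@lru_cache(maxsize=1024)
def cached_answer(prompt: str) -> str:
    # Repeated prompts are served from memory instead of re-running the model.
    return call_model(prompt)

cached_answer("What is implicit caching?")  # triggers the expensive call
cached_answer("What is implicit caching?")  # served from the cache
```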

Google previously offered model prompt caching, but only explicit prompt caching, meaning devs had to define their highest-frequency prompts. While cost savings were supposed to be guaranteed, explicit prompt caching typically involved a lot of manual work.
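For a sense of the manual work involved, the sketch below shows roughly what explicit caching looks like with the google-genai Python SDK: the developer creates a cache for the high-frequency context up front and then references it on every request. The method and field names (`caches.create`, `CreateCachedContentConfig`, `cached_content`) and the model ID are assumptions based on that SDK and may differ by version.

```python
# Rough sketch of *explicit* prompt caching via the google-genai Python SDK.
# Method/field names and the model id are assumptions; treat as illustrative.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

LONG_MANUAL_TEXT = "...large, frequently reused reference document..."  # placeholder

# The developer decides up front which context is worth caching...
cache = client.caches.create(
    model="gemini-2.5-flash",  # illustrative model id
    config=types.CreateCachedContentConfig(
        system_instruction="You answer questions about the attached manual.",
        contents=[LONG_MANUAL_TEXT],
        ttl="3600s",  # how long the cached content lives
    ),
)

# ...and then references the cache explicitly on each request.
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What does chapter 3 cover?",
    config=types.GenerateContentConfig(cached_content=cache.name),
)
print(response.text)
```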

Some developers weren’t pleased with how Google’s explicit caching implementation worked for Gemini 2.5 Pro, which they said could cause surprisingly large API bills. Complaints reached a fever pitch in the past week, prompting the Gemini team to apologize and pledge to make changes.

In contrast to explicit caching, implicit caching is automatic. Enabled by default for Gemini 2.5 models, it passes on cost savings if a Gemini API request to a model hits a cache.


“[W]hen you send a request to one of the Gemini 2.5 models, if the request shares a common prefix as one of previous requests, then it’s eligible for a cache hit,” explained Google in a blog post. “We will dynamically pass cost savings back to you.”
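In practical terms, that means no cache-management code at all: developers send ordinary requests, and if a long prefix matches a recent request, the discount is applied automatically. The sketch below assumes the google-genai Python SDK; the `usage_metadata.cached_content_token_count` field used to observe cache hits is an assumption about that SDK's response object and may differ by version.

```python
# Sketch: implicit caching requires no extra code. Two requests that share a long
# common prefix should be eligible for a cache hit on the second call.
# The usage_metadata field name is an assumption about the google-genai SDK.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

big_context = "...the same long document included with every request..."  # placeholder

for question in ["Summarize section 1.", "Summarize section 2."]:
    response = client.models.generate_content(
        model="gemini-2.5-flash",  # illustrative model id
        contents=big_context + "\n\nQuestion: " + question,  # shared prefix, varying suffix
    )
    usage = response.usage_metadata
    print(question, "-> cached prompt tokens:", usage.cached_content_token_count)
```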

The minimum prompt token count for implicit caching is 1,024 for 2.5 Flash and 2,048 for 2.5 Pro, according to Google's developer documentation. That's not a very high bar, meaning it shouldn't take much to trigger these automatic savings. Tokens are the raw bits of data models work with; a thousand tokens is equivalent to about 750 words, so the thresholds work out to roughly 770 words for Flash and 1,540 words for Pro.

Given that Google's previous claims of cost savings from caching drew complaints, there are some buyer-beware areas in this new feature. For one, Google recommends that developers keep repetitive context at the beginning of requests to increase the chances of implicit cache hits. Context that might change from request to request should be appended at the end, the company says.
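A minimal sketch of that recommendation, assuming a plain string prompt: keep the stable material first so consecutive requests share the longest possible prefix.

```python
# Keep repetitive context at the start of the prompt and per-request content at
# the end, so consecutive requests share the longest possible prefix.
SYSTEM_RULES = "You answer questions about the product manual."  # identical every request
REFERENCE_DOCS = "...full manual text..."                        # identical every request

def build_prompt(user_question: str) -> str:
    # Good: stable prefix first, changing content appended last.
    return f"{SYSTEM_RULES}\n\n{REFERENCE_DOCS}\n\nQuestion: {user_question}"

# Anti-pattern: leading with the user's question changes the prefix on every
# request and makes an implicit cache hit unlikely.
```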

For another, Google didn’t offer any third-party verification that the new implicit caching system would deliver the promised automatic savings. So we’ll have to see what early adopters say.


