claude-code

Gemini-MCP Is the BYO Context Window Hack Claude Code Users Were Going to Build Anyway

Anatoliy Kolodkin

31 May 2026 • 5 min read

The most predictable Claude Code workaround is now a GitHub repo: when the editing harness is good but the context window feels expensive, bolt on another model as a bulk-analysis coprocessor.

That is the pitch behind gemini-mcp, a local MCP server created on May 31 that lets Claude Code offload large codebase reads, log compression, semantic grep, diff summaries, code review, planning, and test generation to Gemini. The repo had zero stars, zero forks, and no issues at research time, so treat it as an early implementation, not a standard. But the pattern is obvious enough that if this project did not exist, someone else would have built it by Friday.

The README frames the tool as context preservation. Claude Code keeps the editing loop, terminal workflow, and reasoning surface. Gemini gets used for bulk ingestion through its 1,048,576-token context window. The MCP server returns compact structured results — often described as 200 to 500 tokens — back to Claude, keeping the expensive working context available for decisions and edits instead of raw scanning.

That is exactly the kind of hack developers build when vendor product boundaries do not match real workflows. Claude may be the preferred coding agent. Gemini may be the better haystack reader. MCP is the wire. The engineering question is whether the wire preserves enough truth to be useful without smuggling in new risk.

Long context is a coprocessor, not a source of truth

The repo’s numbers are blunt. It claims Claude Code’s usable context is roughly 120,000 to 160,000 tokens after system overhead, tool schemas, and conversation history. A large codebase read can burn 20,000 to 80,000 tokens. For a 500-file codebase, the README says a naive read path might cost more than 50,000 Claude-context tokens, while the Gemini path can return a 300 to 600 token digest — a claimed 90% to 98% reduction for codebase reads.

The appeal is real. Anyone who has watched a coding agent burn half a session reading logs or walking a monorepo understands why this exists. gemini_read_codebase, gemini_grep_semantic, gemini_shrink_logs, gemini_summarize_diff, gemini_review_code, gemini_generate_plan, gemini_validate_approach, gemini_write_tests, gemini_write_boilerplate, gemini_explain_error, and gemini_context_cost are the tool list you would expect from someone trying to stop Claude from using its premium attention on bulk search.

The danger is also real: summaries are lossy. A 500-token answer can be the right amount of context for planning, but it is not a substitute for reading the exact lines before editing. The README’s structural verification is a good mitigation. It greps locally for identifiers named by Gemini and annotates missing ones as [UNVERIFIED], while web-sourced identifiers get [WEB_SOURCE]. That catches a useful class of hallucination. It does not prove Gemini understood the invariant, call graph, permission boundary, migration path, or business rule.

For serious work, the right workflow is narrow-then-verify. Use Gemini to find the likely files, compress logs, summarize a diff, or sketch the search space. Then make Claude inspect the specific source slices before touching code. Offloading should reduce context waste, not replace source-of-truth reads. If the agent cannot point to the file and line that justify a change, the summary has become a rumor with JSON formatting.

MCP makes model routing easy; governance has to catch up

The implementation is intentionally local. The server uses stdio JSON-RPC and is registered with a command like claude mcp add gemini-mcp -s user -- env GEMINI_API_KEY=... node .../dist/index.js. The package is TypeScript, depends on @google/generative-ai, @modelcontextprotocol/sdk, dotenv, ignore, and zod, and defaults to settings including GEMINI_MODEL=gemini-3.1-pro-preview, GEMINI_MAX_OUTPUT_TOKENS=65536, CACHE_MAX_AGE_MS=86400000, and SCHEMA_COST_WARN_THRESHOLD=5000.

The security posture is more explicit than many small MCP projects, which is good. The README warns that code and logs sent through the tools go to the Gemini API, so users should not use it with secrets, credentials, proprietary algorithms, or confidentiality-restricted data. It also describes prompt-injection defenses that strip common instruction patterns and wrap returned content as reference data before sending it back to Claude Code.

That warning should not be treated as boilerplate. Adding this server expands the trust boundary from “Claude Code plus local repo” to “Claude Code, a local MCP process, Google’s API, local caches, prompt-sanitization code, and any web-search path used for fresh library docs.” That may be perfectly acceptable for an open-source project or a disposable experiment. It may be unacceptable for a regulated customer repo. The distinction has to be made before the tool is registered, not after someone realizes a stack trace contained credentials.

The caching model is practical but deserves scrutiny. mtime-based invalidation is better than a naive TTL for local code, because edited files invalidate related summaries. But cache governance matters when the payload includes source code or logs. Developers should know where cached data lives, how to clear it, whether it is encrypted at rest, what retention means in practice, and whether test logs with secrets are excluded. Local caches are not free just because they avoid SaaS storage.

Tool schemas are part of the token budget

The sleeper feature is gemini_context_cost. MCP tools are advertised to Claude through schemas, and those schemas consume context before the agent has done any useful work. Teams often add MCP servers as if every tool were free until called. It is not. Tool surfaces become a standing tax on the session.

Making schema overhead visible is the kind of unglamorous improvement that separates a clever setup from an operable one. If a team registers ten MCP servers and hundreds of tools, the agent may spend meaningful context just learning what buttons exist. That affects cost, latency, and attention. The right question is not only “what can this server do?” It is also “what does it cost every turn to make this server available?”

That point generalizes beyond this repo. Coding-agent economics are pushing developers toward model routing, BYOK setups, local proxies, long-context copilots, and provider-specific helpers. Those patterns are useful. They also create a governance problem: which providers may see which repos, which tools are enabled by default, how failures are translated, where logs go, and whether summaries can drive edits without verification.

For practitioners, the adoption checklist is straightforward. Use a Gemini offload server only on code you are allowed to send to Gemini. Redact logs. Start with read-only tasks such as architecture summaries, semantic grep, and error explanation. Require Claude to verify concrete source slices before edits. Monitor MCP schema overhead. Constrain or disable web search if your environment cannot tolerate external lookups. Do not turn a single-developer local tool into a shared team service casually; the README itself describes this as local developer infrastructure, not a multi-tenant control plane.

The editorial take: gemini-mcp is a useful glimpse of the BYO-context-window future. Claude Code remains the editing harness, Gemini becomes the bulk-analysis coprocessor, and MCP carries the results. That is smart when it narrows the codebase before verification. It is dangerous when compact summaries are mistaken for ground truth. The difference is not model quality. It is workflow discipline.

Sources: gemini-mcp README, Claude Code MCP docs, Model Context Protocol docs, Google AI Studio API keys

Long context is a coprocessor, not a source of truth

MCP makes model routing easy; governance has to catch up

Tool schemas are part of the token budget

Sign up for more like this.