MCP Token Counter: Why Your Tools Are Silently Eating Your Context Window
Nikhil Tiwari
MCP Playground
📖 TL;DR — Key Takeaways
- Every tool definition your MCP server exposes is included in the AI's system prompt on every single request
- A server with 50 tools easily consumes 10,000–20,000 tokens before your actual prompt begins
- This silently inflates your API costs and can push your real content out of the context window
- The fix: measure first, then trim descriptions, split servers, or filter tools per agent
- MCP Token Counter — paste a server URL and get an instant per-tool breakdown, free
You connected an MCP server. Your agent is working. Everything looks fine. But somewhere in the background, your AI budget is draining faster than it should — and the culprit isn't your conversation history or your documents. It's the tools themselves.
This is one of the most overlooked costs in MCP-based AI systems, and it compounds silently as you add more tools to your server.
How MCP Tool Definitions Consume Tokens
When an AI model supports tool use, it needs to understand what tools are available before it can decide which one to call. This understanding comes from a schema — a structured description of each tool's name, purpose, and parameters.
In the MCP protocol, this schema is delivered automatically. When your client calls tools/list at session start, the server returns a JSON array of every tool it exposes. The MCP client then forwards these definitions to the language model as part of the request payload — on every single message in the conversation.
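In pseudocode terms, the client-side behavior looks roughly like this (a minimal sketch; real MCP client SDKs handle this internally, and the tool shown is a hypothetical stand-in):

```python
import json

# Tool definitions are fetched once via tools/list at session start.
# (Hypothetical minimal example; a real server returns far more detail.)
tool_definitions = [
    {
        "name": "search_repositories",
        "description": "Search GitHub repositories by keyword.",
        "inputSchema": {"type": "object", "properties": {"query": {"type": "string"}}},
    },
]

def build_model_request(conversation, tools):
    # The full tool list rides along on every request to the model,
    # whether or not a tool ends up being called this turn.
    return {"messages": conversation, "tools": tools}

req = build_model_request([{"role": "user", "content": "hi"}], tool_definitions)
print(len(json.dumps(req["tools"])))  # bytes spent on tool schemas alone, per request
```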
Here is what a single tool definition looks like when serialized:
{
  "name": "search_repositories",
  "description": "Search GitHub repositories by keyword, language, or topic. Returns repository name, description, star count, primary language, last updated date, and clone URL. Supports pagination with page and per_page parameters. Use sort to order by stars, forks, or recently updated.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "Search query with GitHub search syntax support"
      },
      "language": {
        "type": "string",
        "description": "Filter by programming language"
      },
      "sort": {
        "type": "string",
        "enum": ["stars", "forks", "updated"],
        "description": "Sort order for results"
      },
      "page": { "type": "number", "description": "Page number (default: 1)" },
      "per_page": { "type": "number", "description": "Results per page, max 100" }
    },
    "required": ["query"]
  }
}
That single tool definition is approximately 280–320 tokens. Now multiply by 40 tools in a typical GitHub MCP server and you have around 12,000 tokens of overhead — before a single word of your actual prompt is processed.
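You can get a quick estimate yourself with the common ~4 characters per token heuristic (a rough approximation; real tokenizer counts vary by model):

```python
import json

CHARS_PER_TOKEN = 4  # rough heuristic; actual tokenizers differ

def estimate_tool_tokens(tool: dict) -> int:
    """Estimate the token cost of one serialized tool definition."""
    return len(json.dumps(tool)) // CHARS_PER_TOKEN

def estimate_server_overhead(tools: list[dict]) -> int:
    """Total estimated tokens for a server's full tool list."""
    return sum(estimate_tool_tokens(t) for t in tools)

# 40 tools at ~300 tokens each lands near the 12,000-token figure above.
fake_tools = [{"description": "x" * 1200}] * 40
print(estimate_server_overhead(fake_tools))  # → 12160
```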
Why This Matters
💸 Direct Cost
Input tokens are billed. 10,000 extra tokens per request × 1,000 daily requests = 10M extra input tokens per day. At Sonnet 4.6 rates ($3/1M), that's $30/day in pure overhead.
📉 Reduced Useful Context
Most models have a 128k–200k context window. If 15,000 tokens are consumed by tool definitions, that's 15,000 fewer tokens available for your documents, conversation history, and instructions.
🐢 Latency
Larger prompts take longer to process, even with fast models. In long multi-turn conversations where tool overhead accumulates, this adds measurable latency to every response.
🤯 Model Confusion
Some research suggests that presenting a model with 50+ tools degrades tool selection quality. Too many options lead to worse decisions — especially when many tools are similar.
The Context Window Math
Let's break down where a typical 128,000-token context window actually goes in an MCP-powered agent:
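As a representative example (illustrative numbers, not measurements from any specific deployment):

```python
CONTEXT_WINDOW = 128_000

# Illustrative allocation for an MCP-powered agent (assumed figures):
budget = {
    "system prompt + instructions": 2_000,
    "tool definitions (fixed, every request)": 15_000,
    "conversation history (grows per turn)": 40_000,
    "retrieved documents": 50_000,
}
reserved = sum(budget.values())
print(f"free for output/new input: {CONTEXT_WINDOW - reserved:,}")  # → free for output/new input: 21,000
```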
The key insight: tool definitions are a fixed cost per request. Unlike conversation history which accumulates, tool overhead doesn't grow — but it also never goes away. Optimizing it is a one-time investment with permanent returns.
Real scenario: a developer's wake-up call
A developer built an internal agent connecting 3 MCP servers: Supabase (22 tools), GitHub (41 tools), and Linear (18 tools). That's 81 tools. At roughly 250 tokens per tool, 20,000+ tokens were consumed before any message was processed — 16% of a 128k window gone, on every single API call.
Measuring Your MCP Server's Token Footprint
Before you can optimize, you need to know where you stand. The MCP Token Counter at MCP Playground does this in seconds — no sign-up, no installation.
How to use it:
1. Paste your MCP server URL. Supports both Streamable HTTP (/mcp) and legacy SSE (/sse) transports.
2. Add an auth token if needed. For private servers requiring Bearer authentication — the token stays in your browser and is only used for this single request.
3. Click Analyze Tokens. The tool connects to your server, fetches the tool list, estimates token usage per tool, and ranks them from largest to smallest.
4. Read your results. You get a total token estimate, tool count, average tokens per tool, a visual budget bar against a 16k baseline, and a per-tool ranked breakdown.
The color coding gives you instant signal:
- Green (under 4,000 tokens) — healthy, your tool footprint is well-managed
- Amber (4,000–10,000 tokens) — worth reviewing; trimming could meaningfully reduce spend
- Red (10,000+ tokens) — high overhead; optimization is strongly recommended
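The same thresholds are easy to encode if you want this signal in a CI check (a sketch using the cutoffs above):

```python
def footprint_status(total_tokens: int) -> str:
    """Classify total tool-definition overhead using the article's cutoffs."""
    if total_tokens < 4_000:
        return "green"   # healthy
    if total_tokens < 10_000:
        return "amber"   # worth reviewing
    return "red"         # optimization strongly recommended

print(footprint_status(2_500))   # → green
print(footprint_status(7_000))   # → amber
print(footprint_status(12_000))  # → red
```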
5 Strategies to Reduce MCP Token Overhead
Strategy 1 — Trim Verbose Tool Descriptions
Tool descriptions are the biggest variable in token cost. Developers often write them as full paragraphs of documentation, but models only need enough to make a routing decision. Compare these two descriptions for the same tool:
❌ Verbose (~80 tokens)
"description": "Searches the repository for files matching a given glob pattern. This tool is useful when you need to find files by their name or extension across the entire repository tree. It supports standard glob wildcards including * and ** for recursive matching. Returns a list of matching file paths relative to the repository root."
✅ Lean (~20 tokens)
"description": "Find files by glob pattern. Returns matching paths."
That's a 75% reduction for a single description. Across 40 tools, trimming descriptions from 80 tokens to 20 tokens saves approximately 2,400 tokens per request.
Rule of thumb
A description should answer: "When would the model choose this tool over other tools?" Everything beyond that is overhead. Keep descriptions under 15 words wherever the function name is already self-explanatory.
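That rule of thumb can be enforced mechanically, for example with a small lint that flags descriptions over a word budget (a sketch; the 15-word limit is taken from the guideline above):

```python
MAX_WORDS = 15  # per the rule of thumb above

def flag_verbose_descriptions(tools: list[dict]) -> list[str]:
    """Return names of tools whose descriptions exceed the word budget."""
    return [
        t["name"]
        for t in tools
        if len(t.get("description", "").split()) > MAX_WORDS
    ]

tools = [
    {"name": "find_files", "description": "Find files by glob pattern. Returns matching paths."},
    {"name": "search_repositories", "description": " ".join(["word"] * 40)},
]
print(flag_verbose_descriptions(tools))  # → ['search_repositories']
```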
Strategy 2 — Trim Parameter Descriptions
Parameter descriptions are often as bloated as tool descriptions — and they multiply by the number of parameters per tool. If a parameter is named user_id with type string, the description "The unique identifier of the user whose data you want to retrieve" adds tokens without adding clarity.
For parameters with obvious names, the description can be as short as the type and a one-clause constraint:
// Before: ~25 tokens
"description": "The unique identifier of the repository in owner/name format"
// After: ~10 tokens
"description": "Repository in owner/name format"
Strategy 3 — Split into Focused Servers
Instead of one server with 60 tools covering every operation your platform supports, build smaller purpose-specific servers:
- github-read.mcp: list_repos, get_file, list_issues, list_prs — read-only, 8 tools
- github-write.mcp: create_pr, push_commit, add_comment — write ops, 6 tools
- github-search.mcp: search_code, search_issues, search_users — discovery ops, 5 tools
An agent that only needs to read from GitHub connects to the read server alone — and pays for 8 tools instead of 40. For agents that need everything, you can connect multiple servers but you now have visibility into the exact cost of each capability layer.
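One way to reason about the split is to price each capability layer separately (a sketch using the rough 250-tokens-per-tool average from the scenario above; the server names and tool counts match the example split):

```python
TOKENS_PER_TOOL = 250  # rough average, per the scenario above

servers = {"github-read": 8, "github-write": 6, "github-search": 5}

def agent_overhead(connected: list[str]) -> int:
    """Token overhead an agent pays for its connected servers."""
    return sum(servers[name] * TOKENS_PER_TOOL for name in connected)

print(agent_overhead(["github-read"]))                                   # → 2000
print(agent_overhead(["github-read", "github-write", "github-search"]))  # → 4750
```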
Strategy 4 — Filter Tools per Agent at Runtime
Some MCP server implementations support tool filtering — you can pass a list of tool names to allow or deny at session initialization. If your framework supports it, pass only the tools relevant to the current task:
// Example client config: restrict the server to specific tools
// (support and key name vary by client; "allowedTools" is one convention)
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_PERSONAL_ACCESS_TOKEN": "..." },
      "allowedTools": ["list_issues", "create_issue", "add_comment"]
    }
  }
}
This is particularly effective for specialized agents: a "triage bot" that only creates and comments on issues doesn't need the 35 other GitHub tools in its context.
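If your client framework exposes the raw tool list, the same filtering can be done in application code before the definitions ever reach the model (a sketch; the allowlist-based helper here is an assumption, not a standard MCP API):

```python
def filter_tools(tools: list[dict], allowed: set[str]) -> list[dict]:
    """Keep only the tool definitions this agent actually needs."""
    return [t for t in tools if t["name"] in allowed]

all_tools = [
    {"name": n}
    for n in ["list_issues", "create_issue", "add_comment", "delete_repo"]
]
# A triage bot only needs issue read/write; everything else is dropped.
triage_tools = filter_tools(all_tools, {"list_issues", "create_issue", "add_comment"})
print([t["name"] for t in triage_tools])  # → ['list_issues', 'create_issue', 'add_comment']
```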
Strategy 5 — Remove or Stub Rarely Used Tools
The token counter's per-tool ranking makes this obvious: sort by token cost, then look at the top 20%. These are usually tools with the longest descriptions and most complex schemas. For each one, ask: how often does the AI actually call this?
If a tool is called less than 1% of the time but costs 500+ tokens per request, it's a candidate for removal or stubbing (replacing with a minimal no-op that can be re-enabled for specific sessions).
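Given call logs and per-tool token estimates, flagging these candidates is straightforward (a sketch; the 1% frequency and 500-token thresholds come from the paragraph above, and the tool names are hypothetical):

```python
def removal_candidates(call_counts: dict[str, int],
                       token_costs: dict[str, int]) -> list[str]:
    """Tools called under 1% of the time that cost 500+ tokens per request."""
    total_calls = sum(call_counts.values())
    return [
        name
        for name, cost in token_costs.items()
        if cost >= 500 and call_counts.get(name, 0) / total_calls < 0.01
    ]

calls = {"search_code": 980, "get_file": 15, "migrate_repo": 5}
costs = {"search_code": 300, "get_file": 550, "migrate_repo": 600}
print(removal_candidates(calls, costs))  # → ['migrate_repo']
```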
Tradeoffs to Keep in Mind
Optimization has limits
- Too-short descriptions hurt routing. If the model can't distinguish two similar tools from their names alone, it needs the description. Under-describing tools leads to wrong tool selection, which is worse than the token cost.
- Splitting servers adds connection overhead. Each MCP connection has a setup cost. Beyond 3–4 concurrent servers in a session, connection management complexity starts to outweigh token savings.
- Prompt caching offsets the cost. Anthropic's prompt cache and similar features cache the tool definitions portion of the prompt. At high request volume, the marginal cost of tool tokens approaches zero for repeated sessions with the same tools.
The right optimization strategy depends on your traffic patterns. For low-volume, long-context workloads (deep research agents, document analysis), token overhead from tools is negligible and description richness is more important. For high-volume, short-context workloads (customer support bots, automated triage), every token counts.
The Workflow: Measure, Identify, Optimize, Re-measure
Treating token overhead as a first-class concern follows the same pattern as any performance optimization: measure your current footprint, identify the heaviest tools, optimize them, then re-measure to confirm the savings.
Find out exactly how many tokens your MCP server uses
Paste any MCP server URL and get an instant per-tool breakdown — free, no sign-up required.
Written by Nikhil Tiwari
15+ years in product development. AI enthusiast building developer tools that make complex technologies accessible to everyone.