MCP Token Counter: Why Your Tools Are Silently Eating Your Context Window
Nikhil Tiwari
MCP Playground
📖 TL;DR — Key Takeaways
- Every tool definition your MCP server exposes is included in the AI's system prompt on every single request
- A server with 50 tools easily consumes 10,000–20,000 tokens before your actual prompt begins
- This silently inflates your API costs and can push your real content out of the context window
- The fix: measure first, then trim descriptions, split servers, or filter tools per agent
- MCP Token Counter — paste a server URL and get an instant per-tool breakdown, free
You connected an MCP server. Your agent is working. Everything looks fine. But somewhere in the background, your AI budget is draining faster than it should — and the culprit isn't your conversation history or your documents. It's the tools themselves.
This is one of the most overlooked costs in MCP-based AI systems, and it compounds silently as you add more tools to your server.
How MCP Tool Definitions Consume Tokens
When an AI model supports tool use, it needs to understand what tools are available before it can decide which one to call. This understanding comes from a schema — a structured description of each tool's name, purpose, and parameters.
In the MCP protocol, this schema is delivered automatically. When your client calls tools/list at session start, the server returns a JSON array of every tool it exposes. The MCP client then forwards these definitions to the language model as part of the request payload — on every single message in the conversation.
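In pseudocode terms, the client-side behavior looks roughly like this (a minimal sketch; real MCP client SDKs handle this internally, and the tool shown is a hypothetical stand-in):

```python
import json

# Tool definitions are fetched once via tools/list at session start.
# (Hypothetical minimal example; a real server returns far more detail.)
tool_definitions = [
    {
        "name": "search_repositories",
        "description": "Search GitHub repositories by keyword.",
        "inputSchema": {"type": "object", "properties": {"query": {"type": "string"}}},
    },
]

def build_model_request(conversation, tools):
    # The full tool list rides along on every request to the model,
    # whether or not a tool ends up being called this turn.
    return {"messages": conversation, "tools": tools}

req = build_model_request([{"role": "user", "content": "hi"}], tool_definitions)
print(len(json.dumps(req["tools"])))  # bytes spent on tool schemas alone, per request
```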
Here is what a single tool definition looks like when serialized:
{
  "name": "search_repositories",
  "description": "Search GitHub repositories by keyword, language, or topic. Returns repository name, description, star count, primary language, last updated date, and clone URL. Supports pagination with page and per_page parameters. Use sort to order by stars, forks, or recently updated.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "Search query with GitHub search syntax support"
      },
      "language": {
        "type": "string",
        "description": "Filter by programming language"
      },
      "sort": {
        "type": "string",
        "enum": ["stars", "forks", "updated"],
        "description": "Sort order for results"
      },
      "page": { "type": "number", "description": "Page number (default: 1)" },
      "per_page": { "type": "number", "description": "Results per page, max 100" }
    },
    "required": ["query"]
  }
}
That single tool definition is approximately 280–320 tokens. Now multiply by 40 tools in a typical GitHub MCP server and you have around 12,000 tokens of overhead — before a single word of your actual prompt is processed.
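You can get a quick estimate yourself with the common ~4 characters per token heuristic (a rough approximation; real tokenizer counts vary by model):

```python
import json

CHARS_PER_TOKEN = 4  # rough heuristic; actual tokenizers differ

def estimate_tool_tokens(tool: dict) -> int:
    """Estimate the token cost of one serialized tool definition."""
    return len(json.dumps(tool)) // CHARS_PER_TOKEN

def estimate_server_overhead(tools: list[dict]) -> int:
    """Total estimated tokens for a server's full tool list."""
    return sum(estimate_tool_tokens(t) for t in tools)

# 40 tools at ~300 tokens each lands near the 12,000-token figure above.
fake_tools = [{"description": "x" * 1200}] * 40
print(estimate_server_overhead(fake_tools))  # → 12160
```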
Why This Matters
💸 Direct Cost
Input tokens are billed. 10,000 extra tokens per request × 1,000 daily requests = 10M extra input tokens per day. At Sonnet 4.6 rates ($3/1M), that's $30/day in pure overhead.
📉 Reduced Useful Context
Most models have a 128k–200k context window. If 15,000 tokens are consumed by tool definitions, that's 15,000 fewer tokens available for your documents, conversation history, and instructions.
🐢 Latency
Larger prompts take longer to process, even with fast models. In long multi-turn conversations where tool overhead accumulates, this adds measurable latency to every response.
🤯 Model Confusion
Some research suggests that presenting a model with 50+ tools degrades tool selection quality. Too many options lead to worse decisions — especially when many tools are similar.
The Context Window Math
Let's break down where a typical 128,000-token context window actually goes in an MCP-powered agent:
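As a representative example (illustrative numbers, not measurements from any specific deployment):

```python
CONTEXT_WINDOW = 128_000

# Illustrative allocation for an MCP-powered agent (assumed figures):
budget = {
    "system prompt + instructions": 2_000,
    "tool definitions (fixed, every request)": 15_000,
    "conversation history (grows per turn)": 40_000,
    "retrieved documents": 50_000,
}
reserved = sum(budget.values())
print(f"free for output/new input: {CONTEXT_WINDOW - reserved:,}")  # → free for output/new input: 21,000
```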
The key insight: tool definitions are a fixed cost per request. Unlike conversation history which accumulates, tool overhead doesn't grow — but it also never goes away. Optimizing it is a one-time investment with permanent returns.
Real scenario: a developer's wake-up call
A developer built an internal agent connecting 3 MCP servers: Supabase (22 tools), GitHub (41 tools), and Linear (18 tools). That's 81 tools. At roughly 250 tokens per tool, 20,000+ tokens were consumed before any message was processed — 16% of a 128k window gone, on every single API call.
Measuring Your MCP Server's Token Footprint
Before you can optimize, you need to know where you stand. The MCP Token Counter at MCP Playground does this in seconds — no sign-up, no installation.
How to use it:
1. Paste your MCP server URL. Supports both Streamable HTTP (/mcp) and legacy SSE (/sse) transports.
2. Add an auth token if needed. For private servers requiring Bearer authentication — the token stays in your browser and is only used for this single request.
3. Click Analyze Tokens. The tool connects to your server, fetches the tool list, estimates token usage per tool, and ranks them from largest to smallest.
4. Read your results. You get a total token estimate, tool count, average tokens per tool, a visual budget bar against a 16k baseline, and a per-tool ranked breakdown.
The color coding gives you instant signal:
- Green (under 4,000 tokens) — healthy, your tool footprint is well-managed
- Amber (4,000–10,000 tokens) — worth reviewing; trimming could meaningfully reduce spend
- Red (10,000+ tokens) — high overhead; optimization is strongly recommended
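The same thresholds are easy to encode if you want this signal in a CI check (a sketch using the cutoffs above):

```python
def footprint_status(total_tokens: int) -> str:
    """Classify total tool-definition overhead using the article's cutoffs."""
    if total_tokens < 4_000:
        return "green"   # healthy
    if total_tokens < 10_000:
        return "amber"   # worth reviewing
    return "red"         # optimization strongly recommended

print(footprint_status(2_500))   # → green
print(footprint_status(7_000))   # → amber
print(footprint_status(12_000))  # → red
```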
5 Strategies to Reduce MCP Token Overhead
Strategy 1 — Trim Verbose Tool Descriptions
Tool descriptions are the biggest variable in token cost. Developers often write them as full paragraphs of documentation, but models only need enough to make a routing decision. Compare these two descriptions for the same tool:
❌ Verbose (~80 tokens)
"description": "Searches the repository for files matching a given glob pattern. This tool is useful when you need to find files by their name or extension across the entire repository tree. It supports standard glob wildcards including * and ** for recursive matching. Returns a list of matching file paths relative to the repository root."
✅ Lean (~20 tokens)
"description": "Find files by glob pattern. Returns matching paths."
That's a 75% reduction for a single description. Across 40 tools, trimming descriptions from 80 tokens to 20 tokens saves approximately 2,400 tokens per request.
Rule of thumb
A description should answer: "When would the model choose this tool over other tools?" Everything beyond that is overhead. Keep descriptions under 15 words wherever the function name is already self-explanatory.
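That rule of thumb can be enforced mechanically, for example with a small lint that flags descriptions over a word budget (a sketch; the 15-word limit is taken from the guideline above):

```python
MAX_WORDS = 15  # per the rule of thumb above

def flag_verbose_descriptions(tools: list[dict]) -> list[str]:
    """Return names of tools whose descriptions exceed the word budget."""
    return [
        t["name"]
        for t in tools
        if len(t.get("description", "").split()) > MAX_WORDS
    ]

tools = [
    {"name": "find_files", "description": "Find files by glob pattern. Returns matching paths."},
    {"name": "search_repositories", "description": " ".join(["word"] * 40)},
]
print(flag_verbose_descriptions(tools))  # → ['search_repositories']
```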
Strategy 2 — Trim Parameter Descriptions
Parameter descriptions are often as bloated as tool descriptions — and they multiply by the number of parameters per tool. If a parameter is named user_id with type string, the description "The unique identifier of the user whose data you want to retrieve" adds tokens without adding clarity.
For parameters with obvious names, the description can be as short as the type and a one-clause constraint:
// Before: ~25 tokens
"description": "The unique identifier of the repository in owner/name format"
// After: ~10 tokens
"description": "Repository in owner/name format"
Strategy 3 — Split into Focused Servers
Instead of one server with 60 tools covering every operation your platform supports, build smaller purpose-specific servers:
- github-read.mcp: list_repos, get_file, list_issues, list_prs — read-only, 8 tools
- github-write.mcp: create_pr, push_commit, add_comment — write ops, 6 tools
- github-search.mcp: search_code, search_issues, search_users — discovery ops, 5 tools
An agent that only needs to read from GitHub connects to the read server alone — and pays for 8 tools instead of 40. For agents that need everything, you can connect multiple servers but you now have visibility into the exact cost of each capability layer.
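One way to reason about the split is to price each capability layer separately (a sketch using the rough 250-tokens-per-tool average from the scenario above; the server names and tool counts match the example split):

```python
TOKENS_PER_TOOL = 250  # rough average, per the scenario above

servers = {"github-read": 8, "github-write": 6, "github-search": 5}

def agent_overhead(connected: list[str]) -> int:
    """Token overhead an agent pays for its connected servers."""
    return sum(servers[name] * TOKENS_PER_TOOL for name in connected)

print(agent_overhead(["github-read"]))                                   # → 2000
print(agent_overhead(["github-read", "github-write", "github-search"]))  # → 4750
```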
Strategy 4 — Filter Tools per Agent at Runtime
Some MCP server implementations support tool filtering — you can pass a list of tool names to allow or deny at session initialization. If your framework supports it, pass only the tools relevant to the current task:
// Example client config: restrict the server to specific tools
// (support and key name vary by client; "allowedTools" is one convention)
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_PERSONAL_ACCESS_TOKEN": "..." },
      "allowedTools": ["list_issues", "create_issue", "add_comment"]
    }
  }
}
This is particularly effective for specialized agents: a "triage bot" that only creates and comments on issues doesn't need the 35 other GitHub tools in its context.
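If your client framework exposes the raw tool list, the same filtering can be done in application code before the definitions ever reach the model (a sketch; the allowlist-based helper here is an assumption, not a standard MCP API):

```python
def filter_tools(tools: list[dict], allowed: set[str]) -> list[dict]:
    """Keep only the tool definitions this agent actually needs."""
    return [t for t in tools if t["name"] in allowed]

all_tools = [
    {"name": n}
    for n in ["list_issues", "create_issue", "add_comment", "delete_repo"]
]
# A triage bot only needs issue read/write; everything else is dropped.
triage_tools = filter_tools(all_tools, {"list_issues", "create_issue", "add_comment"})
print([t["name"] for t in triage_tools])  # → ['list_issues', 'create_issue', 'add_comment']
```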
Strategy 5 — Remove or Stub Rarely Used Tools
The token counter's per-tool ranking makes this obvious: sort by token cost, then look at the top 20%. These are usually tools with the longest descriptions and most complex schemas. For each one, ask: how often does the AI actually call this?
If a tool is called less than 1% of the time but costs 500+ tokens per request, it's a candidate for removal or stubbing (replacing with a minimal no-op that can be re-enabled for specific sessions).
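Given call logs and per-tool token estimates, flagging these candidates is straightforward (a sketch; the 1% frequency and 500-token thresholds come from the paragraph above, and the tool names are hypothetical):

```python
def removal_candidates(call_counts: dict[str, int],
                       token_costs: dict[str, int]) -> list[str]:
    """Tools called under 1% of the time that cost 500+ tokens per request."""
    total_calls = sum(call_counts.values())
    return [
        name
        for name, cost in token_costs.items()
        if cost >= 500 and call_counts.get(name, 0) / total_calls < 0.01
    ]

calls = {"search_code": 980, "get_file": 15, "migrate_repo": 5}
costs = {"search_code": 300, "get_file": 550, "migrate_repo": 600}
print(removal_candidates(calls, costs))  # → ['migrate_repo']
```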
Tradeoffs to Keep in Mind
Optimization has limits
- Too-short descriptions hurt routing. If the model can't distinguish two similar tools from their names alone, it needs the description. Under-describing tools leads to wrong tool selection, which is worse than the token cost.
- Splitting servers adds connection overhead. Each MCP connection has a setup cost. Beyond 3–4 concurrent servers in a session, connection management complexity starts to outweigh token savings.
- Prompt caching offsets the cost. Anthropic's prompt cache and similar features cache the tool definitions portion of the prompt. At high request volume, the marginal cost of tool tokens approaches zero for repeated sessions with the same tools.
The right optimization strategy depends on your traffic patterns. For low-volume, long-context workloads (deep research agents, document analysis), token overhead from tools is negligible and description richness is more important. For high-volume, short-context workloads (customer support bots, automated triage), every token counts.
The Workflow: Measure, Identify, Optimize, Re-measure
Treating token overhead as a first-class concern follows the same pattern as any performance optimization: measure your current footprint, identify the heaviest tools, optimize them, then re-measure to confirm the savings.
Find out exactly how many tokens your MCP server uses
Paste any MCP server URL and get an instant per-tool breakdown — free, no sign-up required.
Written by Nikhil Tiwari
15+ years in product development. AI enthusiast building developer tools that make complex technologies accessible to everyone.