# Stop Paying the MCP Tax: 96% Token Savings with One CLI
If your LLM is connected to more than 10 tools, most of your token budget is going to waste — and you probably don't even know it.
Every MCP server dumps its full tool catalog into your LLM's context on every turn. Six servers, 84 tools — that's 15,540 tokens before the conversation starts. And you pay that tax again on every single message, whether the model touches those tools or not.
Over a 20-turn conversation with 30 tools, native MCP costs 36,310 tokens just for schemas. Most of that is waste.
## The Fix
mcp2cli turns any MCP server or OpenAPI spec into a CLI at runtime. No codegen, no per-API glue code.
```bash
pip install mcp2cli

# Or run directly without installing
uvx mcp2cli --help
```
The LLM discovers tools on demand:
```bash
# What's available? (~16 tokens/tool)
mcp2cli --mcp https://mcp.example.com/sse --list

# How does this one work? (~120 tokens, once)
mcp2cli --mcp https://mcp.example.com/sse create-task --help

# Use it (0 extra schema tokens)
mcp2cli --mcp https://mcp.example.com/sse create-task --title "Fix bug"
```
Instead of 3,619 tokens/turn for 30 tools, you pay 67 tokens/turn for the system prompt plus discovery costs only when needed.
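The arithmetic behind that claim is simple enough to sketch. The per-tool figures below are rough assumptions taken from this article's measurements, so real totals will differ slightly:

```python
# Per-turn schema cost, using rough per-tool figures from this article.
# These are illustrative assumptions, not output of mcp2cli itself.
TOOLS = 30
SCHEMA_TOKENS_PER_TOOL = 121   # full JSON schema injected by native MCP
PROMPT_TOKENS_PER_TURN = 67    # mcp2cli's fixed system-prompt blurb

native_per_turn = TOOLS * SCHEMA_TOKENS_PER_TOOL  # paid on every single turn
cli_per_turn = PROMPT_TOKENS_PER_TURN             # fixed; discovery is extra

print(native_per_turn, cli_per_turn)  # 3630 67
```

Discovery costs (`--list`, `--help`) are one-time additions on top of the per-turn prompt, which is why they don't appear in the per-turn figure.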
## The Numbers
These are actual token counts, not estimates — measured with `cl100k_base` against real schemas.
MCP servers over a full conversation:
| Scenario | Turns | Tools used | Native | mcp2cli | Saved |
|---|---|---|---|---|---|
| Task manager (30 tools) | 15 | 5 | 54,525 | 2,309 | 96% |
| Multi-server (80 tools) | 20 | 8 | 193,360 | 3,897 | 98% |
| Full platform (120 tools) | 25 | 10 | 362,350 | 5,181 | 99% |
OpenAPI specs (same treatment):
| Scenario | Turns | Endpoints used | Native | mcp2cli | Saved |
|---|---|---|---|---|---|
| Small API (5 endpoints) | 10 | 3 | 3,730 | 1,199 | 68% |
| Medium API (20 endpoints) | 15 | 5 | 21,720 | 1,905 | 91% |
| Enterprise API (200 endpoints) | 25 | 10 | 358,425 | 3,925 | 99% |
120 tools, 25 turns: 357,169 tokens saved.
## The Multi-Server Problem
Connect three MCP servers (task manager, filesystem, database — 60 tools) and you're burning 7,238 tokens per turn on schemas alone. Over 20 turns: 145,060 tokens. mcp2cli: 3,288 tokens. That's a 97.7% reduction.
The gap widens every turn because native MCP pays the full schema tax repeatedly while mcp2cli pays discovery costs once.
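That compounding effect can be sketched with a minimal cumulative model. The per-tool token costs are illustrative assumptions, so the totals will not exactly match the measured numbers in this article:

```python
# Cumulative token cost over a conversation. Native MCP re-sends the full
# catalog every turn; mcp2cli pays a fixed prompt plus one-time discovery.
# All per-tool figures are illustrative assumptions, not measured output.
SCHEMA_PER_TOOL = 121   # native: full JSON schema per tool, every turn
LIST_PER_TOOL = 16      # mcp2cli: one --list line per tool, once
HELP_PER_TOOL = 120     # mcp2cli: one --help read per tool actually used
PROMPT_PER_TURN = 67    # mcp2cli: fixed system-prompt overhead

def cumulative(tools: int, tools_used: int, turns: int) -> list[tuple[int, int, int]]:
    """Return (turn, native_total, cli_total) for each turn."""
    native = cli = 0
    history = []
    for turn in range(1, turns + 1):
        native += tools * SCHEMA_PER_TOOL
        cli += PROMPT_PER_TURN
        if turn == 1:  # discovery happens once, up front
            cli += tools * LIST_PER_TOOL + tools_used * HELP_PER_TOOL
        history.append((turn, native, cli))
    return history

# Three servers, 60 tools, 8 used, 20 turns (the scenario above):
final_turn = cumulative(60, 8, 20)[-1]
```

Under these assumptions the native total grows by ~7,260 tokens per turn while mcp2cli grows by 67, so the gap widens linearly with conversation length.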
## vs. Anthropic's Tool Search
Anthropic built Tool Search into their API — tools marked `defer_loading: true` stay out of context until searched. It cuts ~85% of upfront cost. But:
- It's Claude-API-only
- When a tool is fetched, the full JSON schema still enters context (~121 tokens/tool)
- mcp2cli's `--list` costs ~16 tokens/tool vs ~121 for fetched schemas
mcp2cli works with any LLM. Claude, GPT, Gemini, local models — it's just a CLI the model can shell out to.
## AI Agent Skill
mcp2cli ships with an installable skill that teaches AI coding agents how to use it. Once installed, your agent in Claude Code, Cursor, or Codex can discover and call any API without manual guidance — and even generate new skills from APIs automatically.
```bash
npx skills add knowsuchagency/mcp2cli --skill mcp2cli
```
Once installed, try prompts like:
- `mcp2cli --mcp https://mcp.example.com/sse` — interact with an MCP server
- `mcp2cli create a skill for https://api.example.com/openapi.json` — generate a skill from an API
## How It Works
Point it at a source and go:
```bash
# MCP over HTTP
mcp2cli --mcp https://mcp.example.com/sse search --query "test"

# MCP over stdio
mcp2cli --mcp-stdio "npx @modelcontextprotocol/server-filesystem /tmp" \
  read-file --path /tmp/hello.txt

# OpenAPI spec
mcp2cli --spec https://petstore3.swagger.io/api/v3/openapi.json \
  find-pets-by-status --status available
```
Specs and tool lists are cached with a configurable TTL, so repeated runs within the TTL make no network hit.
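A hypothetical sketch of that caching behavior, assuming a simple monotonic-clock TTL store (mcp2cli's actual implementation and names may differ):

```python
# Hypothetical TTL cache for specs and tool lists. Names are illustrative;
# this is not mcp2cli's real code.
import time

class TTLCache:
    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, object]] = {}

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._store[key]  # expired: next caller refetches
            return None
        return value

    def put(self, key: str, value: object) -> None:
        self._store[key] = (time.monotonic(), value)

cache = TTLCache(ttl_seconds=600)

def fetch_spec(url: str, loader) -> object:
    """Return the cached spec; call loader(url) only on a miss or expiry."""
    spec = cache.get(url)
    if spec is None:
        spec = loader(url)  # the only place a network hit can happen
        cache.put(url, spec)
    return spec
```

Within the TTL, every repeated run resolves from the store; after expiry, the next lookup falls through to the loader exactly once.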
Internally: load the spec or connect to the server, extract a uniform list of command definitions, build an argparse parser with typed flags, execute. Both adapters produce the same internal structure — the CLI builder and output handling are shared.
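That shared pipeline could look roughly like the sketch below; every name and structure here is hypothetical, not mcp2cli's real internals:

```python
# Illustrative sketch of the shared pipeline: a uniform command definition
# feeds one argparse builder regardless of whether it came from an MCP
# server or an OpenAPI spec. All names here are hypothetical.
import argparse
from dataclasses import dataclass, field

@dataclass
class CommandDef:
    name: str                     # e.g. "create-task"
    description: str
    params: dict[str, type] = field(default_factory=dict)  # flag -> type

TYPE_MAP = {"string": str, "integer": int, "number": float, "boolean": bool}

def from_openapi_operation(op_id: str, op: dict) -> CommandDef:
    """Flatten one OpenAPI operation into the uniform definition."""
    params = {p["name"]: TYPE_MAP.get(p.get("schema", {}).get("type", "string"), str)
              for p in op.get("parameters", [])}
    return CommandDef(name=op_id, description=op.get("summary", ""), params=params)

def build_parser(commands: list[CommandDef]) -> argparse.ArgumentParser:
    """Both adapters feed the same builder: one subcommand per tool."""
    parser = argparse.ArgumentParser(prog="mcp2cli")
    sub = parser.add_subparsers(dest="command", required=True)
    for cmd in commands:
        p = sub.add_parser(cmd.name, help=cmd.description)
        for flag, typ in cmd.params.items():
            p.add_argument(f"--{flag}", type=typ)
    return parser
```

The key property is that `build_parser` never sees where a `CommandDef` came from, so MCP tools and OpenAPI operations get identical flag handling and output.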
## Install
```bash
pip install mcp2cli

# Or run directly without installing
uvx mcp2cli --help
```
This project was inspired by Kagan Yilmaz's analysis of CLI vs MCP token costs and his work on CLIHub. His observation that CLI-based tool access is dramatically more token-efficient than native MCP injection was the spark for mcp2cli.
Building AI agents or tools in Orange County? Join us at Orange County AI for events, workshops, and community.