Stop Paying the MCP Tax: 96% Token Savings with One CLI
If your LLM is connected to more than 10 tools, most of your token budget is going to waste — and you probably don't even know it.
Every MCP server dumps its full tool catalog into your LLM's context on every turn. Six servers, 84 tools — that's 15,540 tokens before the conversation starts. And you pay that tax again on every single message, whether the model touches those tools or not.
Over a 10-turn conversation with 30 tools, native MCP costs 36,310 tokens just for schemas. Most of that is waste.
The Fix
mcp2cli turns any MCP server, OpenAPI spec, or GraphQL endpoint into a CLI at runtime. No codegen, no per-API glue code.
# Run directly without installing
uvx mcp2cli --help
# Or install globally
uv tool install mcp2cli
The LLM discovers tools on demand:
# What's available? (~16 tokens/tool)
mcp2cli --mcp https://mcp.example.com/sse --list
# How does this one work? (~120 tokens, once)
mcp2cli --mcp https://mcp.example.com/sse create-task --help
# Use it (0 extra schema tokens)
mcp2cli --mcp https://mcp.example.com/sse create-task --title "Fix bug"
Instead of 3,619 tokens per turn for 30 tools, you pay a fixed 67 tokens per turn for the system prompt, plus discovery costs only when a tool is actually needed.
The Numbers
These are actual token counts, not estimates. Measured with cl100k_base against real schemas, verified by an automated test suite.
What mcp2cli actually costs
Let's be upfront about what mcp2cli adds to context. It's not zero — it's just dramatically less than injecting full schemas.
| Component | Cost | When |
|---|---|---|
| System prompt | 67 tokens | Every turn (fixed) |
| `--list` output | ~16 tokens/tool | Once per conversation |
| `--help` output | ~80-200 tokens/tool | Once per unique tool used |
| Tool call output | same as native | Per call |
The `--list` cost scales linearly with the number of tools: 30 tools cost ~464 tokens, 120 tools ~1,850. That's still 7-8x cheaper than injecting the full schemas, and you pay it only once per conversation.
Compare that to native MCP injection: ~121 tokens per tool, every single turn, whether the model uses those tools or not. For OpenAPI endpoints, it's ~72 tokens per endpoint per turn.
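The per-component costs above can be folded into a rough model. This is a sketch, not mcp2cli's accounting: it counts only schema and discovery overhead and ignores tool-call output tokens (which both approaches pay equally), so it slightly undershoots the measured totals in the tables below.

```python
def native_overhead(turns: int, n_tools: int, per_tool: int = 121) -> int:
    """Native MCP: full schemas (~121 tokens/tool) injected on every turn."""
    return turns * n_tools * per_tool

def mcp2cli_overhead(turns: int, n_tools: int, unique_tools: int,
                     prompt: int = 67, list_per_tool: int = 16,
                     help_per_tool: int = 120) -> int:
    """mcp2cli: fixed system prompt per turn, one --list, one --help per unique tool."""
    return turns * prompt + n_tools * list_per_tool + unique_tools * help_per_tool

# The "Task manager" scenario: 30 tools, 15 turns, 5 unique tools used
native = native_overhead(15, 30)    # 54,450
ours = mcp2cli_overhead(15, 30, 5)  # 2,085
print(native, ours, f"{1 - ours / native:.0%} saved")
```

The model lands close to the measured 54,525 vs 2,309 figures; the gap is the tool-call output tokens both columns share.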
Over a full conversation
Here's the total token cost across a realistic multi-turn conversation. The mcp2cli column includes all overhead: the system prompt on every turn, one --list discovery, --help for each unique tool the LLM actually uses, and tool call outputs.
MCP servers:
| Scenario | Turns | Unique tools used | Native total | mcp2cli total | Saved |
|---|---|---|---|---|---|
| Task manager (30 tools) | 15 | 5 | 54,525 | 2,309 | 96% |
| Multi-server (80 tools) | 20 | 8 | 193,360 | 3,897 | 98% |
| Full platform (120 tools) | 25 | 10 | 362,350 | 5,181 | 99% |
OpenAPI specs:
| Scenario | Turns | Unique endpoints used | Native total | mcp2cli total | Saved |
|---|---|---|---|---|---|
| Small API (5 endpoints) | 10 | 3 | 3,730 | 1,199 | 68% |
| Medium API (20 endpoints) | 15 | 5 | 21,720 | 1,905 | 91% |
| Large API (50 endpoints) | 20 | 8 | 71,940 | 2,810 | 96% |
| Enterprise API (200 endpoints) | 25 | 10 | 358,425 | 3,925 | 99% |
A 120-tool MCP platform over 25 turns: 357,169 tokens saved.
Turn-by-turn: watching the gap widen
Here's a 30-tool MCP server over 10 turns. The mcp2cli column includes the real costs: --list discovery on turn 1, --help + tool output when each new tool is first used.
Turn Native mcp2cli Savings
──────────────────────────────────────────────────────────
1 3,619 531 3,088 <- --list (464 tokens)
2 7,238 598 6,640
3 10,887 815 10,072 <- --help (120) + tool call
4 14,506 882 13,624
5 18,155 1,099 17,056 <- --help (120) + tool call
6 21,774 1,166 20,608
7 25,423 1,383 24,040 <- --help (120) + tool call
8 29,042 1,450 27,592
9 32,691 1,667 31,024 <- --help (120) + tool call
10 36,310 1,734 34,576
Total: 34,576 tokens saved (95.2%)
Why the gap is so large
Native MCP approach — pay the full schema tax on every turn:
System prompt: "You have these 30 tools: [3,619 tokens of JSON schemas]"
-> 3,619 tokens consumed per turn, whether used or not
-> 10 turns = 36,310 tokens
mcp2cli approach — pay only for what you use:
System prompt: "Use mcp2cli --mcp <url> <command> [--flags]" (67 tokens/turn)
-> mcp2cli --mcp <url> --list (464 tokens, once)
-> mcp2cli --mcp <url> create-task --help (120 tokens, once per tool)
-> mcp2cli --mcp <url> create-task --title "Fix bug" (0 extra tokens)
-> 10 turns, 4 unique tools = 1,734 tokens
The LLM discovers what it needs, when it needs it. Everything else stays out of context.
The multi-server problem
This is where it really hurts. Connect 3 MCP servers (a task manager, a filesystem server, and a database server — 60 tools total) and you're paying 7,238 tokens per turn. Over a 20-turn conversation, that's 145,060 tokens just for tool schemas. mcp2cli reduces that to 3,288 tokens — a 97.7% reduction — even after accounting for --list discovery (928 tokens) and --help for 6 unique tools (720 tokens).
vs. Anthropic's Tool Search
Anthropic recognized the tool sprawl problem and built Tool Search directly into their API — a deferred-loading pattern where tools are marked defer_loading: true and Claude discovers them via a search index (~500 tokens) instead of loading all schemas upfront. It typically cuts token usage by 85%. But:
- It's Claude-API-only. mcp2cli works with any LLM — Claude, GPT, Gemini, local models — because it's just a CLI tool the model can shell out to.
- Full schemas still enter context. When Tool Search fetches a tool, the full JSON schema still gets injected (~121 tokens/tool). mcp2cli's `--help` returns human-readable text that's typically cheaper, and `--list` summaries cost ~16 tokens/tool vs ~121.
- No codegen, no recompilation. Point mcp2cli at a spec URL or MCP server and the CLI exists immediately. When the server adds new endpoints, they appear on the next invocation.
- OpenAPI and GraphQL support. MCP isn't the only schema-rich protocol. mcp2cli handles OpenAPI specs and GraphQL endpoints with the same CLI interface, the same caching, and the same on-demand discovery.
How It Works
- Load — Fetch the OpenAPI spec, connect to the MCP server, or introspect the GraphQL endpoint. Resolve `$ref`s. Cache for reuse.
- Extract — Walk the spec paths/tools and produce a uniform list of command definitions with typed parameters.
- Build — Generate an argparse parser with subcommands, flags, types, choices, and help text.
- Execute — Dispatch the parsed args as an HTTP request (OpenAPI), tool call (MCP), or GraphQL query.
All adapters produce the same internal `CommandDef` structure, so the CLI builder and output handling are shared across all four modes.
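The shape of that pipeline can be sketched in a few lines. The names here (`ParamDef`, `CommandDef`, `build_parser`) are illustrative, not mcp2cli's actual internals, but they show how a uniform command definition maps onto an argparse parser with subcommands:

```python
import argparse
from dataclasses import dataclass, field

@dataclass
class ParamDef:
    name: str
    type: type = str
    required: bool = False
    help: str = ""

@dataclass
class CommandDef:
    name: str
    description: str
    params: list[ParamDef] = field(default_factory=list)

def build_parser(commands: list[CommandDef]) -> argparse.ArgumentParser:
    """Turn adapter-produced command definitions into a CLI with subcommands."""
    parser = argparse.ArgumentParser(prog="mcp2cli")
    sub = parser.add_subparsers(dest="command", required=True)
    for cmd in commands:
        p = sub.add_parser(cmd.name, help=cmd.description)
        for prm in cmd.params:
            p.add_argument(f"--{prm.name}", type=prm.type,
                           required=prm.required, help=prm.help)
    return parser

# A definition like this could come from any adapter (OpenAPI, MCP, GraphQL)
parser = build_parser([CommandDef(
    "create-task", "Create a new task",
    [ParamDef("title", str, required=True, help="Task title")],
)])
args = parser.parse_args(["create-task", "--title", "Fix bug"])
print(args.command, args.title)
```

Because the parser is built from data at runtime, a new tool on the server becomes a new subcommand on the next invocation with no codegen step.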
AI Agent Skill
mcp2cli ships with an installable skill that teaches AI coding agents (Claude Code, Cursor, Codex) how to use it. Once installed, your agent can discover and call any MCP server or OpenAPI endpoint — and even generate new skills from APIs.
npx skills add knowsuchagency/mcp2cli --skill mcp2cli
Once installed, try prompts like:
- `mcp2cli --mcp https://mcp.example.com/sse` — interact with an MCP server
- `mcp2cli create a skill for https://api.example.com/openapi.json` — generate a skill from an API
Usage Reference
Everything below is hands-on usage: source modes, authentication, tool filtering, output control, and caching.
Four Ways In
mcp2cli works with four different API protocols through a single, consistent CLI interface.
MCP over HTTP/SSE
Connect to remote MCP servers over HTTP. mcp2cli supports both the newer Streamable HTTP transport and the original SSE transport, and tries the right one automatically.
# List tools from an MCP server
mcp2cli --mcp https://mcp.example.com/sse --list
# Call a tool
mcp2cli --mcp https://mcp.example.com/sse search --query "test"
# Force a specific transport (skip the auto-detection dance)
mcp2cli --mcp https://mcp.example.com/sse --transport sse --list
MCP over stdio
Launch a local MCP server as a subprocess. mcp2cli handles spawning, communication, and cleanup.
# List tools from a local server
mcp2cli --mcp-stdio "npx @modelcontextprotocol/server-filesystem /tmp" --list
# Call a tool
mcp2cli --mcp-stdio "npx @modelcontextprotocol/server-filesystem /tmp" \
read-file --path /tmp/hello.txt
# Pass environment variables to the server process
mcp2cli --mcp-stdio "node server.js" --env API_KEY=sk-... --env DEBUG=1 \
search --query "test"
OpenAPI
Point mcp2cli at any OpenAPI spec — JSON or YAML, local or remote — and every endpoint becomes a CLI subcommand.
# List all commands from a remote spec
mcp2cli --spec https://petstore3.swagger.io/api/v3/openapi.json --list
# Call an endpoint
mcp2cli --spec ./openapi.json --base-url https://api.example.com \
list-pets --status available
# POST with JSON body from stdin
echo '{"name": "Fido", "tag": "dog"}' | mcp2cli --spec ./spec.json create-pet --stdin
# Local YAML spec
mcp2cli --spec ./api.yaml --base-url http://localhost:8000 --list
GraphQL
Point mcp2cli at any GraphQL endpoint and it introspects the schema, discovers queries and mutations, auto-generates selection sets, and constructs parameterized queries with proper variable declarations.
# List all queries and mutations
mcp2cli --graphql https://api.example.com/graphql --list
# Call a query
mcp2cli --graphql https://api.example.com/graphql users --limit 10
# Call a mutation
mcp2cli --graphql https://api.example.com/graphql create-user \
--name "Alice" --email "[email protected]"
# Override auto-generated selection set fields
mcp2cli --graphql https://api.example.com/graphql users --fields "id name email"
Here's what that looks like in practice. Given this schema:
type Query {
users: [User!]!
user(id: ID!): User
}
type Mutation {
createUser(name: String!, email: String!, age: Int): User
deleteUser(id: ID!): Boolean
}
type User {
id: ID!
name: String!
email: String
age: Int
status: Status
}
enum Status { ACTIVE INACTIVE BANNED }
mcp2cli generates:
$ mcp2cli --graphql https://api.example.com/graphql --list
query:
users List all users
user Get a user by ID
mutation:
create-user Create a new user
delete-user Delete a user by ID
$ mcp2cli --graphql https://api.example.com/graphql create-user --help
usage: mcp2cli create-user [--name NAME] [--email EMAIL] [--age AGE]
--name User name (String!, required)
--email User email (String!, required)
--age User age (Int)
$ mcp2cli --graphql https://api.example.com/graphql create-user \
--name "Alice" --email "[email protected]"
{"id": "4", "name": "Alice", "email": "[email protected]", "age": null, "status": null}
No SDL parsing, no code generation — just point and run.
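The query construction step can be sketched as string assembly over the introspected types. This `build_operation` helper is hypothetical (mcp2cli's real implementation isn't shown here), but it demonstrates how a mutation with proper variable declarations is derived from argument names and their GraphQL types:

```python
def build_operation(op_type: str, field: str, args: dict[str, str],
                    selection: str) -> str:
    """Assemble a parameterized GraphQL operation.

    `args` maps argument names to GraphQL type names, e.g. {"name": "String!"}.
    """
    var_decls = ", ".join(f"${n}: {t}" for n, t in args.items())
    arg_list = ", ".join(f"{n}: ${n}" for n in args)
    head = f"{op_type}({var_decls})" if args else op_type
    return f"{head} {{ {field}({arg_list}) {{ {selection} }} }}"

query = build_operation(
    "mutation", "createUser",
    {"name": "String!", "email": "String!"},
    "id name email",
)
print(query)
# mutation($name: String!, $email: String!) { createUser(name: $name, email: $email) { id name email } }
```

The argument values themselves travel separately as GraphQL variables, so quoting and escaping never leak into the query text.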
Authentication & Secrets
Auth headers
Add authentication headers to any request. The --auth-header flag is repeatable for multiple headers:
mcp2cli --mcp https://mcp.example.com/sse \
--auth-header "x-api-key:sk-..." \
query --sql "SELECT 1"
Secrets from environment or files
Sensitive values support env: and file: prefixes to avoid passing secrets as CLI arguments (which are visible in process listings):
# Read from environment variable
mcp2cli --mcp https://mcp.example.com/sse \
--auth-header "Authorization:env:MY_API_TOKEN" --list
# Read from file
mcp2cli --mcp https://mcp.example.com/sse \
--oauth-client-secret "file:/run/secrets/client_secret" \
--oauth-client-id "my-client-id" --list
# Works with secret managers that inject env vars
fnox exec -- mcp2cli --mcp https://mcp.example.com/sse \
--oauth-client-id "env:OAUTH_CLIENT_ID" \
--oauth-client-secret "env:OAUTH_CLIENT_SECRET" --list
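The prefix resolution logic is simple enough to sketch. This `resolve_secret` function is an illustration of the pattern, not mcp2cli's actual code:

```python
import os
from pathlib import Path

def resolve_secret(value: str) -> str:
    """Resolve env:/file: prefixed secret values; pass literals through."""
    if value.startswith("env:"):
        return os.environ[value[len("env:"):]]
    if value.startswith("file:"):
        return Path(value[len("file:"):]).read_text().strip()
    return value

os.environ["MY_API_TOKEN"] = "sk-test"     # stand-in for a real secret
print(resolve_secret("env:MY_API_TOKEN"))  # sk-test
print(resolve_secret("plain-value"))       # plain-value
```

Either way, the resolved secret only ever lives in the process, never in the argument vector that `ps` can see.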
OAuth
MCP servers that require OAuth are supported out of the box. mcp2cli handles token acquisition, caching, and refresh automatically.
# Authorization code + PKCE flow (opens browser for login)
mcp2cli --mcp https://mcp.example.com/sse --oauth --list
# Client credentials flow (machine-to-machine, no browser)
mcp2cli --mcp https://mcp.example.com/sse \
--oauth-client-id "my-client-id" \
--oauth-client-secret "my-secret" \
search --query "test"
# With specific scopes
mcp2cli --mcp https://mcp.example.com/sse --oauth --oauth-scope "read write" --list
Tokens are persisted in ~/.cache/mcp2cli/oauth/ so subsequent calls reuse existing tokens and refresh automatically when they expire.
Tool Discovery & Search
Every source mode supports the same discovery workflow:
# List all available commands
mcp2cli --mcp https://mcp.example.com/sse --list
# Get detailed help for a specific command
mcp2cli --mcp https://mcp.example.com/sse create-task --help
# Search tools by name or description (case-insensitive substring match)
mcp2cli --mcp https://mcp.example.com/sse --search "task"
mcp2cli --spec ./openapi.json --search "create"
mcp2cli --graphql https://api.example.com/graphql --search "user"
`--search` implies `--list`: it filters the listing to matching results.
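A case-insensitive substring match over names and descriptions is easy to picture; this sketch assumes a tool dict shape (`name`, `description`) for illustration rather than mirroring mcp2cli's internals:

```python
def search_tools(tools: list[dict], query: str) -> list[dict]:
    """Case-insensitive substring match over tool names and descriptions."""
    q = query.lower()
    return [t for t in tools
            if q in t["name"].lower() or q in t.get("description", "").lower()]

tools = [
    {"name": "create-task", "description": "Create a new task"},
    {"name": "list-users", "description": "List all users"},
    {"name": "delete-task", "description": "Remove a task"},
]
print([t["name"] for t in search_tools(tools, "task")])
# ['create-task', 'delete-task']
```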
Bake Mode — Saved Configurations
Tired of repeating --spec/--mcp/--mcp-stdio plus auth flags on every invocation? Bake them into a named configuration:
# Create a baked tool from an OpenAPI spec
mcp2cli bake create petstore --spec https://api.example.com/spec.json \
--exclude "delete-*,update-*" --methods GET,POST --cache-ttl 7200
# Create a baked tool from an MCP stdio server
mcp2cli bake create mygit --mcp-stdio "npx @mcp/github" \
--include "search-*,list-*" --exclude "delete-*"
# Use a baked tool with @ prefix — no connection flags needed
mcp2cli @petstore --list
mcp2cli @petstore list-pets --limit 10
mcp2cli @mygit search-repos --query "rust"
# Manage baked tools
mcp2cli bake list # show all baked tools
mcp2cli bake show petstore # show config (secrets masked)
mcp2cli bake update petstore --cache-ttl 3600
mcp2cli bake remove petstore
Filtering options let you control which tools are exposed:
- `--include` — comma-separated glob patterns to whitelist tools (e.g. `"list-*,get-*"`)
- `--exclude` — comma-separated glob patterns to blacklist tools (e.g. `"delete-*"`)
- `--methods` — comma-separated HTTP methods to allow (e.g. `"GET,POST"`, OpenAPI only)
Installing wrapper scripts
bake install creates a standalone shell script so you can invoke the baked tool directly, without the mcp2cli @ prefix:
# Install to ~/.local/bin (default)
mcp2cli bake install petstore
petstore --list
# Install to a custom directory
mcp2cli bake install petstore --dir ./scripts/
./scripts/petstore --list
Configs are stored in ~/.config/mcp2cli/baked.json.
Output Control
# Pretty-print JSON (also auto-enabled for TTY)
mcp2cli --spec ./spec.json --pretty list-pets
# Raw response body (no JSON parsing)
mcp2cli --spec ./spec.json --raw get-data
# Pipe-friendly (compact JSON when not a TTY)
mcp2cli --spec ./spec.json list-pets | jq '.[] | .name'
# TOON output — token-efficient encoding for LLM consumption
# Best for large uniform arrays (40-60% fewer tokens than JSON)
mcp2cli --mcp https://mcp.example.com/sse --toon list-tags
Caching
Specs and MCP tool lists are cached in ~/.cache/mcp2cli/ with a 1-hour TTL by default. Local file specs are never cached.
# Force refresh
mcp2cli --spec https://api.example.com/spec.json --refresh --list
# Custom TTL (seconds)
mcp2cli --spec https://api.example.com/spec.json --cache-ttl 86400 --list
# Custom cache key
mcp2cli --spec https://api.example.com/spec.json --cache-key my-api --list
# Override cache directory
MCP2CLI_CACHE_DIR=/tmp/my-cache mcp2cli --spec ./spec.json --list
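The TTL check amounts to comparing a cache file's age against the configured window. This sketch uses a hashed key and file mtime; the real on-disk layout under `~/.cache/mcp2cli/` may differ:

```python
import hashlib
import tempfile
import time
from pathlib import Path

def cache_path(cache_dir: Path, source: str) -> Path:
    """Derive a stable cache file name from the source URL or identifier."""
    digest = hashlib.sha256(source.encode()).hexdigest()[:16]
    return cache_dir / f"{digest}.json"

def is_fresh(path: Path, ttl: int = 3600) -> bool:
    """A cached entry is usable if it exists and is younger than the TTL."""
    return path.exists() and (time.time() - path.stat().st_mtime) < ttl

cache_dir = Path(tempfile.mkdtemp())
path = cache_path(cache_dir, "https://api.example.com/spec.json")
print(is_fresh(path))          # False: nothing cached yet
path.write_text("{}")
print(is_fresh(path))          # True: just written, well inside the TTL
print(is_fresh(path, ttl=0))   # False: a zero TTL forces a refresh
```

`--refresh` and `--cache-ttl 0` both collapse to the same behavior: the cached spec is ignored and refetched.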
This project was inspired by Kagan Yilmaz's analysis of CLI vs MCP token costs and his work on CLIHub. His observation that CLI-based tool access is dramatically more token-efficient than native MCP injection was the spark for mcp2cli.
Building AI agents or tools in Orange County? Join us at Orange County AI for events, workshops, and community.