Stop Paying the MCP Tax: 96% Token Savings with One CLI

If your LLM is connected to more than 10 tools, most of your token budget is going to waste — and you probably don't even know it.

Every MCP server dumps its full tool catalog into your LLM's context on every turn. Six servers, 84 tools — that's 15,540 tokens before the conversation starts. And you pay that tax again on every single message, whether the model touches those tools or not.

Over a 20-turn conversation with 30 tools, native MCP costs 36,310 tokens just for schemas. Most of that is waste.
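The tax compounds linearly with conversation length. A back-of-the-envelope sketch in Python, assuming a flat per-tool average of ~185 tokens (derived from 15,540 / 84 above; real schemas vary in size, so individual scenarios won't match this model exactly):

```python
# Rough model of the per-turn schema tax.
# 84 tools cost 15,540 tokens per turn -> ~185 tokens per tool schema
# (an assumed flat average derived from the figures above).
TOKENS_PER_TOOL = 15_540 / 84

def schema_tax(num_tools: int, turns: int) -> int:
    """Tokens spent re-sending tool schemas over a conversation."""
    return round(num_tools * TOKENS_PER_TOOL * turns)

print(schema_tax(84, 1))   # one turn with six servers' catalogs
print(schema_tax(84, 20))  # the same catalogs resent for 20 turns
```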

The Fix

mcp2cli turns any MCP server or OpenAPI spec into a CLI at runtime. No codegen, no per-API glue code.

pip install mcp2cli

# Or run directly without installing
uvx mcp2cli --help

The LLM discovers tools on demand:

# What's available? (~16 tokens/tool)
mcp2cli --mcp https://mcp.example.com/sse --list

# How does this one work? (~120 tokens, once)
mcp2cli --mcp https://mcp.example.com/sse create-task --help

# Use it (0 extra schema tokens)
mcp2cli --mcp https://mcp.example.com/sse create-task --title "Fix bug"

Instead of 3,619 tokens/turn for 30 tools, you pay 67 tokens/turn for the system prompt plus discovery costs only when needed.

The Numbers

These are actual token counts, not estimates, measured with the cl100k_base tokenizer against real schemas.

MCP servers over a full conversation:

| Scenario | Turns | Tools used | Native (tokens) | mcp2cli (tokens) | Saved |
|---|---|---|---|---|---|
| Task manager (30 tools) | 15 | 5 | 54,525 | 2,309 | 96% |
| Multi-server (80 tools) | 20 | 8 | 193,360 | 3,897 | 98% |
| Full platform (120 tools) | 25 | 10 | 362,350 | 5,181 | 99% |

OpenAPI specs (same treatment):

| Scenario | Turns | Endpoints used | Native (tokens) | mcp2cli (tokens) | Saved |
|---|---|---|---|---|---|
| Small API (5 endpoints) | 10 | 3 | 3,730 | 1,199 | 68% |
| Medium API (20 endpoints) | 15 | 5 | 21,720 | 1,905 | 91% |
| Enterprise API (200 endpoints) | 25 | 10 | 358,425 | 3,925 | 99% |

120 tools, 25 turns: 357,169 tokens saved.

The Multi-Server Problem

Connect three MCP servers (task manager, filesystem, database — 60 tools) and you're burning 7,238 tokens per turn on schemas alone. Over 20 turns: 145,060 tokens. mcp2cli: 3,288 tokens. That's a 97.7% reduction.

The gap widens every turn because native MCP pays the full schema tax repeatedly while mcp2cli pays discovery costs once.
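Replaying the three-server figures makes the widening gap concrete. A quick sketch using the article's numbers (native cost modeled as flat per-turn; mcp2cli's total is the measured 20-turn figure):

```python
# Figures from the three-server example above: 60 tools cost 7,238
# schema tokens per native turn; mcp2cli's measured totals were 3,288
# (mcp2cli) vs 145,060 (native) over the full 20-turn conversation.
NATIVE_PER_TURN = 7_238
MCP2CLI_TOTAL = 3_288
NATIVE_TOTAL = 145_060

# Native cost grows linearly with turns; mcp2cli's is roughly flat.
for turns in (1, 5, 10, 20):
    print(turns, NATIVE_PER_TURN * turns)

reduction = 1 - MCP2CLI_TOTAL / NATIVE_TOTAL
print(f"reduction: {reduction:.1%}")
```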

Anthropic built Tool Search into their API — tools marked defer_loading: true stay out of context until searched. It cuts ~85% of upfront cost. But:

  • It's Claude-API-only
  • When a tool is fetched, the full JSON schema still enters context (~121 tokens/tool)
  • mcp2cli's --list costs ~16 tokens/tool vs ~121 for fetched schemas

mcp2cli works with any LLM. Claude, GPT, Gemini, local models — it's just a CLI the model can shell out to.

AI Agent Skill

mcp2cli ships with an installable skill that teaches AI coding agents how to use it. Install it, and agents in Claude Code, Cursor, or Codex can discover and call any API without manual guidance — and even generate new skills from APIs automatically.

npx skills add knowsuchagency/mcp2cli --skill mcp2cli

Once installed, try prompts like:

  • mcp2cli --mcp https://mcp.example.com/sse — interact with an MCP server
  • mcp2cli create a skill for https://api.example.com/openapi.json — generate a skill from an API

How It Works

Point it at a source and go:

# MCP over HTTP
mcp2cli --mcp https://mcp.example.com/sse search --query "test"

# MCP over stdio
mcp2cli --mcp-stdio "npx @modelcontextprotocol/server-filesystem /tmp" \
  read-file --path /tmp/hello.txt

# OpenAPI spec
mcp2cli --spec https://petstore3.swagger.io/api/v3/openapi.json \
  find-pets-by-status --status available

Specs and tool lists are cached with configurable TTL. No network hit on repeated runs.
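The caching pattern is simple time-to-live invalidation. A minimal Python sketch of the idea (an illustration of TTL caching in general, not mcp2cli's actual implementation):

```python
import time

class TTLCache:
    """Minimal time-to-live cache: a fetched spec or tool list stays
    valid for `ttl` seconds, so repeated runs skip the network."""

    def __init__(self, ttl: float = 300.0):
        self.ttl = ttl
        self._store = {}  # key -> (timestamp, value)

    def get_or_fetch(self, key, fetch):
        entry = self._store.get(key)
        if entry and time.monotonic() - entry[0] < self.ttl:
            return entry[1]  # cache hit: no network call
        value = fetch()      # miss or expired: fetch once, remember it
        self._store[key] = (time.monotonic(), value)
        return value
```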

Internally: load the spec or connect to the server, extract a uniform list of command definitions, build an argparse parser with typed flags, execute. Both adapters produce the same internal structure — the CLI builder and output handling are shared.
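The definitions-to-parser step can be sketched in a few lines of Python. Everything here is illustrative — the field names and commands are hypothetical, not mcp2cli's internal format:

```python
import argparse

# Hypothetical uniform command definitions, as extracted from either an
# MCP tool list or an OpenAPI spec (names and fields illustrative only).
COMMANDS = [
    {"name": "create-task", "params": {"title": str, "priority": int}},
    {"name": "search", "params": {"query": str}},
]

def build_parser(commands) -> argparse.ArgumentParser:
    """Turn uniform command definitions into one CLI with subcommands."""
    parser = argparse.ArgumentParser(prog="mcp2cli")
    sub = parser.add_subparsers(dest="command", required=True)
    for cmd in commands:
        p = sub.add_parser(cmd["name"])
        for flag, typ in cmd["params"].items():
            p.add_argument(f"--{flag}", type=typ)  # typed flag per param
    return parser

args = build_parser(COMMANDS).parse_args(
    ["create-task", "--title", "Fix bug", "--priority", "2"])
print(args.command, args.title, args.priority)
```

Because both adapters emit the same definition shape, the parser builder never needs to know whether a command came from MCP or OpenAPI.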

Install

pip install mcp2cli

# Or run directly without installing
uvx mcp2cli --help

GitHub · PyPI


This project was inspired by Kagan Yilmaz's analysis of CLI vs MCP token costs and his work on CLIHub. His observation that CLI-based tool access is dramatically more token-efficient than native MCP injection was the spark for mcp2cli.

Building AI agents or tools in Orange County? Join us at Orange County AI for events, workshops, and community.