Build an MCP Server That Cuts Claude Code Context Consumption by 98%

The Context Crisis: Why Your Claude Code Sessions Die After 30 Minutes

Every MCP tool call in Claude Code dumps raw data into your 200K context window. A Playwright snapshot costs 56 KB. Twenty GitHub issues cost 59 KB. One access log — 45 KB. After 30 minutes, 40% of your context is gone.

Cloudflare's Code Mode showed that tool definitions can be compressed by 99.9%. But nobody was solving the output side — until Context Mode. This architecture, detailed in a post by Mert Köseoğlu that hit 570 points on Hacker News, demonstrates how building an MCP server with context-mode optimization can reduce context consumption by 98%. 315 KB becomes 5.4 KB.

Let's build it ourselves.

Understanding the Architecture

Context Mode is an MCP server that sits between Claude Code and tool outputs. Instead of letting raw data (log files, API responses, snapshots) enter the conversation context, it processes them in isolated subprocesses and returns only summaries, search results, or structured metadata.

Architecture diagram description:

MCP Client (Claude Code) — sends tool calls to the MCP server
Context Mode Server — intercepts tool outputs, routes them through processing pipelines
Sandbox Layer — spawns isolated subprocesses for each execution (Python, JS, Shell, Go, etc.)
Knowledge Base — SQLite FTS5 full-text search index for chunked/processed content
Output Minifier — strips raw data, returns only the essential information (99% reduction)

The key insight: each execute call spawns an isolated subprocess with its own process boundary. Scripts can't access each other's memory or state. The subprocess runs your code, captures stdout, and only that stdout enters the conversation context. The raw data never leaves the sandbox.

Step 1: Project Setup

Create a new directory and initialize the project:

mkdir context-mode-server
cd context-mode-server
npm init -y
npm install @modelcontextprotocol/sdk zod

Create the main server file:

// server.ts
import { Server } from '@modelcontextprotocol/sdk/server/index.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import {
 CallToolRequestSchema,
 ListToolsRequestSchema,
} from '@modelcontextprotocol/sdk/types.js';
import { z } from 'zod';
import { execSync } from 'child_process';
const server = new Server({
name: ‘context-mode-server’,
version: ‘1.0.0’,
}, {
capabilities: { tools: {} }
});

Step 2: Build the Sandbox Executor

The sandbox is the core of context optimization. It executes code in isolated processes and returns only what's necessary:

interface SandboxResult {
 stdout: string;
 stderr: string;
 exitCode: number;
 truncated: boolean;
}
function executeInSandbox(code: string, language: string): SandboxResult {
const runtimes: Record<string, string> = {
javascript: ‘node -e’,
typescript: ‘npx tsx -e’,
python: ‘python3 -c’,
shell: ‘bash -c’,
ruby: ‘ruby -e’,
};
const runtime = runtimes[language] || runtimes.javascript;
const MAX_OUTPUT = 1024; // 1 KB max — the key optimization
try {
const output = execSync(${runtime} ${JSON.stringify(code)}, {
timeout: 10000,
maxBuffer: MAX_OUTPUT,
encoding: ‘utf-8’,
});
return {
stdout: output.slice(0, MAX_OUTPUT),
stderr: ”,
exitCode: 0,
truncated: output.length > MAX_OUTPUT,
};
} catch (err: any) {
return {
stdout: err.stdout?.slice(0, MAX_OUTPUT) || ”,
stderr: err.stderr?.slice(0, MAX_OUTPUT) || ”,
exitCode: err.status || 1,
truncated: false,
};
}
}

The critical optimization: MAX_OUTPUT = 1024. By capping output at 1 KB instead of letting 56 KB Playwright snapshots or 59 KB GitHub issue lists through, you save 98%+ of context.

Step 3: Build the Knowledge Base with SQLite FTS5

The knowledge base indexes content by headings and makes it searchable via BM25 ranking:

import Database from 'better-sqlite3';
class KnowledgeBase {
private db: Database.Database;
constructor(path: string) {
this.db = new Database(path);
this.db.exec( CREATE VIRTUAL TABLE IF NOT EXISTS chunks USING fts5(  title, heading, content,   tokenize='porter unicode61'  ); );
}
indexContent(title: string, markdown: string) {
// Chunk by headings, preserve code blocks
const chunks = markdown.split(/(?=^## )/m);
const insert = this.db.prepare(
‘INSERT INTO chunks (title, heading, content) VALUES (?, ?, ?)’
);
for (const chunk of chunks) {
const heading = chunk.match(/^## (.+)$/m)?.[1] || ‘Intro’;
insert.run(title, heading, chunk);
}
}
search(query: string, limit = 5) {
// BM25 ranking with Porter stemming
return this.db.prepare(
SELECT title, heading, snippet(chunks, 2, '&lt;b&gt;', '&lt;/b&gt;', '...', 32) AS snippet  FROM chunks  WHERE chunks MATCH ?  ORDER BY rank  LIMIT ?
).all(query, limit);
}
}

Step 4: Register the Tools

Expose the tools through MCP's tool registration:

server.setRequestHandler(ListToolsRequestSchema, async () => ({
 tools: [
 {
 name: 'batch_execute',
 description: 'Execute code in a sandboxed environment. Supports JS, TS, Python, Shell, Ruby, Go, Rust.',
 inputSchema: {
 type: 'object',
 properties: {
 code: { type: 'string', description: 'Code to execute' },
 language: { type: 'string', enum: ['javascript', 'typescript', 'python', 'shell'] },
 },
 required: ['code', 'language'],
 },
 },
 {
 name: 'search',
 description: 'Search indexed knowledge base using BM25 full-text search.',
 inputSchema: {
 type: 'object',
 properties: {
 query: { type: 'string', description: 'Search query' },
 limit: { type: 'number', default: 5 },
 },
 required: ['query'],
 },
 },
 {
 name: 'fetch_and_index',
 description: 'Fetch a URL, convert to markdown, chunk, and index it. Returns summary only.',
 inputSchema: {
 type: 'object',
 properties: {
 url: { type: 'string' },
 },
 required: ['url'],
 },
 },
 ],
}));
server.setRequestHandler(CallToolRequestSchema, async (request) => {
const { name, arguments: args } = request.params;
switch (name) {
case ‘batch_execute’: {
const result = executeInSandbox(args.code, args.language);
return {
content: [{
type: ‘text’,
text: JSON.stringify({
stdout: result.stdout,
exitCode: result.exitCode,
truncated: result.truncated,
}),
}],
};
}
case ‘search’: {
const results = kb.search(args.query, args.limit);
return {
content: [{
type: ‘text’,
text: JSON.stringify(results),
}],
};
}
default:
throw new Error(Unknown tool: ${name});
}
});

Step 5: Wire Everything Together

const transport = new StdioServerTransport();
await server.connect(transport);
// Start the server
const kb = new KnowledgeBase(’./kb.db’);
console.error(‘Context Mode MCP server running on stdio’);

Build and start:

npx tsc
claude mcp add context-mode -- node dist/server.js

What Changes in Practice

With this MCP server, your Claude Code sessions transform:

Playwright snapshot: 56 KB → 299 B (99.5% reduction)
GitHub issues (20): 59 KB → 1.1 KB (98.1% reduction)
Access log (500 requests): 45 KB → 155 B (99.7% reduction)
Analytics CSV (500 rows): 85 KB → 222 B (99.7% reduction)
Git log (153 commits): 11.6 KB → 107 B (99.1% reduction)

Over a full session: 315 KB of raw output becomes 5.4 KB. Session time before context slowdown goes from ~30 minutes to ~3 hours. Context remaining after 45 minutes: 99% instead of 60%.

The magic isn't in the sandbox itself — it's in the constraint. By forcing tools to return structured summaries instead of raw data, you preserve context for what matters: reasoning, planning, and debugging. Your 200K context window feels 20x larger.

Deploy and Test

Run the server and test with a Claude Code session:

# Start in MCP mode
claude mcp add context-mode -- node dist/server.js
Or use the plugin marketplace
/plugin marketplace add mksglu/claude-context-mode
/plugin install context-mode@claude-context-mode
Restart Claude Code
Then: ask it to analyze a CSV, review a log, fetch docs
Every tool call will return < 1 KB of structured output

The Context Mode approach is open source (MIT) and available at github.com/mksglu/claude-context-mode. The concepts here — sandboxed execution, BM25 search indexing, aggressive output truncation — are architecture patterns you can apply to any MCP server you build.

Based on the Context Mode architecture by Mert Köseoğlu (mksg.lu/blog/context-mode), 570 points on Hacker News.

📖 Related Reads

Hermes Tutorials — Hermes Agent setup, configuration, and advanced workflows
ToolBrain — tool reviews, LLM comparisons, and AI workflow guides
NiteAgent — AI agent development, frameworks, and production patterns

Cross-links automatically generated from ToolBrain.

← Back to all posts

Build an MCP Server That Cuts Claude Code Context Consumption by 98%

The Context Crisis: Why Your Claude Code Sessions Die After 30 Minutes

Understanding the Architecture

Step 1: Project Setup

Step 2: Build the Sandbox Executor

Step 3: Build the Knowledge Base with SQLite FTS5

Step 4: Register the Tools

Step 5: Wire Everything Together

What Changes in Practice

Deploy and Test

Or use the plugin marketplace

Restart Claude Code

Then: ask it to analyze a CSV, review a log, fetch docs

Every tool call will return < 1 KB of structured output

📖 Related Reads

Related Posts

Guide: Building AI-Powered Development Workflows with MCP and Agentic Pipelines

Build Log: Building a Custom MCP Code Review Server With FastMCP and Python

ChatDev Review 2026: OpenBMB's 33K★ Zero-Code Multi-Agent Platform That Democratizes AI Orchestration

zilliztech/claude-context Review 2026 — Semantic Code Search MCP for AI Agents