Building MCP Servers: Patterns and Lessons

Distilled from building 6 MCP servers across art archives, legal document management, and analytics dashboards.

Mar 09, 2026

The Big Picture

MCP server installation is now copy-paste. One URL, pasted into Claude.ai or Claude Desktop, and an ordinary user — not a developer — has radically extended what their AI can do. Search a document corpus. Explore financial data. Submit files into a processing pipeline. Query a database. All without leaving their conversation.

This is going to take off. The barrier between “I built a tool” and “anyone can use it” has collapsed to a single line of text. What used to require API integrations, custom UIs, and developer onboarding is now: paste this URL, start talking.

What follows is what we’ve learned about designing for this world — where the user is Claude, the UX is a tool schema, and the onboarding is a JSON response.

1. Connection UX: The One-Line Problem

The single biggest adoption barrier for MCP servers is connection setup. Every extra step between “I want to use this” and “it works” loses users.

What we learned

Token-in-URL is the sweet spot. Claude.ai Custom Connectors accept a URL — that’s it. If your auth token lives in the URL as a query parameter (?token=xyz), the entire setup is a single copy-paste:

https://your-server.onrender.com/mcp/stream?token=abc123

Compare this to requiring separate header configuration, OAuth flows, or API key setup screens. One line. Done.

Bearer tokens work too, but cost a step. Claude Desktop and the SDK support Authorization: Bearer headers, but now you need two fields (URL + header) instead of one. We support both — token-in-URL for quick setup, Bearer for clients that prefer it.

Basic Auth is fine for internal tools. Some of our dashboard servers use Basic Auth because they’re accessed by a small known team, and the auth credentials are managed in Render’s environment. For tools where the admin controls both sides, this is simple and sufficient.

Recommendation

Support multiple auth methods, but optimise the default path for one-line setup:

?token= → Claude.ai Custom Connectors (primary)
Bearer → Claude Desktop, SDK integrations
Basic → Internal/admin tools
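A minimal sketch of how one handler might accept all three methods, assuming a single shared secret. The names extract_token, is_authorised, and EXPECTED_TOKEN are hypothetical, and treating the Basic Auth password as the token is an assumption made for illustration.

```python
import base64
import hmac
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical secret for illustration; a real server would read it from the environment.
EXPECTED_TOKEN = "abc123"


def extract_token(url: str, headers: dict) -> Optional[str]:
    """Return the presented credential from ?token=, Bearer, or Basic, in that order."""
    query = parse_qs(urlsplit(url).query)
    if query.get("token"):
        return query["token"][0]

    scheme, _, value = headers.get("Authorization", "").partition(" ")
    value = value.strip()
    if scheme.lower() == "bearer" and value:
        return value
    if scheme.lower() == "basic" and value:
        try:
            decoded = base64.b64decode(value, validate=True).decode("utf-8")
        except (ValueError, UnicodeDecodeError):
            return None
        # Basic credentials are "user:password"; the password is used as the token here.
        _, sep, password = decoded.partition(":")
        return password if sep else None
    return None


def is_authorised(url: str, headers: dict) -> bool:
    token = extract_token(url, headers)
    if token is None:
        return False
    # Constant-time comparison on bytes, so non-ASCII input cannot raise TypeError.
    return hmac.compare_digest(token.encode("utf-8"), EXPECTED_TOKEN.encode("utf-8"))
```

Checking the query parameter first keeps the one-line path as the default, with the header-based methods as fallbacks.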

2. Transport: Stdio vs HTTP

MCP supports two transports. We use both.

Stdio (local development)

Process spawned by Claude Desktop via command + args
No network, no auth needed — the OS process boundary is the security model
Fast iteration: edit code, restart, test
Limited to local machines

HTTP (production)

Two flavours:

JSON-RPC over POST — Simple. One endpoint, stateless request-response. Works with any HTTP client. Our Python servers use this exclusively. Easy to debug with curl.

Streamable HTTP — The official MCP SDK transport for Claude.ai. Uses the StreamableHTTPServerTransport class. Required for Custom Connectors on claude.ai. Our Node.js servers support both: /mcp for JSON-RPC (bridge/Claude Desktop) and /mcp/stream for Streamable HTTP (Claude.ai).

Recommendation

If you’re building a new server: start with JSON-RPC over POST. It’s simpler to implement, easier to debug, and works everywhere. Add Streamable HTTP when you need Claude.ai Custom Connector support.

Stateless is non-negotiable for production. No sessions, no server-side state between requests. This gives you zero-downtime deploys (Render just swaps the process), horizontal scaling, and no session cleanup headaches.
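One way to picture the stateless JSON-RPC endpoint is as a pure function from request body to response body. The sketch below is not the servers' actual code: the TOOLS registry and the echo tool are hypothetical, and tools/list is simplified (a real listing also carries a description and inputSchema for each tool).

```python
import json
from typing import Any, Callable

# Hypothetical registry: tool name -> handler taking the arguments dict.
TOOLS: dict[str, Callable[[dict], str]] = {
    "echo": lambda args: str(args.get("text", "")),
}


def _error(req_id: Any, code: int, message: str) -> dict:
    return {"jsonrpc": "2.0", "id": req_id, "error": {"code": code, "message": message}}


def handle_jsonrpc(raw_body: str) -> dict:
    """Handle one request using only its body; nothing is kept between calls."""
    try:
        req = json.loads(raw_body)
    except json.JSONDecodeError:
        return _error(None, -32700, "Parse error")
    if not isinstance(req, dict) or "method" not in req:
        return _error(None, -32600, "Invalid Request")

    req_id, method = req.get("id"), req["method"]
    params = req.get("params") or {}
    if not isinstance(params, dict):
        return _error(req_id, -32602, "Invalid params")

    if method == "tools/list":
        # Simplified: real entries also include description and inputSchema.
        result: Any = {"tools": [{"name": name} for name in sorted(TOOLS)]}
    elif method == "tools/call":
        handler = TOOLS.get(params.get("name"))
        if handler is None:
            return _error(req_id, -32602, "Unknown tool")
        text = handler(params.get("arguments") or {})
        result = {"content": [{"type": "text", "text": text}]}
    else:
        return _error(req_id, -32601, "Method not found")
    return {"jsonrpc": "2.0", "id": req_id, "result": result}
```

Because the function reads nothing but its argument, any process behind the load balancer can answer any request, which is what makes process swaps during a deploy safe.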

3. Tool Taxonomy

After building ~40 tools across 6 servers, clear categories have emerged:

Orientation tools

get_status / get_corpus_status — Called at session start. Returns corpus stats, processing state, and behavioral guidance. This is the most important tool type we’ve built (more on this in Section 4).

Zero parameters
Returns counts, gaps, suggested next actions
Includes the behavioral tip field
Tool description says “Call this at the start of a session”

Search tools

Semantic search (vector embeddings) and keyword search (Meilisearch, SQL ILIKE). The pattern:

Input: query (string), limit (int, default 5), optional filters
Output: ranked results with similarity scores or highlighted snippets

Key lesson: always return enough context to avoid a follow-up call. If a search result includes an ID, also include the filename/title. If it has a description, include a preview. Every round-trip costs latency and context window.

Read tools

Fetch full content of a specific resource:

Input: id (string), optional max_chars/limit
Output: full content + metadata

Important: support progressive disclosure. get_document_text defaults to 5000 chars (preview), with max_chars=0 for full content. This lets Claude scan quickly, then deep-read when needed.
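The preview-then-full behaviour can be sketched as follows. This is an illustration under assumptions: the real tool looks a document up by id, whereas this version takes the text directly so it stays self-contained, and the total_chars and truncated fields are hypothetical additions that tell the caller whether a second call is worthwhile.

```python
DEFAULT_PREVIEW_CHARS = 5000  # the default described above


def get_document_text(text: str, max_chars: int = DEFAULT_PREVIEW_CHARS) -> dict:
    """Return a preview by default; max_chars=0 means the full text."""
    if max_chars < 0:
        raise ValueError("max_chars must be >= 0")
    body = text if max_chars == 0 else text[:max_chars]
    return {
        "text": body,
        "total_chars": len(text),
        "truncated": len(body) < len(text),
    }
```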

Enrichment tools

Write-back tools that improve the corpus as Claude uses it. The insight that made these work: enrichment should happen during natural use, not as a separate task.

Examples:

log_insight — saves brainstorm notes during creative conversation
add_clause — extracts legal clauses while reviewing a document
set_document_metadata — records structured facts while reading
add_tags — categorises documents during search/review

The ENRICHMENT_HINT pattern attaches nudges to individual read responses when a document lacks metadata. Claude sees “this document has no metadata yet” and proactively enriches it.
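A sketch of how such a hint might be attached, assuming read responses are plain dicts with an optional metadata field. The hint wording and the hint key are illustrative; the pattern is described above but its exact text is not.

```python
# Illustrative wording; only the pattern itself comes from the text above.
ENRICHMENT_HINT = (
    "This document has no metadata yet. Consider calling "
    "set_document_metadata with what you learn while reading it."
)


def with_enrichment_hint(document: dict) -> dict:
    """Copy a read response and add a hint only when metadata is missing."""
    response = dict(document)
    if not document.get("metadata"):
        response["hint"] = ENRICHMENT_HINT
    return response
```

Documents that already carry metadata come back unchanged, so the nudge disappears once the enrichment has happened.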

Feedback tools

submit_feedback / list_feedback — Let users report bugs, suggest improvements, and ask questions through their conversation with Claude. Append-only from the client side; status management is admin-only via direct SQL.

These are invaluable for product development. Real user feedback, captured in context, with zero friction.

File submission tools

submit_file / upload_status — Accept binary data (base64-encoded) for processing pipelines. The pattern:

Validate format from magic bytes (not filename — base64 has no extension)
Create a tracking record
Spawn background processing (daemon thread)
Return immediately with a tracking ID
Client polls upload_status for progress

Key decisions:

50MB base64 limit (~37MB binary) — reasonable for MCP
Clear rejection messages for unsupported formats, pointing to submit_feedback to request new ones
Background processing avoids blocking the MCP response
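The size limit and its binary equivalent can be checked before any decoding work is done. A minimal sketch, assuming the payload arrives as a base64 string; decode_upload is a hypothetical helper name.

```python
import base64
import binascii

MAX_BASE64_BYTES = 50 * 1024 * 1024  # 50MB of base64 text

# Base64 encodes every 3 bytes as 4 characters, so the binary ceiling is 3/4 of the limit.
MAX_BINARY_BYTES = MAX_BASE64_BYTES * 3 // 4  # 37.5MB


def decode_upload(b64_data: str) -> bytes:
    """Enforce the size limit on the encoded string, then decode strictly."""
    if len(b64_data) > MAX_BASE64_BYTES:
        raise ValueError("File too large: the limit is 50MB of base64 data.")
    try:
        return base64.b64decode(b64_data, validate=True)
    except binascii.Error as exc:
        raise ValueError("Payload is not valid base64.") from exc
```

Checking the length of the encoded string first means an oversized payload is rejected without allocating the decoded bytes.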

Browse/list tools

Paginated listing with optional filters:

Input: optional filters, limit (default 20), offset (default 0)
Output: items array + total count

Always return total alongside paginated results so Claude knows how much more there is.
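A small sketch of the response shape, using an in-memory list in place of a database query. In a real server the page would presumably come from LIMIT/OFFSET and the total from a COUNT(*); the cap of 100 and the has_more field are assumptions added for illustration.

```python
from typing import Any, Sequence


def paginate(items: Sequence[Any], limit: int = 20, offset: int = 0) -> dict:
    """Return one page plus the total, so the caller knows what remains."""
    limit = max(1, min(limit, 100))  # assumed safety bound, not from the source
    offset = max(0, offset)
    page = list(items[offset:offset + limit])
    return {
        "items": page,
        "total": len(items),
        "limit": limit,
        "offset": offset,
        "has_more": offset + len(page) < len(items),
    }
```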

4. The Self-Bootstrapping Tip Pattern

This is the most impactful UX pattern we’ve found. The idea:

The MCP server itself teaches Claude how to use it well, and suggests persisting that knowledge so the user doesn’t have to keep re-configuring.

How it works

The get_status tool returns a tip field alongside corpus stats:

{
  "table_count": 42,
  "page_count": 12,
  "tip": "This dashboard provides pre-aggregated analytics views.
          For effective exploration: start with list_tables...
          [behavioral guidance] ...If this approach isn't already
          configured in your CLAUDE.md, system memory, or custom
          instructions, consider suggesting to the user (with their
          consent) that they add it so it persists across sessions."
}
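A sketch of how a get_status payload like this might be assembled for a document corpus. The function name, its parameters, and the document_count and suggested_actions keys are hypothetical; the tip text reuses the wording quoted in this section.

```python
PERSIST_NOTE = (
    "If this approach isn't already configured in your CLAUDE.md, system "
    "memory, or custom instructions, consider suggesting to the user (with "
    "their consent) that they add it so it persists across sessions."
)


def build_status(doc_count: int, docs_missing_clauses: int) -> dict:
    """Assemble stats, a session-wide tip, and task-specific suggested actions."""
    suggested = []
    if docs_missing_clauses > 0:
        suggested.append(f"{docs_missing_clauses} documents need clause extraction")
    return {
        "document_count": doc_count,
        "suggested_actions": suggested,
        "tip": (
            "Proactively extract clauses and build metadata as you read "
            "documents. " + PERSIST_NOTE
        ),
    }
```

The stats and suggested actions change from call to call, while the tip stays constant, which is why the two are built separately.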

Why this works

Zero configuration required. The first time Claude connects, it calls get_status, reads the tip, and immediately knows how to behave. No CLAUDE.md, no system prompts, no manual setup.

Self-propagating. The tip suggests Claude help the user persist the behavior — creating a CLAUDE.md entry or system memory note. Once persisted, the behavior survives across sessions without needing the tip again.

Domain-specific. Each server’s tip is tailored to its tools and domain:

Creative archive: “Proactively log insights, ideas, questions, and decisions throughout the conversation”
Document corpus: “Proactively extract clauses and build metadata as you read documents”
Analytics dashboard: “Start with list_tables, always describe_table before querying, cross-reference narrative with data”

Consent-based. The tip says “consider suggesting to the user (with their consent)” — not “silently configure yourself.” The user stays in control.

Layered behavioral guidance

We’ve ended up with three layers, each operating at a different scope:

Tip (in get_status) → Session-wide behavior. Example: “Proactively log insights across the conversation”
Hint (in read responses) → Per-resource nudge. Example: “This document has no metadata — consider enriching it”
Suggested action (in status) → Task-specific next step. Example: “12 documents need clause extraction”

The tip sets the mindset. The hint triggers action at the right moment. The suggested action prioritises what to work on next.

5. Security

Authentication

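Constant-time comparison is mandatory. Use hmac.compare_digest() (Python) or crypto.timingSafeEqual() (Node.js) for token/password checks. A regular == or === comparison can leak timing information. Note that crypto.timingSafeEqual() throws if the two buffers differ in length, so compare fixed-length digests or check the lengths first.

Never log credentials or user content. Our _SENSITIVE_FIELDS set replaces the values of query, content, text, and data with length placeholders before tool call analytics are written to the database, so the base64 file content from submit_file never touches the logs. With token-in-URL auth, also make sure request logs omit the query string, since that is where the token lives.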

Rate limiting

In-memory per-IP bucket, 60 requests/minute. Simple, effective, zero dependencies:

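import time
from flask import request  # assumption: Flask; request.remote_addr is its API

_RATE_LIMIT = 60       # requests allowed per window
_WINDOW_SECONDS = 60
_rate_buckets: dict[str, list[float]] = {}

def _is_rate_limited() -> bool:
    # Behind a reverse proxy, remote_addr may be the proxy's address, not the client's.
    ip = request.remote_addr or "unknown"
    now = time.monotonic()
    bucket = _rate_buckets.setdefault(ip, [])
    # Drop timestamps that have aged out of the window.
    bucket[:] = [t for t in bucket if now - t < _WINDOW_SECONDS]
    if len(bucket) >= _RATE_LIMIT:
        return True
    bucket.append(now)
    return False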

For production at scale, you’d want Redis or a distributed rate limiter. For our traffic levels, in-memory is fine and has zero operational overhead.

Input validation

Truncate, don’t reject. If a query exceeds MAX_QUERY_LEN, truncate it silently. If content exceeds MAX_CONTENT_LEN, truncate. Rejecting with an error creates a worse UX than working with slightly truncated input.
Validate enums. Entry types, categories, statuses — check against allowed values and return clear error messages.
Parameterised queries everywhere. Never interpolate user input into SQL. Use %s placeholders (psycopg2) or $1 (pg).
Magic byte detection over filenames. File format detection must use the actual bytes, not the filename extension. Base64-encoded data has no filename guarantee.
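The first two rules can be sketched as a pair of helpers. The limit of 500 characters and the set of entry types are assumed values for illustration; the text above names MAX_QUERY_LEN but does not give its value.

```python
MAX_QUERY_LEN = 500  # assumed value, not from the source
ALLOWED_ENTRY_TYPES = {"insight", "idea", "question", "decision"}  # illustrative


def clean_query(query: str) -> str:
    """Truncate over-long queries instead of rejecting them."""
    return query.strip()[:MAX_QUERY_LEN]


def validate_entry_type(value: str) -> str:
    """Reject unknown enum values with a message that lists the valid ones."""
    normalised = value.strip().lower()
    if normalised not in ALLOWED_ENTRY_TYPES:
        allowed = ", ".join(sorted(ALLOWED_ENTRY_TYPES))
        raise ValueError(f"Invalid entry type {value!r}. Allowed values: {allowed}.")
    return normalised
```

Listing the allowed values in the error gives Claude enough information to correct the call on the next attempt without another lookup.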

Error masking

In production, tool errors are logged server-side but the HTTP response returns a generic -32603 Internal Error. Don’t leak stack traces, database schemas, or internal state to clients.
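A minimal sketch of that masking, assuming tool handlers are plain callables that take an arguments dict. The wrapper name call_tool_safely and the logger name are hypothetical.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("mcp.tools")


def call_tool_safely(req_id: Any, handler: Callable[[dict], dict], args: dict) -> dict:
    """Run a tool handler; log failures in full but return only a generic error."""
    try:
        return {"jsonrpc": "2.0", "id": req_id, "result": handler(args)}
    except Exception:
        # The stack trace goes to the server log, never to the client.
        logger.exception("Tool call failed (request id=%s)", req_id)
        return {
            "jsonrpc": "2.0",
            "id": req_id,
            "error": {"code": -32603, "message": "Internal error"},
        }
```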

6. Analytics: Watching How Tools Get Used

Tool call logging

Every tool invocation is logged with:

CREATE TABLE tool_calls (
    id           BIGSERIAL PRIMARY KEY,
    tool_name    TEXT NOT NULL,
    duration_ms  INTEGER,
    success      BOOLEAN NOT NULL DEFAULT true,
    error_message TEXT,
    args_summary JSONB DEFAULT '{}'::jsonb,
    created_at   TIMESTAMPTZ NOT NULL DEFAULT now()
);

args_summary contains the tool arguments with sensitive fields stripped. This lets you answer: Which tools are used most? Which tools are slow? Which tools fail? What query patterns emerge?

Privacy-preserving argument logging

_SENSITIVE_FIELDS = {"query", "content", "text", "data"}

def _sanitise_args(args: dict) -> dict:
    return {
        k: (f"<{len(str(v))} chars>" if k in _SENSITIVE_FIELDS else v)
        for k, v in args.items()
    }

You see {“query”: “<42 chars>”, “limit”: 5} — enough to understand the call pattern without storing the actual content.
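Putting the table and the sanitiser together, a logging wrapper might look like the sketch below. It is an illustration under assumptions: TOOL_CALL_LOG is an in-memory stand-in for the tool_calls table, logged_tool is a hypothetical decorator name, and only the exception type is recorded as the error message so that error text cannot carry user content into the log.

```python
import functools
import time

_SENSITIVE_FIELDS = {"query", "content", "text", "data"}
TOOL_CALL_LOG: list[dict] = []  # stand-in for the tool_calls table


def _sanitise_args(args: dict) -> dict:
    return {
        k: (f"<{len(str(v))} chars>" if k in _SENSITIVE_FIELDS else v)
        for k, v in args.items()
    }


def logged_tool(func):
    """Record name, duration, outcome, and sanitised keyword args for each call."""
    @functools.wraps(func)
    def wrapper(**kwargs):
        start = time.perf_counter()
        record = {"tool_name": func.__name__, "success": True,
                  "error_message": None, "args_summary": _sanitise_args(kwargs)}
        try:
            return func(**kwargs)
        except Exception as exc:
            record["success"] = False
            record["error_message"] = type(exc).__name__
            raise
        finally:
            # Runs on success and failure alike, so every call is recorded.
            record["duration_ms"] = int((time.perf_counter() - start) * 1000)
            TOOL_CALL_LOG.append(record)
    return wrapper


@logged_tool
def search(query: str, limit: int = 5) -> list:
    return []  # placeholder body for the example
```

Calling search(query="...", limit=5) appends one record whose args_summary holds the length placeholder rather than the query text.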

What the data tells you

From our archive server analytics:

search and recall are the most-used tools — the core workflow is search-based
log_insight usage increased after adding the behavioral tip — self-bootstrapping works
submit_feedback captures real user pain points that drive the roadmap

This data directly informs product decisions: which tools to optimise, what new tools to build, where the UX friction lives.

7. Database Patterns

JSONB for flexible schemas

Tags, artwork_ids, suggested_tags — all stored as JSONB arrays. This avoids join tables for many-to-many relationships that are primarily read-heavy:

-- Containment query: find brainstorm entries linked to an artwork
SELECT * FROM brainstorm_log WHERE artwork_ids @> '["uuid-here"]'::jsonb;

-- Tag filtering on search results
SELECT * FROM artworks WHERE tags @> '["landscape"]'::jsonb;

GIN indexes make these fast:

CREATE INDEX ix_brainstorm_artwork_ids ON brainstorm_log USING gin (artwork_ids);

Vector embeddings

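Using pgvector with 1536-dimension OpenAI embeddings. Note that text-embedding-3-large returns 3072 dimensions by default, so 1536 means either requesting the reduced size through the API’s dimensions parameter or using text-embedding-3-small; pgvector’s HNSW index on a vector column supports at most 2,000 dimensions either way. HNSW index for approximate nearest neighbour: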

CREATE INDEX ix_artworks_embedding ON artworks
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

The search pattern:

SELECT *, 1 - (embedding <=> %(vec)s) AS similarity
FROM artworks
WHERE embedding IS NOT NULL
ORDER BY embedding <=> %(vec)s
LIMIT %(limit)s;
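In pgvector, <=> is cosine distance, so 1 - (a <=> b) is cosine similarity and ordering by distance ascending is the same as ordering by similarity descending. The pure-Python sketch below mirrors the query on an in-memory list to make that relationship concrete; it is an exact scan for illustration only, not how the HNSW index works, and top_k is a hypothetical helper.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Equivalent to 1 - (a <=> b) in pgvector, where <=> is cosine distance."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def top_k(query_vec, rows, limit=5):
    """rows: iterable of (row_id, embedding or None). Mirrors the SQL above."""
    scored = [
        (cosine_similarity(query_vec, vec), row_id)
        for row_id, vec in rows
        if vec is not None  # WHERE embedding IS NOT NULL
    ]
    scored.sort(reverse=True)  # highest similarity first = smallest distance first
    return [(row_id, round(sim, 4)) for sim, row_id in scored[:limit]]
```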

Migration strategy

Numbered SQL files (001_initial_schema.sql, 002_preview_url.sql, etc.) tracked in a _migrations table. A simple Python script runs them in order:

# scripts/migrate.py
for f in sorted(migration_files):
    if f.name not in already_applied:
        execute(f.read_text())
        record_migration(f.name)

No ORM, no migration framework. The migrations are just SQL. This keeps the operational surface area minimal and the migrations readable.
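To make the loop concrete, here is a runnable sketch of the same idea. It swaps in sqlite3 from the standard library so it can run anywhere; the servers described here use Postgres, where one would likely run each file and its _migrations insert inside a single transaction. The function name run_migrations and the _migrations columns are assumptions.

```python
import sqlite3
from pathlib import Path


def run_migrations(conn: sqlite3.Connection, migrations_dir: Path) -> list:
    """Apply unapplied .sql files in filename order; return the names applied."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS _migrations ("
        "name TEXT PRIMARY KEY, applied_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    already_applied = {row[0] for row in conn.execute("SELECT name FROM _migrations")}
    applied = []
    # Zero-padded numeric prefixes make lexicographic order match numeric order.
    for path in sorted(migrations_dir.glob("*.sql")):
        if path.name in already_applied:
            continue
        conn.executescript(path.read_text())
        conn.execute("INSERT INTO _migrations (name) VALUES (?)", (path.name,))
        conn.commit()
        applied.append(path.name)
    return applied
```

Running it a second time against the same database applies nothing, which is what makes it safe to call from a build command on every deploy.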

8. Processing Pipeline Patterns

Background processing with immediate response

For file submission, the MCP tool returns immediately with a tracking ID. Processing happens in a daemon thread:

def _bg_process(upload_id, filename, file_bytes, tags):
    try:
        process_file(upload_id, filename, file_bytes, suggested_tags=tags)
    except Exception as e:
        logger.exception(f"Background processing failed: {e}")

thread = threading.Thread(target=_bg_process, args=(...), daemon=True)
thread.start()

return {"upload_id": upload_id, "status": "processing"}

Claude can then poll upload_status to check progress. This keeps the MCP response fast (<1s) even when processing takes minutes.

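For our scale (single user, occasional uploads), daemon threads are fine. The trade-off is that a daemon thread dies with its process, so an upload still being processed during a deploy or restart is lost unless it is retried from its tracking record. At higher volume, you’d want a proper job queue (Celery, RQ, or even just a Postgres-backed queue).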
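The submit-then-poll flow can be sketched end to end as follows. This is an illustration under assumptions: the _uploads dict stands in for the tracking table (a real server would write to the database so that any process can answer upload_status), and the sleep is a placeholder for the actual processing work.

```python
import threading
import time
import uuid

_uploads: dict = {}  # stand-in for the tracking table
_lock = threading.Lock()


def _set_status(upload_id: str, **fields) -> None:
    with _lock:
        _uploads[upload_id].update(fields)


def _process(upload_id: str, file_bytes: bytes) -> None:
    try:
        time.sleep(0.05)  # placeholder for the real conversion and analysis work
        _set_status(upload_id, status="done", size=len(file_bytes))
    except Exception as exc:
        # Record the failure so the poller sees it instead of waiting forever.
        _set_status(upload_id, status="failed", error=type(exc).__name__)


def submit_file(file_bytes: bytes) -> dict:
    upload_id = str(uuid.uuid4())
    with _lock:
        _uploads[upload_id] = {"status": "processing"}
    threading.Thread(target=_process, args=(upload_id, file_bytes), daemon=True).start()
    return {"upload_id": upload_id, "status": "processing"}


def upload_status(upload_id: str) -> dict:
    with _lock:
        record = _uploads.get(upload_id)
        return dict(record) if record else {"status": "unknown"}
```

Writing a failed status from the worker matters as much as the happy path: without it, a crashed job looks identical to a slow one.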

Format detection from magic bytes

Never trust filenames. Detect format from the actual bytes:

if data[:2] == b"\xff\xd8":           return "jpeg"
if data[:4] == b"\x89PNG":            return "png"
if data[:4] in (b"II\x2a\x00", ...): return "tiff"
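# "ftyp" alone marks any ISO BMFF file (MP4, MOV, AVIF), so also check the brand
if data[4:8] == b"ftyp" and data[8:12] in (b"heic", b"heix", b"mif1"): return "heic"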
if data[:4] == b"%PDF":              return "pdf"

This is essential when receiving base64-encoded data through MCP — there’s no filename extension to rely on.
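A fuller, self-contained version of the same checks might look like this. Two details are worth calling out: the ftyp marker appears in every ISO BMFF container (MP4 and MOV included), so the sketch also checks the brand that follows it, and TIFF has both a little-endian and a big-endian signature. The brand list is a common subset rather than an exhaustive one, and detect_format is a hypothetical function name.

```python
from typing import Optional

# ISO BMFF brands commonly seen in HEIC/HEIF files (not an exhaustive list).
_HEIF_BRANDS = {b"heic", b"heix", b"hevc", b"hevx", b"mif1", b"msf1"}


def detect_format(data: bytes) -> Optional[str]:
    """Identify a supported format from its leading bytes, or return None."""
    if data[:3] == b"\xff\xd8\xff":
        return "jpeg"
    if data[:8] == b"\x89PNG\r\n\x1a\n":
        return "png"
    if data[:4] in (b"II\x2a\x00", b"MM\x00\x2a"):  # little- and big-endian TIFF
        return "tiff"
    if data[4:8] == b"ftyp" and data[8:12] in _HEIF_BRANDS:
        return "heic"
    if data[:5] == b"%PDF-":
        return "pdf"
    return None
```

Returning None for anything unrecognised gives the caller a single place to produce the unsupported-format message.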

Conversion pipeline

Convert everything to Vision-ready formats (PNG/JPEG) before processing:

TIFF → PNG (via PyMuPDF)
HEIC → JPEG (via Pillow + pillow-heif)
PDF → PNG pages (via PyMuPDF, 200 DPI)
JPEG/PNG → pass through

Dependencies are minimal: PyMuPDF handles both PDF and TIFF. Pillow + pillow-heif handle HEIC. No ImageMagick, no system-level dependencies.

9. Deployment

Render as the platform

All servers deploy on Render with:

Web service: gunicorn (Python) or node (JS)
PostgreSQL: Managed database per project
Build command: runs migrations automatically
Health check: /healthz endpoint
Zero-downtime deploys: Render swaps processes seamlessly

The stateless HTTP design means deploys are invisible to users. MCP connections are request-response — there’s no persistent connection to drop.

Environment configuration

Pydantic BaseSettings (Python; since Pydantic v2 it ships in the separate pydantic-settings package) or process.env (Node.js). All secrets in Render’s environment dashboard. .env for local development. .env.example committed to git as documentation.

The key insight: make as much optional as possible. Vision, embeddings, R2 storage — all gracefully degrade if their API keys aren’t set. This means you can run the server locally with just a database, and features light up as you add keys.
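A minimal sketch of that graceful degradation, using os.environ directly rather than Pydantic so it runs with the standard library alone. The environment variable names are hypothetical; the text above says only that these features depend on optional API keys.

```python
import os
from typing import Mapping

# Hypothetical variable names, chosen for illustration.
OPTIONAL_FEATURES = {
    "vision": "VISION_API_KEY",
    "embeddings": "EMBEDDINGS_API_KEY",
    "r2_storage": "R2_ACCESS_KEY_ID",
}


def enabled_features(env: Mapping[str, str] = os.environ) -> dict:
    """A feature is on only when its key is present and non-empty."""
    return {
        name: bool(env.get(var, "").strip())
        for name, var in OPTIONAL_FEATURES.items()
    }
```

Tools can then consult this map at call time and return a clear "feature not configured" message instead of failing at import.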

10. Unsupported Format Messaging

When a user submits an unsupported file type, the error message is part of the product:

Unsupported format. Currently supported: JPEG, PNG, TIFF, HEIC, PDF.

We're adding more formats based on user needs — use submit_feedback
to request specific file types and we'll add them to the roadmap.

This does three things:

Tells them what IS supported (so they can convert)
Tells them the limitation is temporary (reduces frustration)
Gives them an action (submit feedback) that feeds the roadmap

The feedback becomes product development data. When three users request PSD support, you know to prioritise it.

11. Cross-Server Patterns to Standardise

After building 6 servers independently, these are the patterns worth extracting:

Every server should have

get_status — Orientation tool with stats and behavioral tip
submit_feedback — User feedback channel (append-only)
Tool call analytics — Logged with sanitised args
Rate limiting — Per-IP, in-memory is fine for small scale
Health check endpoint — /healthz for deployment platforms

Consistent tool response format

Success:

{
  "content": [
    {"type": "text", "text": "..."}
  ]
}

Error:

{
  "content": [
    {"type": "text", "text": "Error: ..."}
  ],
  "isError": true
}
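Two small helpers are enough to keep every tool on this format. The names tool_ok and tool_error are hypothetical.

```python
def tool_ok(text: str) -> dict:
    """Wrap a successful result in the standard content envelope."""
    return {"content": [{"type": "text", "text": text}]}


def tool_error(message: str) -> dict:
    """Wrap a tool-level failure; isError tells the client the call did not succeed."""
    return {
        "content": [{"type": "text", "text": f"Error: {message}"}],
        "isError": True,
    }
```

Routing every return value through one of these two functions is a cheap way to stop response shapes drifting between servers.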

Consistent naming

get_* — Read a specific resource
list_* — Browse/paginate resources
search_* — Query resources
set_* / add_* — Write/enrich
submit_* — User-initiated actions (feedback, files)
get_status — Always this name, always zero parameters

12. Industry Comparison (March 2026)

How do these patterns hold up against the broader MCP ecosystem?

Ecosystem context

MCP moved to the Agentic AI Foundation under the Linux Foundation in December 2025. It’s no longer just an Anthropic spec — it’s becoming an industry standard. Stripe, Cloudflare, Sentry, Plaid, and others now ship official MCP servers. The spec has its own governance and is evolving fast.

Pattern-by-pattern comparison

Token-in-URL auth — OAuth 2.1 is the official spec direction, but token-in-URL and API keys remain the dominant pattern in practice for single-user/small-team servers. Assessment: Pragmatic. Fine for private servers. Move to OAuth if distributing publicly.

Stateless HTTP — The June 2026 spec revision is heading exactly here — deprecating SSE in favour of stateless Streamable HTTP. Assessment: Ahead of curve. We designed for this before the spec caught up.

Self-bootstrapping tip — Blockscout (blockchain explorer) independently developed a similar “orientation tool” pattern. Zologic coined the term “behavioral commerce prompting” for embedding behavioral guidance in tool responses. Assessment: Ahead of curve. Few servers do this; none have our three-layer approach (tip → hint → suggested action).

Progressive disclosure — Standard across well-designed servers. Anthropic’s own docs recommend search → detail patterns. Assessment: Well-aligned. This is table stakes for good MCP design.

Tool call analytics — Most published servers have no observability. A few enterprise tools (Cloudflare, Datadog) have built-in logging, but it’s not common in the ecosystem. Assessment: Ahead of curve. Especially the privacy-preserving argument sanitisation.

Enrichment during use — Unique to our portfolio. Most servers are read-only or have write tools that wait for explicit instructions. Assessment: Novel. The ENRICHMENT_HINT nudge pattern has no direct parallel we’ve found.

Feedback tools — Uncommon. Most servers treat Claude as a read-only consumer. Few have built-in channels for user-to-developer communication. Assessment: Ahead of curve. Turns the AI conversation into a product feedback loop.

Rate limiting — Standard practice but inconsistently applied. Many published servers skip it entirely. Assessment: Well-aligned.

One-line setup — Increasingly common as Custom Connectors mature. The best public servers optimise for this. Assessment: Well-aligned.

File submission pipeline — Rare. Most MCP servers don’t accept binary input. The base64 + background processing + polling pattern is unusual. Assessment: Novel. Few servers handle file upload workflows at all.

Patterns we haven’t adopted yet

Several patterns have emerged in the ecosystem that we don’t currently use:

Tool annotations — The MCP spec now supports metadata on each tool: readOnlyHint, destructiveHint, idempotentHint, openWorldHint. These help Claude make safer decisions about which tools to call and in what order. Low effort to add and immediately useful — Claude can know that query is read-only but delete_clause is destructive without parsing the description.

Server instructions field. A top-level string in the server’s initialize response (it sits alongside the capabilities object rather than inside it), delivered at connection time before any tool is called. Think of it as a CLAUDE.md that ships with the server. This could complement or partially replace our get_status tip, because the behavioral guidance would arrive even if Claude doesn’t call get_status first. Worth investigating as a belt-and-suspenders approach alongside the tip.

Structured content (dual-format responses) — Returning both machine-readable data and human-readable formatted text in the same response. We return JSON that Claude parses; we could also include a Markdown-formatted summary for direct display. The Anthropic docs specifically recommend this for responses that might be shown to users.

MCP Server Cards (.well-known/mcp.json) — A discovery and metadata mechanism. Declares what the server does, what auth it requires, what tools it offers. Not critical for private servers, but useful if we ever want to list in a public registry.

Elicitation — A new spec feature where the server can ask the user a question mid-tool-call (e.g., “Which branch do you want to query?”). We haven’t needed this because our tools are designed to be self-contained, but it could be useful for disambiguation in complex workflows.
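For the first two items in this list, the additions are small. The sketch below shows them as plain Python dicts, assuming a server that builds its JSON responses by hand. The annotation keys and the instructions field come from the MCP spec; the tool descriptions, schemas, server name, and the choice of protocol version are illustrative assumptions.

```python
# Tool definitions as they might appear in a tools/list response.
QUERY_TOOL = {
    "name": "query",
    "description": "Run a read-only query against the dashboard tables.",
    "inputSchema": {
        "type": "object",
        "properties": {"sql": {"type": "string"}},
        "required": ["sql"],
    },
    "annotations": {"readOnlyHint": True, "openWorldHint": False},
}

DELETE_CLAUSE_TOOL = {
    "name": "delete_clause",
    "description": "Delete a previously extracted clause by id.",
    "inputSchema": {
        "type": "object",
        "properties": {"clause_id": {"type": "string"}},
        "required": ["clause_id"],
    },
    # Deleting the same clause twice has no further effect, hence idempotent.
    "annotations": {"readOnlyHint": False, "destructiveHint": True, "idempotentHint": True},
}


def initialize_result(instructions: str) -> dict:
    """Build an initialize response; instructions sits beside capabilities, not inside."""
    return {
        "protocolVersion": "2025-06-18",
        "capabilities": {"tools": {}},
        "serverInfo": {"name": "example-server", "version": "0.1.0"},
        "instructions": instructions,
    }
```

The annotations are hints rather than guarantees, so they complement clear tool descriptions instead of replacing them.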

13. Future Directions

Based on what we’ve learned and where the ecosystem is heading:

Near-term (low effort, high value)

Add tool annotations to all servers. Mark read-only tools as readOnlyHint: true, destructive tools as destructiveHint: true. A few lines per tool definition, immediate safety benefit.
Add an instructions field to each server’s initialize response. Move the core behavioral guidance from the get_status tip into the server instructions (delivered at connection time), while keeping the tip for dynamic stats and session-specific context. This ensures Claude gets behavioral guidance even if it skips get_status.
Roll out submit_feedback to remaining servers. Some dashboard servers already have it; the others don’t. Every server should have a feedback channel.
Roll out tool call analytics to the Node.js servers. Currently only our Python servers log tool calls with sanitised arguments. The Node.js dashboard servers would benefit from the same visibility.

Medium-term (moderate effort)

Structured content responses. For tools whose output might be displayed directly to users (artwork descriptions, feedback summaries, page content), return both JSON and a formatted Markdown version.
Extract shared MCP utilities. The status-tools.js module is already shared across three clarity servers (by copy). As the portfolio grows, a shared npm package or git submodule for common patterns (status tools, feedback tools, rate limiting, analytics) would reduce drift.
Automated testing for MCP tools. We do manual verification after changes. A lightweight test harness — mock pool, call each handler, assert response shape — would catch regressions early.

Longer-term (ecosystem dependent)

OAuth 2.1 if we distribute servers to external users.
MCP Server Cards if a public registry becomes useful.
Elicitation if we find workflows where mid-call user input improves UX.

Summary

The most important lessons, in order of impact:

One-line connection setup. Token-in-URL. No multi-step configuration.
Self-bootstrapping behavioral tips. The server teaches Claude how to use it well.
Enrichment during natural use. Don’t make users do separate enrichment passes.
Analytics from day one. You can’t improve what you don’t measure.
Progressive disclosure. Preview by default, full content on demand.
Stateless HTTP. Zero-downtime deploys, no session management.
Clear unsupported-format messages. Turn limitations into feedback loops.

And the biggest strategic insight: MCP installation is now copy-paste for ordinary users. The barrier between building a tool and distributing it has collapsed to a single URL. The patterns above are how to design for that world — where the user is Claude, the UX is a tool schema, and the onboarding is a JSON response.

See this in practice: PT-Edge is a production MCP server that implements every pattern discussed above — 47 tools, 300+ tracked projects, real users. Connect in 30 seconds:
https://mcp.phasetransitions.ai/mcp?token=vUu6WrGR1lBprbL8esgiMLGphM5onrcn
Source code: https://github.com/grahamrowe82/pt-edge
