MCP, in production

MCP tool descriptions are agent invocation protocols, not blurbs

Most articles about this topic frame the description string on an MCP tool as documentation: write a clear sentence, list the parameters, give an example. That is fine if your tool stands alone.

On a shipping MCP server, where four tools have to run in a specific order or the user gets a message sent to the wrong contact, the description string is doing real work. It names the next tool to call. It names the verification step. It capitalises the prohibition. I will show you all eleven descriptions, line by line, in the Swift source of whatsapp-mcp-macos.


Matthew Diakonov, Written with AI

Published April 24, 2026 · 9 min read

11 tool descriptions, all quoted verbatim

Every quote points to a line in Sources/WhatsAppMCP/main.swift

Verified on whatsapp-mcp-macos v3.0.0

Covers the 4-tool send-message protocol an agent must follow

Tool descriptions are micro-prompts.

Not documentation. Not blurbs.

Each one names the next tool by exact name.

Each one states the precondition the model must verify.

Each one capitalises the prohibition: NOT, CURRENTLY OPEN.

One description holds an ordered four-tool chain.

The description is the contract, the handler is the executor.


The premise, on one screen

When a host such as Claude Code or Cursor connects to an MCP server, it calls tools/list once and receives an array of tool objects: name, description, input schema. It injects the entire list into the model's system context for the rest of the session. Every time the model decides what to invoke next, it is reading those description strings.

That makes a description a piece of prompt engineering, not documentation. The cost of treating it as documentation shows up the first time an agent calls whatsapp_send_message with the wrong chat in focus.

what the model receives at session start
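A minimal sketch of that payload, assuming a simplified tool shape. `ToolSpec` and `ToolList` are hypothetical stand-ins for illustration, not the types the server uses, and the `inputSchema` values are placeholders. Only the field names (name, description, inputSchema) and the two description strings come from this article. The output corresponds to the result object of tools/list, without the JSON-RPC envelope.

```swift
import Foundation

// Hypothetical stand-in for one entry in the tools/list result.
// The real server builds its tools with its SDK's own type.
struct ToolSpec: Codable {
    let name: String
    let description: String
    let inputSchema: [String: String]  // placeholder; the real field is a full JSON Schema object
}

// Hypothetical wrapper matching the "tools" array the host receives.
struct ToolList: Codable {
    let tools: [ToolSpec]
}

let toolList = ToolList(tools: [
    ToolSpec(
        name: "whatsapp_get_active_chat",
        description: "Returns the name of the currently open/active WhatsApp chat. Use this to verify which chat is open before sending a message.",
        inputSchema: ["type": "object"]
    ),
    ToolSpec(
        name: "whatsapp_send_message",
        description: "Send a message in the CURRENTLY OPEN chat. Does NOT search or navigate, it only types and sends. Use whatsapp_search + whatsapp_open_chat + whatsapp_get_active_chat to verify the right chat first.",
        inputSchema: ["type": "object"]
    ),
])

// Encode the list the way it would travel to the host: plain JSON text.
let encoder = JSONEncoder()
encoder.outputFormatting = [.prettyPrinted, .sortedKeys, .withoutEscapingSlashes]
let payload = String(data: try! encoder.encode(toolList), encoding: .utf8)!
print(payload)
```

Nothing in this payload marks the description as special. It is one string field among three, and it is the only one written in natural language for the model to read.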

What a generic description gets wrong

Side by side: the description an MCP guide would suggest, and the description that actually ships in the Swift source. The right column is 27 words longer, and those 27 words are the difference between a tool the model calls correctly and a tool the model misuses.

whatsapp_send_message, two ways

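Generic guide:
description: "Send a WhatsApp message to a contact."

Shipping Swift source:
description: "Send a message in the CURRENTLY OPEN chat. Does NOT search or navigate, it only types and sends. Use whatsapp_search + whatsapp_open_chat + whatsapp_get_active_chat to verify the right chat first."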

The right-hand description encodes three things the left-hand one cannot: a state assertion (CURRENTLY OPEN), an explicit prohibition (Does NOT search or navigate), and a named ordered chain of three sibling tools the model must run first.

How a description steers the model between tools

The four state-changing tools form a pipeline that starts from the model's intent: search, open, verify, send. Each description is written so the model arrives at the next tool with the right preconditions already met. The diagram below maps the cross-references.

cross-references inside the eleven description strings
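The cross-reference map can also be derived mechanically from the strings themselves. The sketch below scans each description for the exact names of the other tools. It covers only the four descriptions quoted in this article, and the whatsapp_search entry holds only the quoted ending of its description, not the full string. The function name `crossReferences` is an illustration, not something from the repository.

```swift
import Foundation

// Descriptions as quoted in this article. The whatsapp_search value is only
// the quoted ending of that description, not the complete string.
let descriptions: [String: String] = [
    "whatsapp_search":
        "Leaves search OPEN, call whatsapp_open_chat(index) to select a result.",
    "whatsapp_open_chat":
        "Click the Nth search result to open that chat. Call whatsapp_search first, then use the index from the results. Returns the name of the chat that was actually opened, verify this matches your intended contact.",
    "whatsapp_get_active_chat":
        "Returns the name of the currently open/active WhatsApp chat. Use this to verify which chat is open before sending a message.",
    "whatsapp_send_message":
        "Send a message in the CURRENTLY OPEN chat. Does NOT search or navigate, it only types and sends. Use whatsapp_search + whatsapp_open_chat + whatsapp_get_active_chat to verify the right chat first.",
]

/// For each tool, list the sibling tools its description mentions by exact name.
func crossReferences(in descriptions: [String: String]) -> [String: [String]] {
    var graph: [String: [String]] = [:]
    for (tool, text) in descriptions {
        graph[tool] = descriptions.keys
            .filter { $0 != tool && text.contains($0) }
            .sorted()
    }
    return graph
}

let graph = crossReferences(in: descriptions)
for tool in graph.keys.sorted() {
    print("\(tool) -> \(graph[tool]!)")
}
```

In this subset, whatsapp_send_message points at all three siblings, whatsapp_search and whatsapp_open_chat point at each other, and whatsapp_get_active_chat names no tool at all, only the act of sending.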

The anchor fact: a 4-tool protocol embedded in one description

Of all eleven descriptions in the file, the one on whatsapp_send_message is the one that does the most work. It is 31 words. It names three other tools by exact, callable name. It includes one capitalised state assertion and one capitalised prohibition. Read it as prose and it sounds like an instruction to a junior engineer; read it from inside the model's context window and it is a finite state machine.

description string, verbatim

“Send a message in the CURRENTLY OPEN chat. Does NOT search or navigate, it only types and sends. Use whatsapp_search + whatsapp_open_chat + whatsapp_get_active_chat to verify the right chat first.”

Sources/WhatsAppMCP/main.swift, line 1084


The protocol the description encodes, step by step

Each step is a tool, and each tool's description points at the next. Together they form a verify-before-send chain that the model assembles by reading description strings, not by following a workflow file we wrote separately.

1. whatsapp_search(query)

Description ends with 'Leaves search OPEN, call whatsapp_open_chat(index) to select a result.' The search box stays open on purpose so the next tool can click into the result list. The description tells the model both the side effect (search remains open) and the next tool (whatsapp_open_chat).

2. whatsapp_open_chat(index)

Description: 'Click the Nth search result to open that chat. Call whatsapp_search first, then use the index from the results. Returns the name of the chat that was actually opened, verify this matches your intended contact.' Two pieces of agent guidance in one string: precondition (search must have run) and postcondition (verify the returned name).

3. whatsapp_get_active_chat()

Description: 'Returns the name of the currently open/active WhatsApp chat. Use this to verify which chat is open before sending a message.' This tool exists primarily as a verification step. Its description points at the very next tool the model will call.

4. whatsapp_send_message(message)

Description: 'Send a message in the CURRENTLY OPEN chat. Does NOT search or navigate, it only types and sends. Use whatsapp_search + whatsapp_open_chat + whatsapp_get_active_chat to verify the right chat first.' One description names the entire prefix protocol the model must run before invoking it.
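The four steps above can be sketched as a walk over a toy, in-memory stand-in for the WhatsApp window. Everything here is hypothetical: `FakeWhatsApp`, `SendError`, and `sendVerified` are illustrations of the chain the descriptions spell out, not code from the repository, and the zero-based index is an assumption. The real handlers drive the app through accessibility calls.

```swift
import Foundation

// Hypothetical error type for this sketch.
struct SendError: Error { let reason: String }

// Toy stand-in for the app: it tracks search results and the open chat.
final class FakeWhatsApp {
    private let contacts: [String]
    private var results: [String] = []
    private(set) var activeChat: String?
    private(set) var sent: [(chat: String, text: String)] = []

    init(contacts: [String], activeChat: String?) {
        self.contacts = contacts
        self.activeChat = activeChat
    }

    // whatsapp_search: returns matches and keeps them for open_chat.
    func search(_ query: String) -> [String] {
        results = contacts.filter { $0.lowercased().contains(query.lowercased()) }
        return results
    }

    // whatsapp_open_chat: opens the Nth result, returns the name actually opened.
    func openChat(index: Int) -> String? {
        guard results.indices.contains(index) else { return nil }
        activeChat = results[index]
        return activeChat
    }

    // whatsapp_get_active_chat: the ground truth about which chat is open.
    func getActiveChat() -> String? { activeChat }

    // whatsapp_send_message: types into whatever chat is open, nothing else.
    func sendMessage(_ text: String) throws {
        guard let chat = activeChat else { throw SendError(reason: "no chat open") }
        sent.append((chat: chat, text: text))
    }
}

/// The verify-before-send chain, as an agent following the descriptions might walk it.
func sendVerified(_ text: String, to intended: String, using app: FakeWhatsApp) throws {
    let matches = app.search(intended)
    guard let index = matches.firstIndex(of: intended) else {
        throw SendError(reason: "no exact match for \(intended)")
    }
    guard app.openChat(index: index) == intended else {
        throw SendError(reason: "opened a different chat")
    }
    guard app.getActiveChat() == intended else {
        throw SendError(reason: "active chat changed before send")
    }
    try app.sendMessage(text)
}

// The window starts on the wrong chat; the chain corrects that before sending.
let app = FakeWhatsApp(contacts: ["Mom", "Mum (work)"], activeChat: "Landlord")
try! sendVerified("Running late", to: "Mom", using: app)
print(app.sent.map { "\($0.chat): \($0.text)" })
```

Calling `sendMessage` directly on this fake would have delivered the text to "Landlord", which is the failure the description on whatsapp_send_message is written to prevent.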

The two other description strings that do similar work

whatsapp_open_chat and whatsapp_search both name a sibling tool. The pattern is consistent: state what the tool does, name the previous tool that has to have run, and (where applicable) name the next tool the model should call after interpreting the return value.

whatsapp_open_chat (lines 1048-1057)

whatsapp_search (lines 1032-1046)

The send-message handshake, drawn out

Here is the same chain as a handshake between the model and the server. The arrows on the right are the description-string cross-references; the arrows on the left are the actual JSON-RPC calls. The model is the one walking the protocol, but it is the descriptions that tell it where to step.

invoking whatsapp_send_message safely
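On the wire, the left-hand side of that handshake is four JSON-RPC tools/call requests. The sketch below builds them with Foundation only. The argument names (query, index, message) follow the tool signatures quoted above; the argument values, the request ids, and the zero-based index are illustrative assumptions, and `toolCall` is a hypothetical helper.

```swift
import Foundation

/// Build one JSON-RPC 2.0 tools/call request body as a string.
func toolCall(id: Int, name: String, arguments: [String: Any]) -> String {
    let request: [String: Any] = [
        "jsonrpc": "2.0",
        "id": id,
        "method": "tools/call",
        "params": ["name": name, "arguments": arguments] as [String: Any],
    ]
    let data = try! JSONSerialization.data(
        withJSONObject: request,
        options: [.sortedKeys, .withoutEscapingSlashes]
    )
    return String(data: data, encoding: .utf8)!
}

// The order the descriptions steer the model into: search, open, verify, send.
let calls = [
    toolCall(id: 1, name: "whatsapp_search", arguments: ["query": "Mom"]),
    toolCall(id: 2, name: "whatsapp_open_chat", arguments: ["index": 0]),  // index base is an assumption
    toolCall(id: 3, name: "whatsapp_get_active_chat", arguments: [:]),
    toolCall(id: 4, name: "whatsapp_send_message", arguments: ["message": "Running late"]),
]
calls.forEach { print($0) }
```

Nothing in the transport enforces this order. The server would accept request 4 on its own, so the sequencing lives entirely in the description strings.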

Six rhetorical patterns inside the eleven description strings

Once you know what to look for, the same six moves show up in every description that does real agent steering. Each one is cheap to add and cheap to read.

patterns that survive into the model's context

ALL-CAPS state assertions: 'CURRENTLY OPEN', 'OPEN'. The casing survives tokenisation and reads as emphasis.
Capitalised prohibitions: 'Does NOT search or navigate', 'do NOT attempt to use WhatsApp Web'.
Named-tool chains: 'Use whatsapp_search + whatsapp_open_chat + whatsapp_get_active_chat'.
Imperative ordering verbs: 'Call X first, then use the index'.
Verification clauses tied to return values: 'Returns the name of the chat that was actually opened, verify this matches your intended contact'.
Recovery hints scoped to one tool: 'If the contact you want isn't visible, use whatsapp_scroll_search to load more results'.

How to apply this to your own MCP

If you build an MCP server with more than one state-changing tool, there is a checklist that takes about ten minutes per tool to apply.

description-as-protocol checklist

Name a sibling tool by exact, callable name (whatsapp_search, not 'the search tool').
State the precondition tools that MUST have run already (verify-before-send).
State the postcondition: what the return value tells the model to do next.
Capitalise the one or two facts the model must NOT misread (CURRENTLY OPEN, NOT, OPEN).
Refuse jobs the tool cannot do, in the description, with the word NOT.
Include the recovery tool for the most likely failure (scroll if not visible).

The whatsapp-mcp-macos source applies this checklist to 4 of its 11 tools (the four state-changing ones). The other seven are read-only and get a verb in a sentence, which is fine.
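Part of the checklist can be checked mechanically. The sketch below is a rough lint, assuming that "capitalised" means a word of two or more letters written entirely in capitals and that a prohibition is signalled by the word NOT. `ChecklistReport` and `review` are hypothetical names, and the heuristic cannot judge whether a precondition or postcondition is actually stated.

```swift
import Foundation

// Hypothetical report for three of the checklist items.
struct ChecklistReport {
    let namedSiblings: [String]     // other tools mentioned by exact name
    let capitalisedWords: [String]  // ALL-CAPS words, in order of appearance
    let hasProhibition: Bool        // true when NOT appears in capitals
}

/// Review one description against the parts of the checklist a string scan can see.
func review(description: String, tool: String, allTools: [String]) -> ChecklistReport {
    let siblings = allTools.filter { $0 != tool && description.contains($0) }
    // Split on anything that is not a letter, then keep fully capitalised words.
    let words = description.components(separatedBy: CharacterSet.letters.inverted)
    let caps = words.filter { $0.count >= 2 && $0 == $0.uppercased() }
    return ChecklistReport(
        namedSiblings: siblings,
        capitalisedWords: caps,
        hasProhibition: caps.contains("NOT")
    )
}

let toolNames = [
    "whatsapp_search", "whatsapp_open_chat",
    "whatsapp_get_active_chat", "whatsapp_send_message",
]

let shipping = review(
    description: "Send a message in the CURRENTLY OPEN chat. Does NOT search or navigate, it only types and sends. Use whatsapp_search + whatsapp_open_chat + whatsapp_get_active_chat to verify the right chat first.",
    tool: "whatsapp_send_message",
    allTools: toolNames
)
let generic = review(
    description: "Send a WhatsApp message to a contact.",
    tool: "whatsapp_send_message",
    allTools: toolNames
)

print(shipping.namedSiblings, shipping.capitalisedWords, shipping.hasProhibition)
print(generic.namedSiblings, generic.capitalisedWords, generic.hasProhibition)
```

A check like this could run in a test target so that a description edit which drops a sibling name or the NOT fails before it ships.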

CURRENTLY OPEN
Does NOT search or navigate
Call whatsapp_search first
verify this matches your intended contact
Leaves search OPEN
Use whatsapp_search + whatsapp_open_chat + whatsapp_get_active_chat
do NOT attempt to use WhatsApp Web
verify the right chat first

Every phrase above is a literal substring of a description string in Sources/WhatsAppMCP/main.swift.

Read it yourself

The description strings live in one file, between two line numbers. Open them, read all eleven, then read the matching handlers above them, and the protocol-in-prose pattern becomes obvious. The shortest description is on whatsapp_quit (4 words). The longest is on whatsapp_send_message (31 words). The variation is intentional: tools with state-changing side effects get protocol prose, tools without them get a verb.

File: Sources/WhatsAppMCP/main.swift
Range: lines 993–1108
Server registration: line 1110 (the allTools array)
Handlers: registered in the withMethodHandler(CallTool.self) switch at line 1151



Frequently asked questions

Why is whatsapp_send_message's description longer than the function it wraps?

Because the dangerous part of sending a message isn't the typing, it's sending it to the wrong chat. The Swift handler `handleSendMessage` is a few lines of accessibility calls. The description is 31 words because those words are the contract that prevents an agent from invoking the tool with the wrong contact in focus. The description names the three sibling tools that must have run first: whatsapp_search, whatsapp_open_chat, and whatsapp_get_active_chat. Without that prose, an agent that has just listed chats and seen 'Mom' on screen would reasonably conclude that calling whatsapp_send_message would send to Mom. It would be wrong, because the active chat is whichever one the user last clicked, not whichever one the model last saw mentioned. The description encodes that invariant.

Why does whatsapp_get_active_chat exist as a separate tool when an agent could 'just remember' which chat it opened?

Because in practice, the agent doesn't 'just remember'. Between whatsapp_open_chat and whatsapp_send_message, several other tool calls might happen: a list_chats refresh, a notification dialog the user clicked through, a scroll. The native WhatsApp Catalyst app's active-chat state is the source of truth, not the model's working memory. whatsapp_get_active_chat returns that ground-truth value, and its description ('Use this to verify which chat is open before sending a message') tells the model when to call it. It's a tool that exists to be a checkpoint in the protocol, and its description is what makes it act like one.

Couldn't the same outcome be achieved with a single 'send message to contact' tool?

It could, but the Swift implementation chose the opposite trade-off, and the description is where you can read the reasoning. The composed tool would have to swallow search ambiguity, scroll behaviour, and verification inside one call, which means errors come back as 'tool failed' with no way for the model to recover gracefully. Splitting it into four narrow tools, each with a description that names the next, lets the model reason between steps. If whatsapp_open_chat returns 'Mom' instead of 'Mum (work)', the model can call whatsapp_search again with a refined query. The description on whatsapp_open_chat anticipates this exact failure: 'verify this matches your intended contact'.

Is naming sibling tools by exact name in a description a documented MCP convention?

It is not in the MCP specification. The spec gives each tool a description field, a human-readable string that hosts pass to the model, but it doesn't prescribe what to put in it. The MCP servers that work well in practice tend to name siblings by exact name, because hosts such as Claude Code expose the full list to the model on every turn, and the model is much more likely to pick a tool whose name was just referenced by the previous tool's description. The cost is that renaming a tool now means searching every other description for the old name. The whatsapp-mcp-macos source has 'whatsapp_search' hardcoded in three other tool descriptions, so renaming it would silently break the protocol if any reference is missed.
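One way to catch a missed reference is a check that every tool-like name inside a description is still a registered tool. The sketch below assumes all tool names share the whatsapp_ prefix and use lowercase letters and underscores. `danglingReferences` is a hypothetical helper, and whatsapp_find_contact is a made-up new name used only to simulate the rename.

```swift
import Foundation

/// Map each tool to the names its description mentions that are no longer registered.
func danglingReferences(
    in descriptions: [String: String],
    prefix: String = "whatsapp_"
) -> [String: [String]] {
    let escaped = NSRegularExpression.escapedPattern(for: prefix)
    let regex = try! NSRegularExpression(pattern: escaped + "[a-z_]+")
    var problems: [String: [String]] = [:]
    for (tool, text) in descriptions {
        let whole = NSRange(text.startIndex..., in: text)
        let mentioned = regex.matches(in: text, range: whole).compactMap { match -> String? in
            guard let range = Range(match.range, in: text) else { return nil }
            return String(text[range])
        }
        let missing = Set(mentioned).subtracting(descriptions.keys).sorted()
        if !missing.isEmpty { problems[tool] = missing }
    }
    return problems
}

// Simulated rename: whatsapp_search became whatsapp_find_contact (hypothetical),
// but the other descriptions were left untouched.
let afterRename: [String: String] = [
    "whatsapp_find_contact":
        "Leaves search OPEN, call whatsapp_open_chat(index) to select a result.",
    "whatsapp_open_chat":
        "Click the Nth search result to open that chat. Call whatsapp_search first, then use the index from the results. Returns the name of the chat that was actually opened, verify this matches your intended contact.",
    "whatsapp_get_active_chat":
        "Returns the name of the currently open/active WhatsApp chat. Use this to verify which chat is open before sending a message.",
    "whatsapp_send_message":
        "Send a message in the CURRENTLY OPEN chat. Does NOT search or navigate, it only types and sends. Use whatsapp_search + whatsapp_open_chat + whatsapp_get_active_chat to verify the right chat first.",
]

let problems = danglingReferences(in: afterRename)
for tool in problems.keys.sorted() {
    print("\(tool) still mentions \(problems[tool]!)")
}
```

Run against the real tool list in a unit test, a check like this would turn a silent protocol break into a failing build.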

How does an MCP host actually deliver these descriptions to the model?

The host calls tools/list once per session over the stdio transport. The server returns an array of tool objects, each with name, description, and inputSchema. The host then injects all of that into the model's system prompt for the duration of the session. So when whatsapp_send_message's description says 'Use whatsapp_search + whatsapp_open_chat + whatsapp_get_active_chat to verify the right chat first', the model literally sees those names alongside the tools they refer to. The cross-references resolve because every name in the prose also appears as a callable tool in the same list. If you rename whatsapp_search to whatsapp_find_contact and don't update the other descriptions, the model still tries to call whatsapp_search and the host returns 'tool not found'.

What happens if I write tool descriptions that don't name siblings or forbid actions?

The model improvises. With a description like 'Send a WhatsApp message to a contact' and no sibling reference, an agent that has just read messages from the active chat will call whatsapp_send_message with whichever contact name appeared most recently in conversation. With WhatsApp specifically, that's how messages get sent to the wrong person, because the model conflates 'the chat I'm looking at' with 'the chat I most recently mentioned by name'. The protocol-style description ('CURRENTLY OPEN', 'Does NOT search', 'verify first') closes that gap by giving the model an explicit anti-pattern to avoid. The cost is verbosity, the benefit is fewer wrong-chat sends.

Why use ALL CAPS instead of bold or quotes inside a description string?

Because the description is delivered as a JSON string into the model's context. Markdown bold and quotes survive, but ALL CAPS is the most compact and unambiguous emphasis that survives any tokeniser and any host that re-renders the description. In whatsapp-mcp-macos's source, you can see this used sparingly: 'CURRENTLY OPEN' once, 'NOT' twice, 'OPEN' once. Used for every word, capitalisation stops working as a signal. Used for the one or two facts the model must not misread, it's the cheapest possible attention mechanism.

Can I see the actual description strings the WhatsApp MCP ships with?

Yes. They are Swift string literals at lines 993 to 1108 of Sources/WhatsAppMCP/main.swift in the whatsapp-mcp-macos repo. The file is a single MCP server entrypoint; everything after `func setupAndStartServer()` declares Tool values whose `description:` argument is the string the model sees. There are 11 tools and 11 description strings. The longest is whatsapp_send_message at 31 words. The shortest is whatsapp_quit at 4 words ('Quit/close WhatsApp.'). The variation is intentional: tools with state-changing side effects get protocol prose, tools without them get a verb.