title: Failure-mode tools
description: Three controllable-failure tools: error returns specified error categories, slow injects N milliseconds of latency, flaky fails P percent of calls reproducibly from a seed.

Failure-mode tools

Three tools that produce controlled failure modes (errors, latency, probabilistic flakiness) so a gateway can be exercised against well-defined adversarial inputs.

error

Returns an error with a caller-specified message and category.

Arguments:

Field	Type	Notes
`message`	string	Optional. Defaults to `"synthetic error"`.
`category`	string	Optional. One of `protocol`, `tool`, `timeout`, `auth`. Recorded in the audit row's `error_category` column for filtering.
`as_tool`	bool	Optional. If `true`, returns a `CallToolResult` with `IsError=true`; otherwise raises a JSON-RPC protocol error.

Returns (when as_tool=true):

{ "content": [{ "type": "text", "text": "synthetic error" }], "isError": true }

(when as_tool=false): a JSON-RPC error response. No tool result is returned.

What it tests:

Error propagation. Tool-level errors (IsError=true) and protocol-level errors are different beasts. Verify your gateway preserves the distinction. Tool-level errors should reach the client as a successful tool call with isError set; protocol errors should surface as JSON-RPC errors.
Audit categorization. The category argument lets you tag rows so audit-log filters work. Useful for checking that the gateway forwarded the error code rather than masking it.
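One way to check the distinction from the client side is to classify each raw response the gateway hands back. The following is a minimal sketch, not part of mcp-test: the `classify` helper and its struct types are hypothetical, and the error code `-32603` in the sample payload is an assumption, since the code the tool actually uses is not stated here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rpcResponse models only the JSON-RPC 2.0 fields this sketch needs.
type rpcResponse struct {
	Error  *rpcError       `json:"error"`
	Result json.RawMessage `json:"result"`
}

type rpcError struct {
	Code    int    `json:"code"`
	Message string `json:"message"`
}

// toolResult models the isError flag of a tool-call result.
type toolResult struct {
	IsError bool `json:"isError"`
}

// classify reports "protocol" for a JSON-RPC error response, "tool" for a
// successful response whose result has isError set, and "ok" otherwise.
func classify(raw []byte) (string, error) {
	var resp rpcResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return "", fmt.Errorf("decode response: %w", err)
	}
	if resp.Error != nil {
		return "protocol", nil
	}
	var res toolResult
	if len(resp.Result) > 0 {
		if err := json.Unmarshal(resp.Result, &res); err != nil {
			return "", fmt.Errorf("decode result: %w", err)
		}
	}
	if res.IsError {
		return "tool", nil
	}
	return "ok", nil
}

func main() {
	// Sample payloads; the error code below is an assumed placeholder.
	toolLevel := []byte(`{"jsonrpc":"2.0","id":1,"result":{"content":[{"type":"text","text":"synthetic error"}],"isError":true}}`)
	protocolLevel := []byte(`{"jsonrpc":"2.0","id":2,"error":{"code":-32603,"message":"synthetic error"}}`)
	for _, raw := range [][]byte{toolLevel, protocolLevel} {
		kind, err := classify(raw)
		fmt.Println(kind, err)
	}
}
```

If the gateway turns a tool-level error into a JSON-RPC error (or the reverse), the classification on the client side will no longer match the as_tool value you sent.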

slow

Sleeps for the specified number of milliseconds, then returns. Honors context cancellation via ctx.Done().

Arguments:

Field	Type	Notes
`milliseconds`	int	Required. Capped at 60000 (60 seconds).

Returns:

{ "slept_ms": 1500 }

If the context is cancelled mid-sleep, the tool returns ctx.Err() and the audit row records the partial duration.
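The cancellation behavior described above can be sketched as a timer raced against ctx.Done(). This is an illustrative sketch under stated assumptions, not the tool's actual source: `slowSleep`, `callWithDeadline`, and `maxSleepMs` are hypothetical names, and `callWithDeadline` merely stands in for a gateway that imposes a deadline.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// maxSleepMs mirrors the documented 60000 ms cap.
const maxSleepMs = 60000

// slowSleep waits for ms milliseconds (capped at maxSleepMs) or until ctx is
// cancelled, whichever comes first. It returns the elapsed milliseconds and,
// on cancellation, ctx.Err().
func slowSleep(ctx context.Context, ms int) (int64, error) {
	if ms > maxSleepMs {
		ms = maxSleepMs
	}
	start := time.Now()
	timer := time.NewTimer(time.Duration(ms) * time.Millisecond)
	defer timer.Stop()
	select {
	case <-timer.C:
		return time.Since(start).Milliseconds(), nil
	case <-ctx.Done():
		// Partial duration: the caller gave up before the timer fired.
		return time.Since(start).Milliseconds(), ctx.Err()
	}
}

// callWithDeadline plays the gateway's role in this sketch: it imposes a
// deadline of deadlineMs on a slowSleep call of sleepMs.
func callWithDeadline(deadlineMs, sleepMs int) (int64, error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(deadlineMs)*time.Millisecond)
	defer cancel()
	return slowSleep(ctx, sleepMs)
}

func main() {
	// A 50 ms deadline against a 1500 ms sleep should end early.
	slept, err := callWithDeadline(50, 1500)
	fmt.Printf("slept_ms=%d timed_out=%t\n", slept, errors.Is(err, context.DeadlineExceeded))
}
```

Racing a timer against the context, rather than calling time.Sleep, is what allows the elapsed time at the moment of cancellation to be reported.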

What it tests:

Timeout policy. Set milliseconds past your gateway's deadline. The gateway should cancel the call cleanly, mcp-test should see the cancellation, and the audit row should reflect a partial duration.
Latency budgets. Run a series of slow calls at varying delays and verify the gateway's p95/p99 latency reflects them.
Concurrency. Many concurrent slow calls expose the gateway's ability to multiplex across the streamable HTTP transport without serializing.

flaky

Returns success or a synthetic failure based on the supplied probability and a seed.

Arguments:

Field	Type	Notes
`fail_rate`	float	Required. 0 to 1. Clamped if outside.
`seed`	string	Optional. Combined with `call_id` for reproducibility.
`call_id`	int	Optional. Caller-supplied iteration index.

Returns:

{ "failed": false, "roll": 0.7234, "fail_rate": 0.5 }

The same (seed, call_id) pair always produces the same roll, and therefore the same outcome (failed or not) for a given fail_rate. Different call_ids with the same seed produce different rolls, which is useful for simulating a sequence of calls whose failure pattern is known in advance.

A call fails when its roll is below fail_rate. When the call fails, the JSON-RPC envelope is an error with a message like `flaky failure (roll=0.4231 < rate=0.5000)`.

What it tests:

Retry policy. Set fail_rate=0.5, vary call_id from 0 to N, and verify the gateway retries failures up to its configured limit. With a fixed seed, you know exactly which call_ids will fail.
Backoff timing. Audit rows give you precise inter-call intervals when retries are happening.
Idempotency. A retried flaky call with the same seed and call_id reproduces the original outcome, success or failure, every time. This lets you test whether the gateway re-runs the upstream call or just replays a cached response.
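Because a failing (seed, call_id) fails again on every retry, the number of upstream attempts for a run is predictable once the failure pattern is known. The sketch below works through that arithmetic. It assumes the gateway retries with unchanged arguments and retries every failure up to its limit; `expectedAttempts` and the sample pattern are hypothetical.

```go
package main

import "fmt"

// expectedAttempts returns how many upstream calls a gateway should make for
// a run, given which call_ids fail and the gateway's retry limit.
// Assumption: retries reuse the same arguments, so a failing call fails on
// every retry and consumes the full retry budget.
func expectedAttempts(failed []bool, maxRetries int) int {
	total := 0
	for _, f := range failed {
		total++ // the initial attempt
		if f {
			total += maxRetries // each retry fails again
		}
	}
	return total
}

func main() {
	// Hypothetical failure pattern for call_id 0..4. In practice, record it
	// from a first run with the same seed.
	pattern := []bool{false, true, false, false, true}
	// 3 successes (1 attempt each) + 2 failures (1 + 3 attempts each) = 11.
	fmt.Println(expectedAttempts(pattern, 3))
}
```

Comparing this figure with the number of audit rows is one way to spot a gateway that replays cached responses (fewer rows than expected) or ignores its retry limit (more rows).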

Determinism guarantee

flaky outputs are byte-stable for fixed (seed, call_id, fail_rate) inputs across process restarts, operating systems, and CPU architectures. The PRNG is math/rand/v2's PCG, seeded from FNV-1a hashes of the seed string.
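A seeded roll of this kind can be sketched with the standard library alone. How the real tool combines the seed with call_id is not specified here, so the sketch makes an assumption: the FNV-1a hash of the seed string becomes the first PCG seed word and call_id the second. The function names `roll`, `clamp`, and `fails` are hypothetical, and the values this sketch prints will not necessarily match the tool's.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand/v2"
)

// roll derives a value in [0, 1) from (seed, callID).
// Assumption: FNV-1a(seed) is the first PCG seed word and callID the second;
// the real tool may combine them differently.
func roll(seed string, callID int) float64 {
	h := fnv.New64a()
	h.Write([]byte(seed)) // hash.Hash.Write never returns an error
	rng := rand.New(rand.NewPCG(h.Sum64(), uint64(callID)))
	return rng.Float64()
}

// clamp restricts rate to [0, 1], matching the documented fail_rate clamping.
func clamp(rate float64) float64 {
	return max(0, min(1, rate))
}

// fails reports whether the call fails, that is, whether roll < fail_rate.
func fails(seed string, callID int, failRate float64) bool {
	return roll(seed, callID) < clamp(failRate)
}

func main() {
	// Rerunning this loop prints the same rolls and outcomes every time.
	for id := 0; id < 5; id++ {
		fmt.Printf("call_id=%d roll=%.4f failed=%t\n", id, roll("demo", id), fails("demo", id, 0.5))
	}
}
```

Creating a fresh generator per call, instead of sharing one across calls, keeps each outcome independent of call order, which is what makes a retried call reproducible.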