feat(examples): add simple-sampling server and client #2476
Open

trentisiete wants to merge 1 commit into modelcontextprotocol:main from
### Conversation
Addresses modelcontextprotocol#1205 by adding two workspace members that demonstrate an end-to-end sampling handshake with a real LLM.

`examples/servers/simple-sampling` exposes a `write_story` tool whose handler issues `sampling/createMessage` with every advisory field set (`modelPreferences` hints + priorities, `systemPrompt`, `temperature`, `maxTokens`, `stopSequences`, `includeContext`, `metadata`).

`examples/clients/simple-sampling-client` wires a `sampling_callback` onto `ClientSession`. It maps `SamplingMessage` into an OpenAI-style chat payload, treats the first `ModelHint` as a soft override, logs the numeric priorities and the `includeContext` hook so multi-server clients can see where to inject context, and surfaces provider failures as `ErrorData` rather than raising.

The client speaks the OpenAI-compatible `/chat/completions` schema via `httpx`, so it runs against OpenAI, Groq, OpenRouter, Ollama, vLLM, or any other gateway that honours the contract. The provider is picked via the `LLM_API_KEY` / `LLM_API_BASE_URL` / `LLM_MODEL` env vars to avoid pinning a provider-specific SDK.
Adds an end-to-end sampling example to `examples/`, addressing #1205. Two new workspace members:

- `examples/servers/simple-sampling/` — minimal server with one tool (`write_story`) that calls `session.create_message(...)`, populating every advisory field of `CreateMessageRequestParams`: model preferences with hints + cost/speed/intelligence priorities, system prompt, temperature, stop sequences, `include_context`, metadata.
- `examples/clients/simple-sampling-client/` — wires `sampling_callback` to a real LLM via any OpenAI-compatible endpoint. Default is Groq (free tier). Switching to OpenAI / OpenRouter / Ollama (`/v1`) / vLLM is 1-3 env vars; no extra dependencies.

### Context
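A minimal sketch of the shape of the request the tool handler issues: field names follow `CreateMessageRequestParams`, and the concrete values echo the ones asserted in the Verification section (illustrative only, not code lifted from the PR).

```python
# Illustrative sampling/createMessage payload exercising every advisory
# field of CreateMessageRequestParams. In the real server these values are
# passed to session.create_message(...); here they are plain data.
def build_sampling_params(topic: str) -> dict:
    return {
        "messages": [
            {"role": "user",
             "content": {"type": "text",
                         "text": f"Write a short story about {topic}."}}
        ],
        "modelPreferences": {
            "hints": [{"name": "llama-3.1-8b"}],  # soft model suggestion
            "costPriority": 0.3,                  # advisory only
            "speedPriority": 0.8,
            "intelligencePriority": 0.5,
        },
        "systemPrompt": "You are a concise storyteller.",
        "includeContext": "thisServer",  # where multi-server clients inject context
        "temperature": 0.8,
        "maxTokens": 200,
        "stopSequences": ["THE END"],
        "metadata": {"example": "simple-sampling"},
    }

params = build_sampling_params("a robot")
```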
There's already #1436 from @yarnabrina addressing this issue. Their PR has been waiting on review for months and they've explicitly given the green light for an independent PR (see the recent comment thread on #1205). This one tries to land the same goal while incorporating the feedback their PR received and never got a chance to resolve:
- No `openai` SDK; uses `httpx` against the OpenAI-compatible `/chat/completions` schema.
- `LLMClient` abstraction matching the `simple-chatbot` pattern that was requested in their review — the sampling callback knows nothing about httpx, and the LLM wrapper knows nothing about MCP.

### Design notes
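The sampling-callback side of that separation can be sketched as a pure mapping function. This is a hypothetical sketch operating on plain role/content dicts (the real client works on `mcp.types.SamplingMessage` objects); the function name is illustrative.

```python
# Hypothetical sketch: map MCP-style sampling messages into an
# OpenAI-compatible /chat/completions request body. The callback stays a
# pure transformation; the transport (httpx in the PR) lives elsewhere.
def to_chat_payload(messages, system_prompt, model, temperature, max_tokens, stop):
    chat = []
    if system_prompt:
        chat.append({"role": "system", "content": system_prompt})
    for m in messages:
        content = m["content"]
        if content.get("type") == "text":
            chat.append({"role": m["role"], "content": content["text"]})
        else:
            # non-text parts stay visible instead of being silently dropped
            chat.append({"role": m["role"],
                         "content": f"[{content.get('type')} content omitted]"})
    body = {"model": model, "messages": chat,
            "temperature": temperature, "max_tokens": max_tokens}
    if stop:
        body["stop"] = stop
    return body

body = to_chat_payload(
    [{"role": "user", "content": {"type": "text", "text": "hi"}}],
    "Be brief.", "llama-3.1-8b", 0.8, 200, ["THE END"])
```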
- `ModelHint.name` is treated as a soft override, falling back to `LLM_MODEL`. The numeric priorities (cost/speed/intelligence) get logged but not used for routing — picking a model from those would require a provider-specific catalog, which is out of scope for an example.
- Provider failures are surfaced as `types.ErrorData(code=INTERNAL_ERROR, ...)` rather than raising, so the server gets a readable error instead of a transport-level one.
- Non-text content is rendered as `[<type> content omitted]` placeholders rather than being silently dropped. A production client would forward or explicitly reject them.

### Verification
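The first two rules can be sketched in a few lines. Names and the fallback model string are illustrative, not taken from the PR:

```python
import os

# Hypothetical sketch of the soft-override rule: the first ModelHint with a
# name wins; otherwise fall back to the LLM_MODEL env var (the default here
# is a placeholder). The numeric priorities would merely be logged.
def resolve_model(hints, fallback="some-default-model"):
    for hint in hints or []:
        if hint.get("name"):
            return hint["name"]
    return os.environ.get("LLM_MODEL", fallback)

# Provider failures become an error payload instead of an exception,
# mirroring the ErrorData(code=INTERNAL_ERROR, ...) behaviour described above.
def call_with_fallback(call):
    try:
        return call()
    except Exception as exc:
        return {"code": "INTERNAL_ERROR", "message": str(exc)}
```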
End-to-end check against a local `ThreadingHTTPServer` stub of `/chat/completions` that captures the request body and returns a canned story. Asserts that:

- the hinted model is forwarded (`llama-3.1-8b`).
- `max_tokens=200`, `temperature=0.8`, `stop=["THE END"]`, the system prompt, the user message and the metadata round-trip correctly.

The verification harness itself is not part of the PR. Repo checks pass:
- `uv run ruff format --check`: clean
- `uv run ruff check`: 0 issues
- `uv run pyright`: 0 errors, 0 warnings, 0 informations

### Try it
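A sketch of the env-var switch (variable names from the description above; the key and model values are placeholders, and the Groq base URL is an assumption):

```shell
# Point the client at any OpenAI-compatible endpoint; no SDK pinning.
# Swap the base URL for OpenAI, OpenRouter, Ollama (/v1) or vLLM.
export LLM_API_KEY="your-key-here"                      # placeholder
export LLM_API_BASE_URL="https://api.groq.com/openai/v1"
export LLM_MODEL="llama-3.1-8b-instant"                 # placeholder model
```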
Full env var table in `examples/clients/simple-sampling-client/README.md`.
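For reference, the kind of stub described under Verification can be reproduced with the stdlib alone. This is a sketch under assumed names, not the PR's actual harness:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Capture the /chat/completions request body and return a canned story,
# so parameter round-tripping can be asserted against `captured`.
captured = {}

class StubHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        captured["body"] = json.loads(self.rfile.read(length))
        reply = {"choices": [{"message": {
            "role": "assistant",
            "content": "Once upon a time... THE END"}}]}
        data = json.dumps(reply).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):  # keep test output quiet
        pass

server = ThreadingHTTPServer(("127.0.0.1", 0), StubHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Fire one request the way the sampling client would, then inspect the capture.
url = f"http://127.0.0.1:{server.server_address[1]}/chat/completions"
req = urllib.request.Request(
    url,
    data=json.dumps({"model": "llama-3.1-8b", "max_tokens": 200,
                     "temperature": 0.8, "stop": ["THE END"],
                     "messages": [{"role": "user", "content": "a story"}]}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    story = json.loads(resp.read())["choices"][0]["message"]["content"]

server.shutdown()
```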