Add typed tool composition with context.tools.execute() #796

Draft
evantahler wants to merge 11 commits into main from evantahler/tool-composition

@evantahler evantahler commented Mar 17, 2026

Summary

Feature Announcement Slides

Adds typed tool composition to arcade-mcp-server, allowing compound tools to call other tools with type safety and automatic response structuring. Also adds cross-tool requirement resolution so compound tools can declare upfront what auth/secrets their sub-tools need.

Tool Composition (context.tools.execute())

  • New context.tools.execute(ResponseModel, tool_name, args) method for strongly-typed cross-tool calls
  • 3-tier structuring strategy: direct Pydantic validation → heuristic mapping → LLM extraction (via MCP sampling or Anthropic SDK fallback)
  • OnMissing.ALLOW_NULL for resilient field extraction when upstream responses change
  • Automatic fallback to Arcade Cloud for tools not registered locally
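
The tiered fallback described above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: it uses stdlib dataclasses in place of Pydantic models, checks only key presence rather than doing full validation, and the names `EmailSummary`, `structure`, and `llm_extract` are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Any, Callable, Dict, Optional


@dataclass
class EmailSummary:
    # Hypothetical response model; the PR uses Pydantic models instead.
    subject: str
    sender: str


def _normalize(name: str) -> str:
    """Lower-case a key and drop separators so 'Sender' matches 'sender'."""
    return name.replace("_", "").replace("-", "").lower()


def structure(
    model: type,
    raw: Dict[str, Any],
    llm_extract: Optional[Callable[[type, Dict[str, Any]], Dict[str, Any]]] = None,
) -> Any:
    wanted = [f.name for f in fields(model)]

    # Tier 1: the raw response already carries exactly the expected keys.
    if all(name in raw for name in wanted):
        return model(**{name: raw[name] for name in wanted})

    # Tier 2: heuristic mapping on normalized key names.
    by_norm = {_normalize(key): value for key, value in raw.items()}
    if all(_normalize(name) in by_norm for name in wanted):
        return model(**{name: by_norm[_normalize(name)] for name in wanted})

    # Tier 3: hand the raw payload to an extractor (an LLM in the PR).
    if llm_extract is None:
        raise ValueError(f"could not structure response into {model.__name__}")
    return model(**llm_extract(model, raw))
```

The point of ordering the tiers this way is that the cheap, deterministic paths run first, and the extractor is consulted only when both of them fail.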

Cross-Tool Requirement Resolution (requires_secrets_from / request_scopes_from)

  • @tool(requires_secrets_from=["Gmail.ListEmails"], request_scopes_from=["Slack.SendMessage"])
  • Fetches remote tool definitions from Arcade Cloud at startup via arcade.tools.get()
  • Merges secrets and OAuth scopes into the compound tool's requirements
  • MCP clients see the full requirements at tools/list time and can prompt for auth proactively
  • No Arcade Cloud changes required
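
As a rough sketch of the merge step, assuming requirements reduce to a set of secret names plus OAuth scopes grouped by provider. The `Requirements` type, the `merge_requirements` helper, and the secret and scope names in the usage note are all hypothetical and are not taken from the PR.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class Requirements:
    # Hypothetical shape: secret names plus OAuth scopes keyed by provider.
    secrets: Set[str] = field(default_factory=set)
    scopes_by_provider: Dict[str, Set[str]] = field(default_factory=dict)


def merge_requirements(
    own: Requirements,
    secrets_from: Iterable[Requirements],
    scopes_from: Iterable[Requirements],
) -> Requirements:
    """Return a new Requirements combining the compound tool's own needs
    with the secrets and scopes of the tools it references."""
    merged = Requirements(
        secrets=set(own.secrets),
        scopes_by_provider={p: set(s) for p, s in own.scopes_by_provider.items()},
    )
    for remote in secrets_from:
        merged.secrets |= remote.secrets
    for remote in scopes_from:
        for provider, scopes in remote.scopes_by_provider.items():
            merged.scopes_by_provider.setdefault(provider, set()).update(scopes)
    return merged
```

For example, merging a tool with no requirements of its own against one referenced tool that needs a secret and another that needs a Slack scope yields a single `Requirements` holding both, which is what a client would then see at `tools/list` time.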

Multi-Provider Auth

  • Compound tools referencing different OAuth providers (e.g. Google + Slack) now track all providers
  • resolved_authorizations field stores the full list; exposed via _meta.arcade.requirements.authorizations
  • Each provider is checked at execution time before the tool runs
  • Single-provider tools work exactly as before (backward compatible)
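
The per-provider check before execution might look something like the sketch below. `AuthorizationRequired` and `ensure_authorized` are hypothetical names for illustration; the PR's actual check lives in server.py and may differ.

```python
from typing import Iterable, List, Set


class AuthorizationRequired(Exception):
    """Hypothetical error listing every provider that still needs auth."""

    def __init__(self, providers: List[str]) -> None:
        super().__init__("authorization required for: " + ", ".join(providers))
        self.providers = providers


def ensure_authorized(required: Iterable[str], authorized: Set[str]) -> None:
    """Raise if any required provider has not been authorized yet."""
    missing = [provider for provider in required if provider not in authorized]
    if missing:
        raise AuthorizationRequired(missing)
```

Collecting all missing providers before raising lets a client prompt for every outstanding authorization in one pass. A single-provider tool is just the case where `required` has one entry.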

Key Files

  • libs/arcade-mcp-server/arcade_mcp_server/context.py: Tools.execute(), structuring, remote fallback
  • libs/arcade-mcp-server/arcade_mcp_server/server.py: _resolve_cross_tool_requirements(), multi-provider auth checks
  • libs/arcade-core/arcade_core/structuring.py: Tier 1-2 deterministic structuring
  • libs/arcade-mcp-server/arcade_mcp_server/convert.py: authorizations list in MCP metadata
  • examples/mcp_servers/email_to_slack.py: Full compound tool example

Test plan

  • uv run pytest libs/tests/arcade_mcp_server/test_cross_tool_requirements.py — 19 tests for requirement resolution + multi-provider auth
  • uv run pytest libs/tests/arcade_mcp_server/test_tool_composition_structuring.py — LLM extraction tests
  • uv run pytest libs/tests/core/test_structuring.py — Tier 1-2 structuring tests
  • uv run pytest libs/tests/ — Full test suite (2365 pass)
  • make check — ruff + ruff-format + mypy clean (no new errors)
  • Manual: run email_to_slack example, verify tools/list includes merged Gmail/Slack auth in _meta.arcade.requirements.authorizations

🤖 Generated with Claude Code

…ng layer

Introduces the ability for compound tools to call other tools (local or remote
via Arcade Cloud) with strongly-typed Pydantic response models. Adds a tiered
structuring strategy (direct validation, heuristic mapping, LLM sampling) and
an email-to-Slack example demonstrating the full flow.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@evantahler evantahler marked this pull request as draft March 17, 2026 19:32
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
evantahler and others added 9 commits March 17, 2026 13:16
These params pre-merged auth/secret requirements from referenced tools at
server startup. Since we bubble up auth errors as they occur rather than
pre-authorizing with Arcade Cloud, this mechanism is unnecessary complexity.

Removes:
- ToolDefinition.requires_secrets_from / request_scopes_from fields
- ToolCatalog.resolve_cross_tool_requirements() method
- Parameters from @tool(), MCPApp.add_tool(), MCPApp.tool() decorators
- resolve_cross_tool_requirements() call in MCPApp.run()
- test_catalog_resolution.py (tests exclusively for removed feature)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
These log lines are informational/diagnostic, not warnings — reduces
noise in production logs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tion

When an MCP client doesn't support sampling, Tier 3 LLM extraction was
completely unavailable. This adds a second path (Tier 3b) using the
Anthropic SDK with forced tool_use for reliable structured JSON output.

- Add _SamplingUnavailableError sentinel to distinguish "client lacks
  capability" from other sampling failures
- Modify _extract_via_sampling to raise the sentinel instead of
  ToolResponseExtractionError for capability-absent cases
- Add _extract_via_anthropic using tool_use + tool_choice for structured
  extraction; lazy-inits and caches the AsyncAnthropic client
- Update execute() to chain Tier 3a → 3b when sampling unavailable
- Add AnthropicSettings (ANTHROPIC_API_KEY, ANTHROPIC_MODEL, ANTHROPIC_BASE_URL)
- Add anthropic>=0.40.0 dependency
- Add 13 tests covering both paths and orchestration
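
A minimal sketch of the Tier 3a → 3b chaining, with stand-in coroutines instead of real MCP sampling or Anthropic SDK calls. The names here are illustrative and only loosely mirror the ones listed above.

```python
import asyncio
from typing import Any, Dict


class SamplingUnavailable(Exception):
    """Sentinel: the client cannot do sampling (distinct from other failures)."""


async def extract_via_sampling(raw: Dict[str, Any], supports_sampling: bool) -> Dict[str, Any]:
    # Stand-in for Tier 3a; a real version would issue an MCP sampling request.
    if not supports_sampling:
        raise SamplingUnavailable("client lacks sampling capability")
    return {"source": "sampling", **raw}


async def extract_via_sdk(raw: Dict[str, Any]) -> Dict[str, Any]:
    # Stand-in for Tier 3b; a real version would call an LLM SDK directly.
    return {"source": "sdk", **raw}


async def extract(raw: Dict[str, Any], supports_sampling: bool) -> Dict[str, Any]:
    try:
        return await extract_via_sampling(raw, supports_sampling)
    except SamplingUnavailable:
        # Only the sentinel triggers the fallback; other errors propagate.
        return await extract_via_sdk(raw)
```

Using a dedicated sentinel keeps genuine sampling failures (bad output, timeouts) from silently rerouting to the SDK path.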

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Clients that report sampling capability but don't implement it return a
JSON-RPC -32601 error, which was raised as RequestError rather than
ValueError. This bypassed the _SamplingUnavailableError sentinel and
fell into the generic except block, skipping the Anthropic SDK fallback.

Now also catches RequestError and checks for -32601/Method not found
strings to correctly route to Tier 3b.
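
The detection logic can be sketched as below. The `RequestError` class here is a hypothetical stand-in with `code` and `message` attributes, not the real exception type from the MCP library; -32601 is the standard JSON-RPC "Method not found" code.

```python
METHOD_NOT_FOUND = -32601  # JSON-RPC 2.0 "Method not found"


class RequestError(Exception):
    # Hypothetical stand-in for the error raised on a failed JSON-RPC request.
    def __init__(self, code: int, message: str) -> None:
        super().__init__(message)
        self.code = code
        self.message = message


def is_sampling_unavailable(exc: Exception) -> bool:
    """True when the error means the client does not implement sampling."""
    if not isinstance(exc, RequestError):
        return False
    return exc.code == METHOD_NOT_FOUND or "method not found" in exc.message.lower()
```

A caller would convert a matching error into the sampling-unavailable sentinel and let any other `RequestError` surface unchanged.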

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The structuring layer with OnMissing.ALLOW_NULL handles missing fields,
so the Pydantic models should declare required fields to better demonstrate
the contract-based approach.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
asyncio.timeout was added in Python 3.11. Use async_timeout as a fallback
on 3.10, with a no-op context manager if async_timeout is not installed.
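
One way to write such a shim is sketched here; the helper name `timeout_compat` is an assumption and not necessarily what the commit uses. It must be called from inside a running event loop, because `asyncio.timeout()` looks up the current loop.

```python
import asyncio
import contextlib
import sys


def timeout_compat(seconds: float):
    """Return an async context manager that enforces a timeout when possible."""
    if sys.version_info >= (3, 11):
        # Native support, added in Python 3.11.
        return asyncio.timeout(seconds)
    try:
        import async_timeout  # optional third-party backport
    except ImportError:
        # No-op fallback: the body runs without any timeout.
        return contextlib.nullcontext()
    return async_timeout.timeout(seconds)


async def quick() -> str:
    async with timeout_compat(1.0):
        await asyncio.sleep(0)
    return "done"
```

The trade-off of the no-op branch is that on 3.10 without `async_timeout` installed, a hung call is never cancelled, so callers cannot rely on the timeout firing there.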

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…vider auth

Restores requires_secrets_from and request_scopes_from on the @tool decorator,
but instead of resolving from the local catalog (removed in 2535f0f), fetches
remote tool definitions from Arcade Cloud at server startup via arcade.tools.get().

This lets compound tools like email_to_slack declare upfront that they need the
same auth/secrets as the remote tools they call, so MCP clients can prompt for
authorization proactively rather than failing mid-execution.

Also adds multi-provider auth support: when a compound tool references tools with
different OAuth providers (e.g. Google + Slack), all providers are tracked in
resolved_authorizations and exposed to MCP clients via _meta.arcade.requirements.authorizations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…e cases

Covers context.py tool composition paths (call_raw, _call_remote,
_handle_remote_auth, execute fallbacks, elicitation validation, helpers)
and server.py cross-tool resolution edge cases (null oauth2, null scopes,
missing auth/requirements on remote tools).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@JiwaniZakir JiwaniZakir left a comment

The example in email_to_slack.py uses logger.warning() for routine informational messages ("Fetching emails with n_emails=%d" and "Gmail response: %s") — these should be logger.info() or logger.debug() since they represent normal execution flow, not degraded conditions. Using warning here will pollute logs in production deployments and undermine the signal value of that log level.

The Slack send loop in forward_emails_to_slack issues calls sequentially; for max_emails=5 this means up to 5 serial round-trips to Slack. Since each context.tools.execute() is async, these could be fanned out with asyncio.gather(), which would be meaningfully faster and is a more idiomatic pattern for this kind of fan-out composition.
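
For illustration, the fan-out could look roughly like this. `send_to_slack` is a hypothetical stand-in for the per-email `context.tools.execute()` call, not code from the example.

```python
import asyncio
from typing import List


async def send_to_slack(message: str) -> str:
    # Hypothetical stand-in for one awaited context.tools.execute() call.
    await asyncio.sleep(0.01)
    return f"sent: {message}"


async def forward_all(messages: List[str]) -> List[str]:
    # Start every send concurrently; results come back in input order.
    return list(await asyncio.gather(*(send_to_slack(m) for m in messages)))
```

With the default `return_exceptions=False`, the first failing send propagates its exception to the caller, so a version that should tolerate partial failure would pass `return_exceptions=True` and inspect each result.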

ToolResponseExtractionError in errors.py carries no structured metadata about what failed — e.g., which field couldn't be mapped, what the raw response looked like, or which target model was being structured into. Adding optional fields like field_path and raw_value to the exception (or at least surfacing them in developer_message) would make these errors far easier to diagnose when OnMissing.ALLOW_NULL is not set and extraction actually fails.
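
A sketch of what that could look like, written as a standalone class because the real signature in errors.py is not shown here. The class name, keyword arguments, and `developer_message` formatting are all assumptions.

```python
from typing import Any, Optional


class ExtractionError(Exception):
    """Hypothetical variant of ToolResponseExtractionError with context fields."""

    def __init__(
        self,
        message: str,
        *,
        target_model: Optional[str] = None,
        field_path: Optional[str] = None,
        raw_value: Any = None,
    ) -> None:
        super().__init__(message)
        self.message = message
        self.target_model = target_model
        self.field_path = field_path
        self.raw_value = raw_value

    @property
    def developer_message(self) -> str:
        # Append only the details that were actually provided.
        details = []
        if self.target_model is not None:
            details.append(f"model={self.target_model}")
        if self.field_path is not None:
            details.append(f"field={self.field_path}")
        if self.raw_value is not None:
            details.append(f"raw={self.raw_value!r}")
        return self.message if not details else f"{self.message} ({', '.join(details)})"
```

Keeping the new fields keyword-only and optional means existing call sites that pass just a message would keep working.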
