Conversation
8b48c77 to
d02dcde
Compare
There was a problem hiding this comment.
Pull request overview
Adds a new WebDriver BiDi-based browser backend to pilo-core, plus a foxcloud session wrapper and CLI wiring, enabling non-Playwright browser control over WebSocket.
Changes:
- Introduces
BiDiConnection(minimal BiDi WS client) andBiDiBrowser(newAriaBrowserbackend using JS evaluation). - Adds
FoxcloudBrowserwrapper for broker-backed session lifecycle (create/poll/delete, park/resume). - Updates CLI/config/docs and adds unit + smoke tests for the new backends.
Reviewed changes
Copilot reviewed 14 out of 15 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| pnpm-lock.yaml | Locks new deps for WS + BiDi protocol support. |
| packages/core/package.json | Adds ws (+ types) and webdriver-bidi-protocol dev dependency. |
| packages/core/src/browser/bidiConnection.ts | Implements WebSocket transport with command correlation + timeouts + events. |
| packages/core/src/browser/bidiBrowser.ts | Implements AriaBrowser via BiDi commands + JS evaluation utilities. |
| packages/core/src/browser/foxcloudBrowser.ts | Adds broker session management on top of BiDiBrowser. |
| packages/core/src/index.ts | Re-exports BiDiBrowser/FoxcloudBrowser from package entrypoint. |
| packages/core/src/config/defaults.ts | Extends supported browsers list to include bidi and foxcloud. |
| packages/core/test/bidiConnection.test.ts | Unit tests for BiDiConnection (connect, timeouts, events, pending rejection). |
| packages/core/test/bidiBrowser.test.ts | Unit tests for BiDiBrowser navigation, eval-based APIs, screenshots, temp tabs. |
| packages/core/test/foxcloudBrowser.test.ts | Unit tests for foxcloud session lifecycle and BiDi connection URL wiring. |
| packages/core/test/smoke-bidi.ts | Manual smoke script against a real Firefox instance. |
| packages/cli/src/utils.ts | Extends CLI browser validation/listing for bidi/foxcloud. |
| packages/cli/test/utils.test.ts | Updates CLI utils test expectations for new browsers. |
| packages/cli/src/commands/run.ts | Adds --bidi-url / --foxcloud-url and instantiates the appropriate browser backend. |
| README.md | Documents experimental BiDi usage with Firefox remote debugging. |
Files not reviewed (1)
- pnpm-lock.yaml: Language not supported
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
82f4d98 to
a9eac95
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 18 out of 19 changed files in this pull request and generated 5 comments.
Files not reviewed (1)
- pnpm-lock.yaml: Language not supported
Comments suppressed due to low confidence (1)
packages/core/src/webAgent.ts:699
- Screenshot events and the multimodal message are hard-coded to
format: "jpeg"/mediaType: "image/jpeg", but not all browser backends necessarily return JPEG screenshots (e.g., BiDi captureScreenshot commonly returns PNG by default). Consider determining the actual image format (by sniffing header bytes or having the browser backend return format metadata) and emitting the matchingformat/mediaTypevalues.
// Emit screenshot captured event
this.emit(WebAgentEventType.BROWSER_SCREENSHOT_CAPTURED, {
size: screenshot.length,
format: "jpeg" as const,
});
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (error instanceof TypeError && String((error as Error).message).includes("substring")) { | ||
| console.warn("[WebAgent] AI SDK image processing failed, stripping images and retrying"); | ||
| this.stripImagesFromLastMessage(); | ||
| needsPageSnapshot = false; // retry with text-only, don't re-snapshot |
There was a problem hiding this comment.
This retry path uses continue without incrementing executionState.currentIteration (unlike the BrowserDisconnectedError branch). If the underlying AI SDK image-processing error persists, the agent can loop forever and never hit max-iteration safeguards. Consider incrementing the iteration counter (or introducing a bounded retry count) before continuing.
| needsPageSnapshot = false; // retry with text-only, don't re-snapshot | |
| needsPageSnapshot = false; // retry with text-only, don't re-snapshot | |
| executionState.currentIteration++; |
| browser: (body.browser ?? serverConfig.browser) as | ||
| | "firefox" | ||
| | "chrome" | ||
| | "chromium" | ||
| | "safari" |
There was a problem hiding this comment.
Casting (body.browser ?? serverConfig.browser) to a Playwright-only union hides the fact that serverConfig.browser can now be "bidi"/"foxcloud" (from shared config defaults). If the server is configured with one of those values, it will still be passed into PlaywrightBrowser at runtime and can fail in a non-obvious way. Consider validating the effective browser value against the Playwright-supported set and returning a 400/500 with a clear message (or constraining the server config schema to Playwright browsers).
| }; | ||
|
|
||
| if (treeResult?.contexts?.length > 0) { | ||
| this.currentContext = treeResult.contexts[0].context; |
There was a problem hiding this comment.
If browsingContext.getTree returns no contexts, start() currently succeeds but leaves currentContext null, causing later operations to fail with a generic "no active browsing context" error. Consider failing fast with a clear error (or explicitly creating/selecting a context) when no contexts are returned.
| this.currentContext = treeResult.contexts[0].context; | |
| this.currentContext = treeResult.contexts[0].context; | |
| } else { | |
| throw new BrowserActionException( | |
| "BiDiBrowser.start(): no browsing contexts available; ensure a page is open or create a context before starting.", | |
| ); |
| const result = (await this.connection.sendCommand("browsingContext.captureScreenshot", { | ||
| context: this.currentContext, | ||
| })) as { data?: string }; | ||
|
|
||
| if (!result.data) { | ||
| throw new BrowserActionException("getScreenshot", "captureScreenshot returned no data"); |
There was a problem hiding this comment.
captureScreenshot is invoked without specifying an output format, so the returned base64 is not guaranteed to be JPEG. Since WebAgent currently emits format: "jpeg" / mediaType: "image/jpeg", this can become incorrect for BiDi backends (often PNG by default) and may break downstream media handling. Consider explicitly requesting JPEG (if supported) or otherwise converting/sniffing the bytes and propagating the correct format/media type.
| export function validateBrowser(browser: string): boolean { | ||
| const validBrowsers = ["firefox", "chrome", "chromium", "safari", "webkit", "edge"]; | ||
| const validBrowsers = [ | ||
| "firefox", | ||
| "chrome", | ||
| "chromium", | ||
| "safari", | ||
| "webkit", | ||
| "edge", | ||
| "bidi", | ||
| "foxcloud", | ||
| ]; | ||
| return validBrowsers.includes(browser); |
There was a problem hiding this comment.
The valid browser list is duplicated here (and again in getValidBrowsers()), which risks drift from the core config schema (BROWSERS in packages/core/src/config/defaults.ts). Consider reusing the shared browser list (or deriving it from the schema) so new backends don’t require updating multiple places.
- Increment iteration counter on AI SDK substring crash recovery to prevent infinite loop if error persists after image stripping - Fail fast in BiDiBrowser.start() when no browsing contexts are available after session.new Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Increment iteration counter on AI SDK substring crash recovery to prevent infinite loop if error persists after image stripping - Fail fast in BiDiBrowser.start() when no browsing contexts are available after session.new Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
227fc58 to
895a45b
Compare
6c25dc0 to
e4ccab9
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 19 out of 20 changed files in this pull request and generated 6 comments.
Files not reviewed (1)
- pnpm-lock.yaml: Language not supported
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| chalk.yellow(`⚠️ Navigation retry ${attempt}: ${error.message}`), | ||
| chalk.gray(`(next timeout: ${Math.round(nextTimeout / 1000)}s)`), | ||
| ); | ||
| let browser; |
There was a problem hiding this comment.
With tsconfig strict enabled, let browser; is an implicit-any declaration and will fail compilation under noImplicitAny. Give browser an explicit type (e.g., the shared AriaBrowser interface or a union of the supported browser classes) so the CLI package type-checks and downstream usage (new WebAgent(browser, ...)) stays type-safe.
| let browser; | |
| let browser: | |
| | PlaywrightBrowser | |
| | import("pilo-core").BiDiBrowser | |
| | import("pilo-core").FoxcloudBrowser; |
| const browserConfig = { | ||
| browser: body.browser ?? serverConfig.browser, | ||
| // Server only supports Playwright browsers (not bidi/foxcloud) | ||
| browser: (body.browser ?? serverConfig.browser) as | ||
| | "firefox" | ||
| | "chrome" | ||
| | "chromium" | ||
| | "safari" | ||
| | "webkit" | ||
| | "edge", | ||
| channel: body.channel ?? serverConfig.channel, |
There was a problem hiding this comment.
The server config can now resolve browser to values like "bidi"/"foxcloud" (from core config), but this cast only affects TypeScript and does not prevent unsupported values at runtime. If a client or env config sets an unsupported browser, new PlaywrightBrowser(browserConfig) will receive it and fail later with a less clear error. Consider adding an explicit runtime validation here (or in validateTaskRequest) that rejects any non-Playwright browser with a 400 and a clear message, and/or normalize serverConfig.browser to the Playwright-only set for the server package.
| throw new BrowserActionException( | ||
| "getTreeWithRefs", | ||
| `ARIA tree generation did not return a string (got ${typeof yaml}: ${JSON.stringify(result).substring(0, 200)})`, |
There was a problem hiding this comment.
This error message construction can itself throw when result is undefined (since JSON.stringify(undefined) returns undefined, and calling .substring(...) will crash). This defeats the intended BrowserActionException and can hide the real failure. Consider safely stringifying (e.g., fall back to String(result) when JSON.stringify returns null/undefined or throws) before truncating.
| throw new BrowserActionException( | |
| "getTreeWithRefs", | |
| `ARIA tree generation did not return a string (got ${typeof yaml}: ${JSON.stringify(result).substring(0, 200)})`, | |
| let resultPreview: string; | |
| try { | |
| resultPreview = JSON.stringify(result) ?? String(result); | |
| } catch { | |
| resultPreview = String(result); | |
| } | |
| throw new BrowserActionException( | |
| "getTreeWithRefs", | |
| `ARIA tree generation did not return a string (got ${typeof yaml}: ${resultPreview.substring(0, 200)})`, |
| // and retry this iteration rather than counting it as an agent error. | ||
| if ( | ||
| error instanceof TypeError && | ||
| String((error as Error).message).includes("substring") |
There was a problem hiding this comment.
This retry branch matches any TypeError whose message contains "substring", which is broad enough to catch unrelated bugs and will bypass error tracking/span error status. Consider narrowing the detection (e.g., check stack frames/module name for the AI SDK detectMediaType path, or gate behind a specific flag) so other TypeErrors still count as agent errors and surface properly.
| String((error as Error).message).includes("substring") | |
| String((error as Error).message).includes("substring") && | |
| typeof error.stack === "string" && | |
| error.stack.includes("detectMediaType") |
| // Mock global fetch | ||
| const mockFetch = vi.fn(); | ||
| vi.stubGlobal("fetch", mockFetch); | ||
|
|
There was a problem hiding this comment.
vi.stubGlobal("fetch", ...) is set once at module scope and never restored, which can leak into other test files (especially when tests run concurrently) and cause hard-to-debug failures. Prefer stubbing in beforeEach and restoring in afterEach/afterAll (e.g., vi.unstubAllGlobals() or restoring the original fetch).
| const wss = new WebSocketServer({ port: 0 }); | ||
| const addr = wss.address() as { port: number }; | ||
| let client: WsWebSocket | undefined; |
There was a problem hiding this comment.
wss.address() can be null until the server is actually listening; reading it immediately after new WebSocketServer({ port: 0 }) can introduce test flakiness. Consider awaiting the listening event (or providing a callback to ensure the server is bound) before calling address() and building the URL.
Add BiDiBrowser and FoxcloudBrowser implementations that speak raw WebDriver BiDi directly over WebSocket, independent of Playwright. - BiDiBrowser: generic BiDi implementation for any BiDi-capable browser - FoxcloudBrowser: extends BiDiBrowser with foxcloud broker lifecycle - BiDiConnection: minimal WebSocket client for BiDi protocol - CLI: --browser bidi --bidi-url and --browser foxcloud --foxcloud-url - AI SDK image crash recovery (stripImagesFromLastMessage, Uint8Array guard) - Server: narrow browser type to Playwright-only (bidi/foxcloud are CLI-only) Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
e4ccab9 to
8bf5b5b
Compare
Summary
AriaBrowserimplementation that controls any BiDi-capable browser over WebSocket. Uses JS evaluation for element interactions (like ExtensionBrowser), not Playwright-specific APIs.--browser bidi --bidi-url ws://...and--browser foxcloud --foxcloud-url http://...Note: BiDi/foxcloud support is CLI-only for now. The server package still uses PlaywrightBrowser exclusively — adding FoxcloudBrowser as a server backend (one VM per request) is a natural follow-up.
Relationship to
--channel moz-firefox(PR #182)PR #182 added support for using WebDriver BiDi through Playwright — when
--channel moz-firefoxis specified, Playwright uses the system Firefox and communicates via BiDi under the hood, but Playwright is still the intermediary.This PR is completely independent of Playwright.
BiDiBrowserspeaks raw WebDriver BiDi directly over WebSocket — no Playwright in the chain. This enables connecting to any BiDi endpoint: local Firefox, remote browsers, or cloud browser services like foxcloud-bidi.--channel moz-firefox(PR #182)--browser bidi(this PR)Design
Two-layer architecture: BiDiBrowser speaks standard W3C WebDriver BiDi (works with any browser), FoxcloudBrowser layers session management on top. The BiDi layer remains useful regardless of what happens to the experimental foxcloud project.
Spec:
docs/superpowers/specs/2026-03-27-bidi-browser-design.md(not committed to this PR)Test plan
🤖 Generated with Claude Code