feat: add WebDriver BiDi browser backend by lmorchard · Pull Request #352 · mozilla/pilo

lmorchard · 2026-03-27T21:06:39Z

Summary

BiDiBrowser: Generic WebDriver BiDi AriaBrowser implementation that controls any BiDi-capable browser over WebSocket. Uses JS evaluation for element interactions (like ExtensionBrowser), not Playwright-specific APIs.
FoxcloudBrowser: Extends BiDiBrowser with foxcloud broker session lifecycle (create VM, poll until ready, connect BiDi, park/resume, cleanup).
BiDiConnection: Minimal WebSocket client for BiDi protocol (~130 lines) — command/response correlation, event emission, timeouts.
CLI: --browser bidi --bidi-url ws://... and --browser foxcloud --foxcloud-url http://...

Note: BiDi/foxcloud support is CLI-only for now. The server package still uses PlaywrightBrowser exclusively — adding FoxcloudBrowser as a server backend (one VM per request) is a natural follow-up.

Relationship to `--channel moz-firefox` (PR #182)

PR #182 added support for using WebDriver BiDi through Playwright — when --channel moz-firefox is specified, Playwright uses the system Firefox and communicates via BiDi under the hood, but Playwright is still the intermediary.

This PR is completely independent of Playwright. BiDiBrowser speaks raw WebDriver BiDi directly over WebSocket — no Playwright in the chain. This enables connecting to any BiDi endpoint: local Firefox, remote browsers, or cloud browser services like foxcloud-bidi.

	`--channel moz-firefox` (PR #182)	`--browser bidi` (this PR)
Dependency	Playwright required	No Playwright
BiDi connection	Managed by Playwright	Direct WebSocket
Element interaction	Playwright locator APIs	JS evaluation
Browser launch	Playwright manages it	External (caller provides URL)
Use case	Use system Firefox instead of Playwright's bundled one	Connect to any BiDi endpoint (local, remote, foxcloud)

Design

Two-layer architecture: BiDiBrowser speaks standard W3C WebDriver BiDi (works with any browser), FoxcloudBrowser layers session management on top. The BiDi layer remains useful regardless of what happens to the experimental foxcloud project.

Spec: docs/superpowers/specs/2026-03-27-bidi-browser-design.md (not committed to this PR)

Test plan

29 new unit tests (bidiConnection, bidiBrowser, foxcloudBrowser) — all passing
660 total tests passing, no regressions
Smoke tested against local Firefox (headless + visible window)
Smoke tested against foxcloud Docker provider
Smoke tested against foxcloud Firecracker provider on GCP
CI green

🤖 Generated with Claude Code

Copilot

Pull request overview

Adds a new WebDriver BiDi-based browser backend to pilo-core, plus a foxcloud session wrapper and CLI wiring, enabling non-Playwright browser control over WebSocket.

Changes:

Introduces BiDiConnection (minimal BiDi WS client) and BiDiBrowser (new AriaBrowser backend using JS evaluation).
Adds FoxcloudBrowser wrapper for broker-backed session lifecycle (create/poll/delete, park/resume).
Updates CLI/config/docs and adds unit + smoke tests for the new backends.

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 7 comments.

Show a summary per file

File	Description
pnpm-lock.yaml	Locks new deps for WS + BiDi protocol support.
packages/core/package.json	Adds `ws` (+ types) and `webdriver-bidi-protocol` dev dependency.
packages/core/src/browser/bidiConnection.ts	Implements WebSocket transport with command correlation + timeouts + events.
packages/core/src/browser/bidiBrowser.ts	Implements `AriaBrowser` via BiDi commands + JS evaluation utilities.
packages/core/src/browser/foxcloudBrowser.ts	Adds broker session management on top of `BiDiBrowser`.
packages/core/src/index.ts	Re-exports `BiDiBrowser`/`FoxcloudBrowser` from package entrypoint.
packages/core/src/config/defaults.ts	Extends supported browsers list to include `bidi` and `foxcloud`.
packages/core/test/bidiConnection.test.ts	Unit tests for BiDiConnection (connect, timeouts, events, pending rejection).
packages/core/test/bidiBrowser.test.ts	Unit tests for BiDiBrowser navigation, eval-based APIs, screenshots, temp tabs.
packages/core/test/foxcloudBrowser.test.ts	Unit tests for foxcloud session lifecycle and BiDi connection URL wiring.
packages/core/test/smoke-bidi.ts	Manual smoke script against a real Firefox instance.
packages/cli/src/utils.ts	Extends CLI browser validation/listing for `bidi`/`foxcloud`.
packages/cli/test/utils.test.ts	Updates CLI utils test expectations for new browsers.
packages/cli/src/commands/run.ts	Adds `--bidi-url` / `--foxcloud-url` and instantiates the appropriate browser backend.
README.md	Documents experimental BiDi usage with Firefox remote debugging.

Files not reviewed (1)

pnpm-lock.yaml: Language not supported

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot

Pull request overview

Copilot reviewed 18 out of 19 changed files in this pull request and generated 5 comments.

Files not reviewed (1)

pnpm-lock.yaml: Language not supported

Comments suppressed due to low confidence (1)

packages/core/src/webAgent.ts:699

Screenshot events and the multimodal message are hard-coded to format: "jpeg" / mediaType: "image/jpeg", but not all browser backends necessarily return JPEG screenshots (e.g., BiDi captureScreenshot commonly returns PNG by default). Consider determining the actual image format (by sniffing header bytes or having the browser backend return format metadata) and emitting the matching format/mediaType values.

        // Emit screenshot captured event
        this.emit(WebAgentEventType.BROWSER_SCREENSHOT_CAPTURED, {
          size: screenshot.length,
          format: "jpeg" as const,
        });

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-01T17:54:20Z

+        if (error instanceof TypeError && String((error as Error).message).includes("substring")) {
+          console.warn("[WebAgent] AI SDK image processing failed, stripping images and retrying");
+          this.stripImagesFromLastMessage();
+          needsPageSnapshot = false; // retry with text-only, don't re-snapshot


This retry path uses continue without incrementing executionState.currentIteration (unlike the BrowserDisconnectedError branch). If the underlying AI SDK image-processing error persists, the agent can loop forever and never hit max-iteration safeguards. Consider incrementing the iteration counter (or introducing a bounded retry count) before continuing.

Suggested change

needsPageSnapshot = false; // retry with text-only, don't re-snapshot

needsPageSnapshot = false; // retry with text-only, don't re-snapshot

executionState.currentIteration++;

Copilot · 2026-04-01T17:54:20Z

+          browser: (body.browser ?? serverConfig.browser) as
+            | "firefox"
+            | "chrome"
+            | "chromium"
+            | "safari"


Casting (body.browser ?? serverConfig.browser) to a Playwright-only union hides the fact that serverConfig.browser can now be "bidi"/"foxcloud" (from shared config defaults). If the server is configured with one of those values, it will still be passed into PlaywrightBrowser at runtime and can fail in a non-obvious way. Consider validating the effective browser value against the Playwright-supported set and returning a 400/500 with a clear message (or constraining the server config schema to Playwright browsers).

Copilot · 2026-04-01T17:54:20Z

+    };
+
+    if (treeResult?.contexts?.length > 0) {
+      this.currentContext = treeResult.contexts[0].context;


If browsingContext.getTree returns no contexts, start() currently succeeds but leaves currentContext null, causing later operations to fail with a generic "no active browsing context" error. Consider failing fast with a clear error (or explicitly creating/selecting a context) when no contexts are returned.

Suggested change

this.currentContext = treeResult.contexts[0].context;

this.currentContext = treeResult.contexts[0].context;

} else {

throw new BrowserActionException(

"BiDiBrowser.start(): no browsing contexts available; ensure a page is open or create a context before starting.",

);

Copilot · 2026-04-01T17:54:21Z

+      const result = (await this.connection.sendCommand("browsingContext.captureScreenshot", {
+        context: this.currentContext,
+      })) as { data?: string };
+
+      if (!result.data) {
+        throw new BrowserActionException("getScreenshot", "captureScreenshot returned no data");


captureScreenshot is invoked without specifying an output format, so the returned base64 is not guaranteed to be JPEG. Since WebAgent currently emits format: "jpeg" / mediaType: "image/jpeg", this can become incorrect for BiDi backends (often PNG by default) and may break downstream media handling. Consider explicitly requesting JPEG (if supported) or otherwise converting/sniffing the bytes and propagating the correct format/media type.

Copilot · 2026-04-01T17:54:21Z

 export function validateBrowser(browser: string): boolean {
-  const validBrowsers = ["firefox", "chrome", "chromium", "safari", "webkit", "edge"];
+  const validBrowsers = [
+    "firefox",
+    "chrome",
+    "chromium",
+    "safari",
+    "webkit",
+    "edge",
+    "bidi",
+    "foxcloud",
+  ];
  return validBrowsers.includes(browser);


The valid browser list is duplicated here (and again in getValidBrowsers()), which risks drift from the core config schema (BROWSERS in packages/core/src/config/defaults.ts). Consider reusing the shared browser list (or deriving it from the schema) so new backends don’t require updating multiple places.

- Increment iteration counter on AI SDK substring crash recovery to prevent infinite loop if error persists after image stripping - Fail fast in BiDiBrowser.start() when no browsing contexts are available after session.new Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Copilot

Pull request overview

Copilot reviewed 19 out of 20 changed files in this pull request and generated 6 comments.

Files not reviewed (1)

pnpm-lock.yaml: Language not supported

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-15T17:10:51Z

-            chalk.yellow(`⚠️ Navigation retry ${attempt}: ${error.message}`),
-            chalk.gray(`(next timeout: ${Math.round(nextTimeout / 1000)}s)`),
-          );
+    let browser;


With tsconfig strict enabled, let browser; is an implicit-any declaration and will fail compilation under noImplicitAny. Give browser an explicit type (e.g., the shared AriaBrowser interface or a union of the supported browser classes) so the CLI package type-checks and downstream usage (new WebAgent(browser, ...)) stays type-safe.

Suggested change

let browser;

let browser:

| PlaywrightBrowser

| import("pilo-core").BiDiBrowser

| import("pilo-core").FoxcloudBrowser;

Copilot · 2026-04-15T17:10:51Z

  const browserConfig = {
-    browser: body.browser ?? serverConfig.browser,
+    // Server only supports Playwright browsers (not bidi/foxcloud)
+    browser: (body.browser ?? serverConfig.browser) as
+      | "firefox"
+      | "chrome"
+      | "chromium"
+      | "safari"
+      | "webkit"
+      | "edge",
    channel: body.channel ?? serverConfig.channel,


The server config can now resolve browser to values like "bidi"/"foxcloud" (from core config), but this cast only affects TypeScript and does not prevent unsupported values at runtime. If a client or env config sets an unsupported browser, new PlaywrightBrowser(browserConfig) will receive it and fail later with a less clear error. Consider adding an explicit runtime validation here (or in validateTaskRequest) that rejects any non-Playwright browser with a 400 and a clear message, and/or normalize serverConfig.browser to the Playwright-only set for the server package.

Copilot · 2026-04-15T17:10:52Z

+          throw new BrowserActionException(
+            "getTreeWithRefs",
+            `ARIA tree generation did not return a string (got ${typeof yaml}: ${JSON.stringify(result).substring(0, 200)})`,


This error message construction can itself throw when result is undefined (since JSON.stringify(undefined) returns undefined, and calling .substring(...) will crash). This defeats the intended BrowserActionException and can hide the real failure. Consider safely stringifying (e.g., fall back to String(result) when JSON.stringify returns null/undefined or throws) before truncating.

Suggested change

throw new BrowserActionException(

"getTreeWithRefs",

`ARIA tree generation did not return a string (got ${typeof yaml}: ${JSON.stringify(result).substring(0, 200)})`,

let resultPreview: string;

try {

resultPreview = JSON.stringify(result) ?? String(result);

} catch {

resultPreview = String(result);

}

throw new BrowserActionException(

"getTreeWithRefs",

`ARIA tree generation did not return a string (got ${typeof yaml}: ${resultPreview.substring(0, 200)})`,

Copilot · 2026-04-15T17:10:52Z

+            // and retry this iteration rather than counting it as an agent error.
+            if (
+              error instanceof TypeError &&
+              String((error as Error).message).includes("substring")


This retry branch matches any TypeError whose message contains "substring", which is broad enough to catch unrelated bugs and will bypass error tracking/span error status. Consider narrowing the detection (e.g., check stack frames/module name for the AI SDK detectMediaType path, or gate behind a specific flag) so other TypeErrors still count as agent errors and surface properly.

Suggested change

String((error as Error).message).includes("substring")

String((error as Error).message).includes("substring") &&

typeof error.stack === "string" &&

error.stack.includes("detectMediaType")

Copilot · 2026-04-15T17:10:52Z

+// Mock global fetch
+const mockFetch = vi.fn();
+vi.stubGlobal("fetch", mockFetch);
+


vi.stubGlobal("fetch", ...) is set once at module scope and never restored, which can leak into other test files (especially when tests run concurrently) and cause hard-to-debug failures. Prefer stubbing in beforeEach and restoring in afterEach/afterAll (e.g., vi.unstubAllGlobals() or restoring the original fetch).

Copilot · 2026-04-15T17:10:53Z

+  const wss = new WebSocketServer({ port: 0 });
+  const addr = wss.address() as { port: number };
+  let client: WsWebSocket | undefined;


wss.address() can be null until the server is actually listening; reading it immediately after new WebSocketServer({ port: 0 }) can introduce test flakiness. Consider awaiting the listening event (or providing a callback to ensure the server is bound) before calling address() and building the URL.

Add BiDiBrowser and FoxcloudBrowser implementations that speak raw WebDriver BiDi directly over WebSocket, independent of Playwright. - BiDiBrowser: generic BiDi implementation for any BiDi-capable browser - FoxcloudBrowser: extends BiDiBrowser with foxcloud broker lifecycle - BiDiConnection: minimal WebSocket client for BiDi protocol - CLI: --browser bidi --bidi-url and --browser foxcloud --foxcloud-url - AI SDK image crash recovery (stripImagesFromLastMessage, Uint8Array guard) - Server: narrow browser type to Playwright-only (bidi/foxcloud are CLI-only) Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

github-advanced-security AI found potential problems Mar 27, 2026

View reviewed changes

Comment thread packages/core/src/browser/bidiBrowser.ts Fixed

Comment thread packages/core/src/browser/foxcloudBrowser.ts Fixed

lmorchard force-pushed the feature/bidi-browser branch from 8b48c77 to d02dcde Compare March 27, 2026 21:09

lmorchard requested a review from Copilot March 27, 2026 21:11

Copilot started reviewing on behalf of lmorchard March 27, 2026 21:11 View session

Copilot AI reviewed Mar 27, 2026

View reviewed changes

lmorchard force-pushed the feature/bidi-browser branch 4 times, most recently from 82f4d98 to a9eac95 Compare March 27, 2026 23:42

lmorchard requested a review from Copilot April 1, 2026 17:49

Copilot started reviewing on behalf of lmorchard April 1, 2026 17:49 View session

Copilot AI reviewed Apr 1, 2026

View reviewed changes

lmorchard force-pushed the feature/bidi-browser branch 2 times, most recently from 227fc58 to 895a45b Compare April 1, 2026 22:12

lmorchard force-pushed the feature/bidi-browser branch 5 times, most recently from 6c25dc0 to e4ccab9 Compare April 15, 2026 16:43

lmorchard requested a review from Copilot April 15, 2026 17:06

Copilot started reviewing on behalf of lmorchard April 15, 2026 17:06 View session

Copilot AI reviewed Apr 15, 2026

View reviewed changes

lmorchard force-pushed the feature/bidi-browser branch from e4ccab9 to 8bf5b5b Compare April 15, 2026 17:27

	needsPageSnapshot = false; // retry with text-only, don't re-snapshot
	needsPageSnapshot = false; // retry with text-only, don't re-snapshot
	executionState.currentIteration++;

-    let browser;
+    let browser:
+      | PlaywrightBrowser
+      | import("pilo-core").BiDiBrowser
+      | import("pilo-core").FoxcloudBrowser;

-              String((error as Error).message).includes("substring")
+              String((error as Error).message).includes("substring") &&
+              typeof error.stack === "string" &&
+              error.stack.includes("detectMediaType")

Conversation

lmorchard commented Mar 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Relationship to --channel moz-firefox (PR #182)

Design

Test plan

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

lmorchard commented Mar 27, 2026 •

edited

Loading

Relationship to `--channel moz-firefox` (PR #182)