Capabilities

Foragent exposes browser operations as discrete A2A capabilities. Callers invoke capabilities by name; Foragent handles the browser mechanics.

Advertised capabilities (v0.2)

Three v0.1/v0.2 specialists have been removed because browser-task subsumes them. The project is pre-public, so no deprecation path was required.

browser-task input shape

Send the input as JSON in the first text part, or supply the same fields individually as metadata:

{
  "intent": "free-form description of what to accomplish",
  "allowedHosts": ["bsky.app", "*.example.com", "*"],
  "url": "optional absolute http(s) starting URL",
  "credentialId": "optional broker reference",
  "maxSteps": 60,
  "maxSeconds": 120
}
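
As a rough illustration of the input shape above, a caller could assemble the JSON body with a small helper like the one below. This is a minimal sketch, not part of Foragent: the function name `build_browser_task_input` and its parameter names are hypothetical, and the positive-budget check is an assumption. The only rule taken from the shape itself is that `url`, when present, is an absolute http(s) URL.

```python
import json
from urllib.parse import urlparse


def build_browser_task_input(intent, allowed_hosts, url=None,
                             credential_id=None, max_steps=None,
                             max_seconds=None):
    """Serialize a browser-task input object (hypothetical client helper)."""
    payload = {"intent": intent, "allowedHosts": list(allowed_hosts)}

    if url is not None:
        # The shape asks for an absolute http(s) starting URL.
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError("url must be an absolute http(s) URL")
        payload["url"] = url

    if credential_id is not None:
        payload["credentialId"] = credential_id

    # Assumption: budgets, when given, should be positive integers.
    for key, value in (("maxSteps", max_steps), ("maxSeconds", max_seconds)):
        if value is not None:
            if value <= 0:
                raise ValueError(f"{key} must be positive")
            payload[key] = value

    # Optional fields that were not supplied are simply omitted.
    return json.dumps(payload)
```

The returned string is what would go in the first text part; whether Foragent applies its own defaults for omitted budget fields is not stated here.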

browser-task output shape

A JSON object in a single text part:

{
  "status": "done" | "failed" | "incomplete",
  "summary": "one-sentence human-readable result",
  "result": "optional structured result text (e.g. extracted value)",
  "steps": 7,
  "navigations": ["https://host/path", "..."]
}

incomplete means the budget (maxSteps or maxSeconds) was exhausted before done or fail was called. For extraction-style tasks, instruct the planner to return JSON via the result field, e.g. the intent "Open https://shop.example/p/42 and return {\"name\":..., \"price_usd\":...} as JSON in the result field." The planner is not schema-enforced the way extract-structured-data used to be, so keep the target shape explicit in the intent.
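
Because the result is not schema-enforced, a caller probably wants to parse it defensively. The sketch below shows one way to read the output object: it checks the status, then tries to decode `result` as JSON and falls back to the raw text when that fails. The helper name `read_browser_task_output` is hypothetical, and the fallback policy is a design choice of this example rather than documented Foragent behaviour.

```python
import json

KNOWN_STATUSES = ("done", "failed", "incomplete")


def read_browser_task_output(text):
    """Return (status, summary, value) from the output text part."""
    output = json.loads(text)

    status = output["status"]
    if status not in KNOWN_STATUSES:
        raise ValueError(f"unexpected status: {status!r}")

    value = output.get("result")  # optional field, may be absent
    if status == "done" and isinstance(value, str):
        try:
            # Extraction-style intents ask for JSON inside the result text.
            value = json.loads(value)
        except json.JSONDecodeError:
            pass  # not schema-enforced, so keep the raw text as-is

    return status, output.get("summary", ""), value
```

A caller would typically treat `failed` and `incomplete` differently: `incomplete` may be worth retrying with a larger budget, while `failed` usually is not.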

browser-task tool surface

The tools are exposed to the planner via [AIFunction] wrappers over IChatClient (spec Appendix A #16: no MCP sidecar). Refs are Playwright aria-ref ids and are valid only within the snapshot they came from.
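
The snapshot-scoping rule for refs can be pictured with a small bookkeeping class. This is an illustrative sketch only, written in Python for brevity: it is not Foragent's or Playwright's actual mechanism, the names `SnapshotRefs` and `StaleRefError` are hypothetical, and the ref strings are placeholders. It shows the idea that a ref is accepted only alongside the snapshot that produced it.

```python
class StaleRefError(Exception):
    """Raised when a ref is used with a snapshot that is no longer current."""


class SnapshotRefs:
    """Tracks which refs belong to the most recent snapshot (illustrative)."""

    def __init__(self):
        self._snapshot_id = 0
        self._refs = frozenset()

    def take_snapshot(self, refs):
        # Each new snapshot replaces the previous ref set entirely.
        self._snapshot_id += 1
        self._refs = frozenset(refs)
        return self._snapshot_id

    def resolve(self, snapshot_id, ref):
        # A ref is only meaningful together with the snapshot it came from.
        if snapshot_id != self._snapshot_id:
            raise StaleRefError(
                f"ref {ref!r} came from snapshot {snapshot_id}, "
                f"current snapshot is {self._snapshot_id}")
        if ref not in self._refs:
            raise KeyError(ref)
        return ref
```

In practice this means the planner should take a fresh snapshot after any action that changes the page, and pick refs from that snapshot rather than reusing earlier ones.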

Design principles