Scrape API (/scrape)
Low-level endpoint that reuses the same browser + proxy infra as the agent, but returns raw page text and accessibility trees. No planner, no tools—just data for your own models and pipelines.
POST /scrape – Low-level endpoint for raw page text + accessibility tree.
Base URL: https://api.rtrvr.ai – All HTTP endpoints share this base URL. Use /scrape for raw page data and /execute for full agent runs.
Use your API key in the Authorization header:
Authorization: Bearer rtrvr_your_api_key
Endpoint: https://api.rtrvr.ai/scrape
Use /execute when you want the full planner + tools engine, and /scrape when you just need raw page text + structure for your own models and pipelines.
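As a rough TypeScript sketch of an authenticated call (the endpoint and headers come from this page; the RTRVR_API_KEY env var, function name, and error handling are illustrative):
// Minimal authenticated /scrape call (Node 18+ global fetch assumed).
// RTRVR_API_KEY is a placeholder env var holding your "rtrvr_..." key.
async function scrapeOnce(urls: string[]) {
  const res = await fetch("https://api.rtrvr.ai/scrape", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.RTRVR_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ urls }),
  });
  if (!res.ok) {
    throw new Error(`Scrape failed: ${res.status} ${await res.text()}`);
  }
  return res.json(); // ApiScrapeResponse, documented below
}

scrapeOnce(["https://example.com/blog/ai-trends-2025"]).then((r) => console.log(r.status));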
Open one or more URLs in our browser cluster and get back extracted text, the accessibility tree, and link metadata. The endpoint is designed to be:
- Cheap – infra-only credits (browser + proxy), no model usage.
- Predictable – stable schema for tab content + usage metrics.
- Composable – plug the result into your own LLM/RAG pipeline.
curl -X POST https://api.rtrvr.ai/scrape \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"urls": ["https://example.com/blog/ai-trends-2025"]
}'
Each scrape uses a unified UserSettings profile. This profile is stored in the cloud and shared across the web app, extension, and API.
interface UserSettings {
// Extraction Configuration
extractionConfig: {
maxParallelTabs?: number;
pageLoadDelay?: number;
makeNewTabsActive?: boolean;
writeRowProcessingTime?: boolean;
disableAutoScroll?: boolean;
/**
* When true, only text content is returned from scrapes.
* The accessibility tree + elementLinkRecord are omitted.
*/
onlyTextContent?: boolean;
};
// Proxy Configuration
proxyConfig: {
mode: 'none' | 'custom' | 'default' | 'device';
customProxies: ProxySettings[];
selectedProxyId?: string;
selectedDeviceId?: string;
};
// ...other fields omitted for brevity
}
There are two ways to control behavior:
- 1. Cloud profile (recommended): configure defaults in Cloud → Settings. Every API call reads this profile.
- 2. Per-request overrides: send settings: Partial<UserSettings> in your ApiScrapeRequest. The backend merges this on top of your stored profile.
Example: send { extractionConfig: { onlyTextContent: true } } to get text-only scrapes for one call without changing your global defaults.
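As a minimal sketch, the same override expressed as a request body matching the ApiScrapeRequest schema documented below (the URL is a placeholder):
// Text-only scrape for this one call; your stored cloud profile is unchanged.
const body = {
  urls: ["https://example.com/blog/ai-trends-2025"],
  settings: {
    extractionConfig: { onlyTextContent: true },
  },
};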
The request body is an ApiScrapeRequest:
interface ApiScrapeRequest {
/**
* Optional stable id if you want to tie multiple scrapes together.
* Mostly useful for analytics/observability on your side.
*/
trajectoryId?: string;
/**
* One or more absolute URLs to load in the browser.
* Must be a non-empty array of non-empty strings.
*/
urls: string[];
/**
* Optional per-request settings override.
* Merged on top of the stored UserSettings profile (proxyConfig, extraction, etc.).
*
* Use extraction-related settings if you only want text content and don't need
* the accessibility tree + elementLinkRecord.
*/
settings?: Partial<UserSettings>;
/**
* Response size control for API callers.
*/
response?: {
/**
* Max bytes allowed for the inline JSON response.
* If the full response exceeds this, the full payload is stored in object storage
* and a StorageReference is returned under metadata.responseRef.
* Default: 1MB (1048576 bytes)
*/
inlineOutputMaxBytes?: number;
};
}
Parameters
- urls (string[], required) – One or more absolute URLs to scrape. Must be a non-empty array of non-empty strings.
- trajectoryId (string) – Optional stable id for grouping scrapes together (analytics, observability).
- settings (Partial<UserSettings>) – Optional per-request override merged on top of your cloud UserSettings profile.
- response.inlineOutputMaxBytes (number, default: 1048576) – Maximum inline response size in bytes (default 1 MB). Larger payloads are stored as a snapshot and referenced via metadata.responseRef.
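A hedged sketch tying these parameters together as a typed request (the trajectoryId value, URLs, and the 512 KB cap are illustrative choices, not defaults):
const request: ApiScrapeRequest = {
  trajectoryId: "crawl-2025-06-01", // optional; any stable id you use for grouping
  urls: [
    "https://example.com/blog/ai-trends-2025",
    "https://example.com/pricing",
  ],
  settings: { extractionConfig: { onlyTextContent: true } },
  // Cap the inline JSON response; if the payload exceeds this, the full result
  // is stored and referenced via metadata.responseRef instead.
  response: { inlineOutputMaxBytes: 512 * 1024 },
};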
The API response is an ApiScrapeResponse:
interface ScrapedTab {
tabId: number;
url: string;
title: string;
contentType: string;
status: "success" | "error";
error?: string;
/**
* Full extracted visible text (when available).
*/
content?: string;
/**
* JSON-encoded accessibility tree (stringified).
* Use this if you want a rich, structured view of the page for your own models.
* Every link node in the tree has a numeric 'id' field which is used as the key
* in elementLinkRecord.
*/
tree?: string;
/**
* Map of accessibility-tree element id -> href/URL for link elements.
* Only present when 'tree' is present.
*/
elementLinkRecord?: Record<number, string>;
}
interface ScrapeUsageData {
totalCredits: number;
browserCredits: number;
proxyCredits: number;
totalUsd: number;
requestDurationMs: number;
proxyPageLoads: number;
proxyTabsDataFetches: number;
usingBillableProxy: boolean;
}
interface ApiScrapeResponse {
success: boolean;
status: "success" | "error";
trajectoryId: string;
tabs?: ScrapedTab[];
usageData: ScrapeUsageData;
metadata?: {
inlineOutputMaxBytes: number;
durationMs: number;
outputTooLarge?: boolean;
responseRef?: StorageReference;
};
error?: string;
}
Tabs & content
- tabs (ScrapedTab[]) – One tab per URL, in the same order as the input urls.
- tabs[].content (string) – Full extracted visible text when available. Omitted if onlyTextContent=false and extraction failed.
- tabs[].tree (string) – JSON-encoded accessibility tree (stringified). Omitted when onlyTextContent=true.
- tabs[].elementLinkRecord (Record<number, string>) – Lookup table mapping accessibility-tree element id → href/URL for link elements.
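If useful, a sketch of consuming these per-tab fields, assuming tree parses as plain JSON and elementLinkRecord keys are the numeric link ids described above (the function name and logging are ours):
// Walk a scrape result: skip failed tabs, parse the accessibility tree,
// and resolve link elements to URLs via elementLinkRecord.
function collectLinks(response: ApiScrapeResponse): Record<string, string[]> {
  const linksByUrl: Record<string, string[]> = {};
  for (const tab of response.tabs ?? []) {
    if (tab.status === "error") {
      console.warn(`Skipping ${tab.url}: ${tab.error}`);
      continue;
    }
    // tab.tree is a stringified accessibility tree; parse before use.
    const tree = tab.tree ? JSON.parse(tab.tree) : undefined;
    void tree; // feed tree and tab.content into your own model/RAG pipeline
    linksByUrl[tab.url] = Object.values(tab.elementLinkRecord ?? {});
  }
  return linksByUrl;
}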
Infra usage
- usageData.totalCredits (number) – Total infra credits consumed by this scrape.
- usageData.browserCredits (number) – Credits attributable to browser usage.
- usageData.proxyCredits (number) – Credits attributable to proxy usage.
- usageData.requestDurationMs (number) – End-to-end latency for the scrape request in ms.
Use usageData for analytics and cost dashboards. Agents share the same infra counters internally.
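For example, a small helper that turns usageData into a log line for a cost dashboard (field names come from ScrapeUsageData above; the log format is ours):
// Summarize infra cost and latency for one scrape call.
function logScrapeUsage(trajectoryId: string, usage: ScrapeUsageData): void {
  console.log(
    `[scrape ${trajectoryId}] credits=${usage.totalCredits} ` +
      `(browser=${usage.browserCredits} proxy=${usage.proxyCredits}) ` +
      `usd=${usage.totalUsd.toFixed(4)} duration_ms=${usage.requestDurationMs} ` +
      `billable_proxy=${usage.usingBillableProxy}`
  );
}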
# Basic scrape using profile defaults
curl -X POST https://api.rtrvr.ai/scrape \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"urls": ["https://example.com/blog/ai-trends-2025"],
"response": { "inlineOutputMaxBytes": 1048576 }
}'
# With per-request settings override
curl -X POST https://api.rtrvr.ai/scrape \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"urls": [
"https://example.com/blog/ai-trends-2025",
"https://example.com/pricing"
],
"settings": {
"extractionConfig": {
"onlyTextContent": true
},
"proxyConfig": {
"mode": "default"
}
},
"response": {
"inlineOutputMaxBytes": 1048576
}
}'