rtrvr.ai API: Infrastructure for the Agentic Web
"Analyze these 5 companies—products, pricing, customer segments."
I typed that into Slack. 30 seconds later, structured intelligence on all 5 companies landed right back in the channel.
Here's what happened under the hood:
→ Slack message triggered an n8n workflow
→ n8n called the rtrvr.ai API
→ 5 cloud browsers spun up in parallel
→ Each navigated, extracted, and returned structured data
→ Results pushed back to Slack
No Puppeteer scripts. No brittle CSS selectors. No infrastructure to babysit. Just a natural-language task description and an HTTP call.
Today, we're launching the rtrvr.ai API—the technical foundation that makes this possible.
What Just Happened
For the past six months, thousands of users have been running AI web agents through our Chrome extension and cloud platform. They've automated everything from lead research to job applications, from competitor monitoring to compliance tracking.
The results spoke for themselves: 81.39% success rate on the Halluminate Web Bench—the highest of any AI agent. But there was a catch: every execution required manual triggering through our UI.
Teams wanted to embed this power into their own workflows:
- Sales tools that enrich leads automatically
- Hiring platforms that source candidates continuously
- Ops teams that monitor competitors and partners in real time
They didn’t want “another UI.” They wanted infrastructure.
Not just any API—one that could spin up hundreds of browser agents in parallel, handle complex authentication flows, extract structured data reliably, and return clean JSON or direct-to-Sheets output. One that works with the tools they already use: n8n, Zapier, Make, or raw HTTP from any language.
That's what we built.
The Technical Foundation
Parallel Cloud Browser Execution
Traditional web scraping is sequential. Scrape site A, wait for it to finish, then scrape site B. Want to analyze 100 companies? Set aside an afternoon.
The rtrvr.ai API changes that. Send one request with multiple targets. We spin up isolated browser instances for each, executing them in parallel across our infrastructure. What used to take hours happens in seconds.
The architecture:
- Each API call can trigger multiple browser agents simultaneously
- Isolated execution environments prevent interference
- Automatic resource scaling based on load
- Results can be streamed back or returned at once, depending on verbosity
You get the performance of a custom scraping cluster without having to build or maintain one.
DOM-Based Intelligence (Not Just Screenshots)
Most AI web agents rely heavily on screenshots. They're essentially trying to understand websites through pixel analysis—like trying to read a book by looking at photos of each page.
This breaks constantly. Pop-ups cover content. Dynamic layouts shift. Mobile and desktop render differently. The agent sees pixels where it expected buttons.
rtrvr.ai reads the actual HTML structure—the Document Object Model (DOM), the same structured data your browser uses to render pages. This means:
- Fewer vision-based failures: Overlays and pop-ups don't block us—we can often read through them
- Layout-agnostic extraction: Extraction keeps working through redesigns and responsive layouts
- Accurate form filling: We interact with real form elements, not pixel coordinates
- Authenticated content: Works with logged-in sessions and paywalled / internal tools
This DOM-first approach is a core reason we achieved an 81.39% success rate while vision-only agents landed between 40% and 60%.
Structured Data Extraction by Default
The API doesn't just fetch web pages—it returns structured data in the exact format you need.
Three output modes:
- Inline JSON – Structured data returned directly in the API response
- Google Sheets – Write into a persistent spreadsheet for your team
- Snapshot Reference – For large datasets, get a GCS URL to download complete results
Define your schema once. Get consistent output every time. No post-processing cleanup, no manual HTML parsing, no regex nightmares.
{
"input": "For each company, extract: name, website, product description, pricing, and latest funding round",
"urls": [
"https://company1.com",
"https://company2.com",
"https://company3.com"
],
"schema": {
"type": "object",
"properties": {
"companies": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": { "type": "string" },
"website": { "type": "string" },
"product": { "type": "string" },
"pricing": { "type": "string" },
"funding": { "type": "string" }
}
}
}
}
}
}
The agent handles navigation, pagination, authentication, and extraction. You get back clean, validated JSON.
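When you define a schema, you can also verify the response on your side before it enters your systems. Here's a minimal shape check for the extraction above; it's a sketch, not a full JSON Schema validator (a library like Ajv would cover the complete spec), and it assumes the response body has the `result.json.companies` shape shown in this post.

```javascript
// Minimal shape check for the extraction result above (a sketch, not a full
// JSON Schema validator; a library like Ajv would cover the complete spec).
// Takes the `result` object from the API response, i.e. { json: { companies: [...] } }.
function validateCompanies(result) {
  const fields = ['name', 'website', 'product', 'pricing', 'funding']
  const companies = result?.json?.companies
  if (!Array.isArray(companies)) {
    return { ok: false, errors: ['companies is not an array'] }
  }
  const errors = []
  companies.forEach((c, i) => {
    for (const f of fields) {
      if (typeof c[f] !== 'string') {
        errors.push(`companies[${i}].${f} missing or not a string`)
      }
    }
  })
  return { ok: errors.length === 0, errors }
}
```

A quick guard like this catches partial extractions before they reach your CRM or database.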
Natural Language Control
No need to specify CSS selectors, XPath queries, or brittle navigation logic. Describe what you want in plain English:
- "Find the pricing table and extract all plan names and monthly costs."
- "Navigate to the careers page and list all engineering positions in San Francisco."
- "Click through to the second page of search results and extract company names."
The agent understands intent, adapts to site structure, and handles edge cases without explicit programming.
What You Can Build on This
The pattern is simple: API call → cloud browsers → structured data → your application.
These are some of the most common patterns we see in production.
Real-Time Sales Intelligence
Enrich leads as they enter your CRM:
// When a new lead is created
webhook.on('lead.created', async (lead) => {
const res = await fetch('https://api.rtrvr.ai/execute', {
method: 'POST',
headers: {
'Authorization': `Bearer ${process.env.RTRVR_API_KEY}`,
'Content-Type': 'application/json'
},
body: JSON.stringify({
input: `Visit ${lead.website} and extract: company size, tech stack, latest news, key decision makers`,
urls: [lead.website],
response: { verbosity: 'final' }
})
})
const enrichment = await res.json()
crm.updateLead(lead.id, enrichment.result.json)
})
No more manual research. No more paying per-enrichment fees to data vendors. Real-time intelligence from the live web.
Automated Job Application Pipelines
Apply to hundreds of positions while maintaining personalization:
const jobBoards = [
'https://www.linkedin.com/jobs',
'https://wellfound.com',
'https://www.ycombinator.com/jobs'
]
const res = await fetch('https://api.rtrvr.ai/execute', {
method: 'POST',
headers: {
'Authorization': `Bearer ${process.env.RTRVR_API_KEY}`,
'Content-Type': 'application/json'
},
body: JSON.stringify({
input: `Find all software engineering jobs in San Francisco posted in the last 7 days. For each job, extract: company name, role title, required experience, application link`,
urls: jobBoards,
response: { verbosity: 'final' }
})
})
const jobs = await res.json()
jobs.result.json.positions
.filter(job => matchesCriteria(job))
.forEach(job => applyWithCustomCoverLetter(job))
The agent does discovery and triage. Your logic handles filtering and personalization.
Competitor Intelligence at Scale
Monitor pricing, features, and positioning across your competitive landscape:
// Run daily via cron
const competitors = ['competitor1.com', 'competitor2.com', 'competitor3.com']
const res = await fetch('https://api.rtrvr.ai/execute', {
method: 'POST',
headers: {
'Authorization': `Bearer ${process.env.RTRVR_API_KEY}`,
'Content-Type': 'application/json'
},
body: JSON.stringify({
input: `Extract: product features list, pricing tiers with costs, latest blog post titles and dates, job openings count`,
urls: competitors.map(c => `https://${c}`),
response: { verbosity: 'final' }
})
})
const intel = await res.json()
if (pricingChanged(intel)) {
notify.slack('Competitor pricing update detected')
}
Stop spot-checking competitors in a browser tab. Let agents do the watching for you.
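The `pricingChanged` helper above is your own logic. A minimal version might diff today's extraction against the last snapshot; this sketch assumes a `pricingTiers` field in the result (your schema defines the actual shape) and keeps the previous snapshot in memory, where a real cron job would persist it to a store.

```javascript
// Hypothetical implementation of the pricingChanged(intel) helper used above.
// Assumes the extracted result exposes a `pricingTiers` array (your schema
// defines the real shape). The last snapshot lives in memory here; persist it
// to a database or file between cron runs in production.
let previousSnapshot = null

function pricingChanged(intel) {
  const current = JSON.stringify(intel.result?.json?.pricingTiers ?? null)
  // First run establishes a baseline and never flags a change.
  const changed = previousSnapshot !== null && previousSnapshot !== current
  previousSnapshot = current
  return changed
}
```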
Hiring Pipeline Automation
Source candidates, screen profiles, and build talent pipelines automatically:
const res = await fetch('https://api.rtrvr.ai/execute', {
method: 'POST',
headers: {
'Authorization': `Bearer ${process.env.RTRVR_API_KEY}`,
'Content-Type': 'application/json'
},
body: JSON.stringify({
input: `Find senior engineers at Series B AI companies in SF. For each person: current role, previous companies, education, GitHub if available. Rank by relevance to ML infrastructure roles.`,
urls: ['https://www.linkedin.com/search', 'https://github.com'],
response: { verbosity: 'final' }
})
})
const candidates = await res.json()
// Write candidates into your ATS or a Sheet for the recruiting team
Support for enabling extra tools, such as Google Sheets writeback via tools.enableAdditionalTools, is rolling out soon. Once it's available on your account, you'll be able to turn on capabilities like generate_sheets from the same API call.
Integration-First Architecture
The API is designed to plug into the stack you already have.
n8n workflow:
- Trigger: Slack message, new row in Sheets, webhook, schedule
- Action: Call rtrvr.ai API with dynamic parameters
- Output: Post to Slack, update a database, trigger the next step
Zapier automation:
- New CRM lead → rtrvr.ai enrichment → Update CRM with data
Make scenario:
- Schedule → rtrvr.ai competitor check → Compare with history → Send Telegram / Slack alert
Raw API from any language:
- Python, Node.js, Go, Ruby—standard HTTP POST requests
- Full API documentation
No SDK required. One endpoint, one mental model.
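Every example in this post reduces to the same call; only the body changes. A thin wrapper makes that explicit (a sketch: the `fetchImpl` parameter is our own addition so the function can be stubbed in tests, and it defaults to the global fetch available in Node 18+):

```javascript
// One endpoint, one mental model: a thin client wrapper (sketch).
// `fetchImpl` is injectable for testing and defaults to the global fetch.
async function rtrvrExecute(apiKey, body, fetchImpl = fetch) {
  const res = await fetchImpl('https://api.rtrvr.ai/execute', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify(body)
  })
  if (!res.ok) throw new Error(`rtrvr.ai API error: ${res.status}`)
  return res.json()
}
```

Swap in any HTTP client you like; the request shape stays identical across languages.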
Why This Matters: The Agentic Web Needs Infrastructure
We're entering an era where AI agents handle an increasing share of web interactions. But agents need infrastructure—the same way mobile apps needed cloud hosting, web apps needed CDNs, and data science needed distributed computing.
The agentic web requires:
- Scale – Hundreds of concurrent browser sessions, not one-at-a-time scraping
- Reliability – 80%+ success rates, not “40% plus manual cleanup”
- Speed – Seconds to results, not hours of sequential processing
- Integration – APIs that fit existing workflows, not standalone toys
This is infrastructure-level technology. The companies building the next generation of AI-powered applications need a web data layer they can rely on.
Whether you're building:
- Sales automation tools
- Recruiting platforms
- Market intelligence dashboards
- Content aggregation services
- Compliance monitoring systems
- Research automation workflows
- Lead generation engines
rtrvr.ai becomes your web data infrastructure.
From Manual to API-First: The Evolution
Six months ago: Our Chrome extension let individuals automate their own web tasks. Developers loved it, but integration required workarounds.
Three months ago: We launched the cloud platform with scheduled workflows and webhook triggers. Teams could automate recurring tasks, but still needed manual configuration.
Today: The API makes rtrvr.ai programmable infrastructure. Embed browser agents anywhere in your stack. Trigger from any event. Return data in the format your systems expect. Scale to thousands of concurrent requests.
The progression mirrors how computing has evolved:
- Computers → Personal computing
- Personal computing → Cloud computing
- Cloud computing → Serverless computing
Web automation is following the same path:
- Manual scraping → AI agent extension
- AI agent extension → Cloud agent platform
- Cloud agent platform → API-first agent infrastructure
Technical Specifications
Performance Metrics
From our Web Bench evaluation:
- 81.39% overall success rate (highest of any AI agent)
- 0.9 minutes average task completion (7–13x faster than alternatives)
- $0.12 average cost per task (25x more cost-effective than cloud vision agents in the benchmark)
- 88.24% success on read tasks (data extraction and information retrieval)
- 65.63% success on write tasks (form filling and complex interactions)
API Features
Core capabilities:
- Parallel browser execution across multiple URLs
- Natural language task description
- Custom schema definition for structured output
- In-memory tabular data processing (CSV/JSON → virtual sheets)
- Persistent Google Sheets integration
- Webhook callbacks for async completion
- Scheduled execution (cron-style)
- MCP (Model Context Protocol) tool integration
Response options:
- final – Just the result (minimal API payload)
- steps – Step-by-step execution summary (for debugging)
- debug – Full tool calls and state (deep inspection)
Data handling:
- Inline JSON up to 256KB in the response
- Larger datasets snapshot to GCS with a download URL
- Direct Google Sheets writing for persistent data
- CSV/TSV/JSON/XLSX input support
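Client code can handle both delivery modes with one branch. In this sketch, the inline case reads `result.json` as shown throughout this post; the snapshot field name `snapshotUrl` is an assumption for illustration, so check the API reference for the exact response shape.

```javascript
// Resolve a response to its data, whether it came back inline (< 256KB)
// or as a snapshot download. NOTE: `snapshotUrl` is an assumed field name
// for illustration; confirm the exact shape in the API docs.
async function resolveResult(response, fetchImpl = fetch) {
  if (response.result?.json !== undefined) {
    return response.result.json // small result: already inline
  }
  if (response.result?.snapshotUrl) {
    const res = await fetchImpl(response.result.snapshotUrl) // large result: download from GCS
    return res.json()
  }
  throw new Error('No inline result or snapshot URL in response')
}
```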
Continuation support:
- Multi-phase workflows with trajectory IDs
- Pause-and-resume for human input
- Conversation history for context preservation
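A continuation request might look like this as a request body. The trajectoryId field name and ID format here are assumptions based on the trajectory IDs mentioned above; confirm the exact field against the API reference.

```json
{
  "input": "Using the companies found in the previous phase, extract each one's latest funding round",
  "trajectoryId": "traj_abc123",
  "response": { "verbosity": "final" }
}
```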
Getting Started
1. Get Your API Key
Visit rtrvr.ai/cloud and navigate to the API Keys section. Generate a new key with your desired rate limits and permissions.
2. Make Your First Request
curl -X POST https://api.rtrvr.ai/execute \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"input": "Extract the main headline and first paragraph from this article",
"urls": ["https://example.com/article"],
"response": { "verbosity": "final" }
}'
3. Connect to Your Workflow
n8n:
- Add an HTTP Request node
- Method: POST
- URL: https://api.rtrvr.ai/execute
- Authentication: Header with your API key
- Body: JSON with your task description (input, urls, optional schema)
Zapier:
- Use Webhooks by Zapier
- Action: POST request
- Same authentication and body structure
Python:
import os
import requests
api_key = os.environ["RTRVR_API_KEY"]
response = requests.post(
"https://api.rtrvr.ai/execute",
headers={
"Authorization": f"Bearer {api_key}",
"Content-Type": "application/json",
},
json={
"input": "Your task description",
"urls": ["https://example.com"],
"response": {"verbosity": "final"},
},
)
data = response.json()
print(data["result"]["json"])
4. Scale Up
Start with single requests. Once you validate the approach, scale to:
- Parallel execution across many targets
- Scheduled recurring workflows
- Webhook-triggered automation
- Batch processing with Google Sheets as the control plane
Demo: See It In Action
Watch how the API powers real-time competitive intelligence through Slack:
Video: rtrvr.ai API Demo - Slack to Structured Data
30 seconds from question to answer. No infrastructure to manage. No brittle scripts to maintain.
Pricing & Credits
The API uses the same credit system as our cloud platform:
- ~$0.12 average per task in our benchmarks (actual cost varies by complexity and tools used)
- Volume discounts for high-throughput applications
- Transparent billing with per-request credit usage in the response
- No infrastructure costs – pay only for what you use
Every API response can include a usage block like:
{
"usage": {
"creditsUsed": 0.12,
"creditsLeft": 487.88,
"currentCreditsUsed": 12.15
}
}
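If you're running agents on a schedule, the usage block makes budget checks trivial. A minimal sketch, assuming the field names shown above; the threshold and alerting are your own logic:

```javascript
// Turn the usage block from each response into a budget status (sketch).
// Field names match the example above; the threshold is arbitrary.
function checkBudget(usage, minCredits = 50) {
  return {
    spentThisRun: usage.creditsUsed,
    remaining: usage.creditsLeft,
    lowBalance: usage.creditsLeft < minCredits
  }
}

const status = checkBudget({ creditsUsed: 0.12, creditsLeft: 487.88, currentCreditsUsed: 12.15 })
// Wire status.lowBalance into your alerting when it flips to true.
```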
The Road Ahead
This is just the beginning. On our roadmap:
Advanced tool integration:
- Document generation (Docs, PDFs, Presentations)
- Database querying and writing
- Multi-step approval workflows
- Custom tool definition via MCP
Enhanced control:
- Browser session recording for debugging
- Screenshot capture at key steps
- Custom browser profiles (cookies, local storage, auth)
- Geolocation and proxy control
Developer experience:
- More copy-pasteable code examples for popular stacks
- Deeper observability around each run (logs, traces, metrics)
- A richer playground for designing and testing tasks
- Tight integrations with low-code tools and agent frameworks
Enterprise features:
- Team API keys with role-based permissions
- Audit logs for all executions
- SLA guarantees for production workloads
- Dedicated infrastructure for high-volume clients
We're building the infrastructure for the agentic web. Join us.
Get Started Today
The agentic web needs infrastructure. We're building the rails.
Already using rtrvr.ai?
Whether you're automating sales research, monitoring competitors, sourcing candidates, or building the next generation of AI-powered applications—rtrvr.ai is your web data layer.
The infrastructure is here. The API is live. Build the future of web automation.
Questions? Reach out to support@rtrvr.ai or join our Discord community.
