Your AI Agent Isn’t Dumb. It’s Drowning in Context Rot
Most people debug failing agents by swapping models.
GPT-4 → Sonnet → GPT-4.1 → “maybe we need a custom fine-tune?”
In reality, the model is often fine.
The context you’re feeding it is not.
In this post, I’ll walk through a simple benchmark I ran with three scrapers—rtrvr.ai, Firecrawl, and Parallel—on three real-world sites:
- A Reddit thread from r/AI_Agents
- ChatGPT.com
- An Amazon search for “smart ring”
The raw results are interesting.
The reason behind them is where things get valuable.
The Experiment: 3 Scrapers, 3 Sites
I tested each tool’s scrape/extract endpoint on:
- Reddit thread – noisy DOM, dynamic content, comments, sidebars
- ChatGPT.com – app-style UI, inputs, buttons, placeholders
- Amazon smart ring search – grid of products, prices, ratings, pagination
What I measured
- Did it return any data?
- Did it return the right data?
- Was the output structured and usable for an LLM-powered agent?
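For reference, a minimal harness for this kind of comparison looks roughly like the sketch below. The scrape_fn callables stand in for each vendor’s SDK or HTTP endpoint (I’m not reproducing their real client APIs here), and the “right data” and “usable structure” judgments were made by hand.

```python
# Minimal comparison harness. scrape_fn stands in for each vendor's
# SDK or HTTP call; swap in the real clients from their docs.
URLS = {
    "reddit_thread": "https://www.reddit.com/r/AI_Agents/",  # thread URL omitted here
    "chatgpt": "https://chatgpt.com",
    "amazon_search": "https://www.amazon.com/s?k=smart+ring",
}

def run_benchmark(scraper_name: str, scrape_fn) -> dict:
    results = {}
    for label, url in URLS.items():
        try:
            output = scrape_fn(url)
        except Exception as exc:
            results[label] = {"returned_data": False, "error": str(exc)}
            continue
        results[label] = {
            "returned_data": bool(output),
            "output_chars": len(str(output)),
            # "right data" and "structured/usable" were checked manually
        }
    return {scraper_name: results}
```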
The quick scoreboard
| Site | Firecrawl | Parallel | rtrvr.ai |
|---|---|---|---|
| Reddit thread | ❌ Failed | ❌ Failed | ✅ Full extraction |
| ChatGPT.com | ⚠️ Partial (some page data) | ❌ Failed | ✅ Full page, including input metadata |
| Amazon “smart ring” | ⚠️ 29/48 products extracted | ❌ No data | ✅ Product data, structured output |
Firecrawl and rtrvr.ai were similar on speed.
Parallel struggled to return anything useful on all three.
But raw success/fail isn’t the punchline.
The Real Problem: Context Rot
Most scrapers do something like this:
- Fetch the page
- Strip some tags
- Dump a giant blob of HTML or Markdown into your prompt
Then you ask your agent:
“From this page, find all smart rings, their prices, and links.”
What the LLM actually receives:
- Script tags
- Style blocks
- Tracking pixels
- Cookie banners
- Nested div soup
- Repeated headers/footers
- A/B test elements
- Ads and recommendation carousels
Your real data is maybe 5–10% of the tokens.
The rest is noise.
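For reference, the naive pattern looks roughly like this in Python, using requests and BeautifulSoup; the details vary per stack, but the failure mode is the same.

```python
# The naive "fetch, strip tags, dump" pattern that causes context rot.
# Sketch only; real pipelines vary, but the failure mode is the same.
import requests
from bs4 import BeautifulSoup

def scrape_and_dump(url: str) -> str:
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    # get_text() drops the angle brackets but keeps inline script contents,
    # cookie-banner copy, nav links, footer legalese, and A/B test cruft.
    return soup.get_text(separator="\n", strip=True)

if __name__ == "__main__":
    page_text = scrape_and_dump("https://www.amazon.com/s?k=smart+ring")
    prompt = (
        "From this page, find all smart rings, their prices, and links.\n\n"
        + page_text  # thousands of tokens, mostly noise
    )
```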
That’s context rot:
Over time, your agents get less signal and more noise in their prompts, until even “easy” tasks start to fail.
Symptoms you’re probably blaming on the model:
- It “forgets” items from a list
- It mislabels fields (e.g., price vs “strike-through” price)
- It randomly skips products
- It hallucinates structure that isn’t on the page
The model isn’t dumb.
It’s trying to infer structure from trash.
Why rtrvr.ai’s Output Looked Different
When I ran the same tests through rtrvr.ai, the outputs had a completely different feel.
Instead of a Markdown/HTML dump, I got a semantic, structured tree:
- Elements tagged with roles and types
- Hierarchy preserved
- Inputs, buttons, labels, product cards explicitly identified
- SVGs and interactive elements represented in a machine-friendly way
Think of it like an accessibility tree plus an extraction layer, not a prettified HTML export.
For an LLM, that changes the problem from:
“Here’s 15,000 tokens of semi-random markup, good luck.”
to:
“Here’s a tree of meaningful objects:
- This is a product title
- This is a price
- This is a rating
- This is an ‘Add to cart’ button
- This is the search input field
- Here’s how they’re nested.”
No guessing.
No manual parsing glue.
No hallucinating structure that was never there.
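To make that concrete, here’s the kind of shape I mean, written as a Python dict. The field names and values are illustrative, not rtrvr.ai’s actual response schema; check the docs for the real format.

```python
# Illustrative only: field names and values are made up to show the *shape*
# of a semantic tree, not rtrvr.ai's actual response schema.
product_card = {
    "role": "product_card",
    "children": [
        {"role": "title", "text": "Smart Ring Health Tracker"},
        {"role": "price", "text": "$199.00"},
        {"role": "rating", "text": "4.3 out of 5"},
        {"role": "link", "href": "/dp/..."},          # placeholder path
        {"role": "button", "label": "Add to cart"},
    ],
}
```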
On Reddit, that meant rtrvr.ai could navigate and extract thread content where the others simply failed.
On ChatGPT.com, rtrvr.ai surfaced things like:
- The main message list
- The system text
- The input textbox (and its placeholder text)
which is exactly what an agent needs to reason about “what can I do here?”
On Amazon, rtrvr.ai produced structured products from the full page, while Firecrawl only managed 29/48 and Parallel returned nothing.
Why This Matters for LLM Choice and Cost
Most teams reach for a bigger model when the agent fails.
“Let’s just throw GPT-4 at it.”
But if you’re feeding GPT-4 raw HTML soup, you’re paying for:
- HTML parsing
- Noise filtering
- Structure inference
…before it even starts doing the actual business task.
You’re using a frontier reasoning model as a glorified regex engine.
Clean, structured input changes the economics:
- Cheaper models win – Even a small model like Gemini Flash Lite can reliably extract products, prices, and metadata if the input is already well-structured.
- Token usage drops – No need to stuff 10k+ tokens of junk into the context window.
- Latency shrinks – Less text to read, fewer passes needed.
- Reliability improves – The agent isn’t guessing what’s a button vs a label vs a random div.
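As a rough back-of-the-envelope comparison (every number below is an assumption, so plug in your own token counts and your provider’s current pricing), the gap compounds fast:

```python
# Back-of-the-envelope cost comparison. All numbers are assumptions:
# swap in your real token counts and your provider's current pricing.
RAW_HTML_TOKENS = 15_000        # typical "dump the page" payload
STRUCTURED_TREE_TOKENS = 1_500  # same content as a semantic tree
PRICE_PER_MTOK_FRONTIER = 2.50  # $/1M input tokens, assumed
PRICE_PER_MTOK_SMALL = 0.10     # $/1M input tokens, assumed
RUNS_PER_DAY = 10_000

raw_cost = RAW_HTML_TOKENS * PRICE_PER_MTOK_FRONTIER / 1_000_000 * RUNS_PER_DAY
clean_cost = STRUCTURED_TREE_TOKENS * PRICE_PER_MTOK_SMALL / 1_000_000 * RUNS_PER_DAY

print(f"raw HTML + frontier model: ${raw_cost:,.2f}/day")   # ~$375/day
print(f"clean tree + small model:  ${clean_cost:,.2f}/day")  # ~$1.50/day
```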
You don’t need GPT-4 to “read a product listing”.
You need GPT-4 when you force it to also be:
- HTML parser
- DOM cleaner
- Layout interpreter
All at once.
That’s the tax most agent stacks are quietly paying.
The Business Angle: Context as a KPI
If you care about KPIs like:
- Task success rate
- Average cost per run
- Average time to result
- Human follow-up / manual fix rate
…then your context quality is a leading indicator.
Bad input guarantees:
- Higher model bills
- Slower agents
- More “mysterious failures” that look like hallucinations
- More support tickets and manual intervention
Good input means:
- You can standardize on smaller, cheaper models
- You can run more agents in parallel within the same budget
- You can push more critical workflows into automation without fear
Scraping isn’t a “data engineering” footnote anymore.
It’s a core part of your AI unit economics.
What to Demand From a Scraper in the Agent Era
Whether you use rtrvr.ai or not, this is the checklist I’d use going forward:
- Structured, semantic output
  - Does the tool understand “this is a button / price / title / input”?
  - Or does it just spit out cleaned HTML/Markdown?
- Hierarchy preserved
  - Can the LLM see how elements relate (card → title + price + link)?
  - Or is everything flattened into text?
- Noise aggressively filtered
  - Are scripts, trackers, and layout cruft stripped before the model sees them?
  - Or are you paying tokens for tracking pixels and cookie banners?
- LLM-friendly format
  - Is the output directly usable as context for an agent?
  - Or do you still have to write parsing glue on top?
- Consistency across sites
  - Does it survive real-world sites like Reddit, Amazon, and app-style UIs?
  - Or does it look great on landing pages and fall apart on anything dynamic?
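If you want to turn that checklist into something you can actually run against a scraper’s output, a crude sanity check might look like the sketch below. The markers and thresholds are mine, not an established scoring method; treat them as a starting point.

```python
# Crude heuristics for the checklist above. Markers and thresholds are
# arbitrary starting points, not an official scoring method.
import re

def audit_scraper_output(output: str) -> dict:
    noise_markers = ["<script", "<style", "cookie", "gdpr", "tracking"]
    noise_hits = sum(output.lower().count(m) for m in noise_markers)
    has_semantic_roles = bool(re.search(r'"role"\s*:|role=', output))
    looks_flattened = "{" not in output and "\n\n" not in output
    return {
        "noise_marker_hits": noise_hits,           # should be close to 0
        "has_semantic_roles": has_semantic_roles,  # buttons/prices/titles tagged?
        "looks_flattened": looks_flattened,        # hierarchy lost?
        "approx_tokens": len(output) // 4,         # rough 4-chars-per-token estimate
    }
```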
If your current scraper fails most of these, you don’t have a model problem.
You have a context rot problem.
Why rtrvr.ai Won in This Micro-Benchmark
In my test:
- Firecrawl:
  - Failed Reddit
  - Partial Amazon (29/48 products)
  - Some data on ChatGPT.com
  - Output: essentially a higher-quality dump
- Parallel:
  - Failed all three; no usable data from Amazon
  - Output: often nothing to even try to reason over
- rtrvr.ai:
  - Full extraction on all three
  - Structured accessibility/DOM tree
  - Extra metadata (including inputs, SVGs, hierarchy)
  - Output: optimized to be directly consumed by an LLM or an agent
Speed-wise, rtrvr.ai and Firecrawl were close.
The difference was in what I got back.
One gave me markdown-ish noise.
The other gave me a clean semantic tree I could plug into any agent stack.
Where This Fits in Your Stack
If you’re building:
- Research or enrichment agents
- E-commerce intelligence tools
- Lead gen / sales ops automation
- Monitoring agents for competitors or pricing
- Internal tools that sit on top of SaaS apps
…then your scraping layer is either:
- Quietly compounding context rot, or
- Quietly compounding leverage.
rtrvr.ai is designed to be the second.
You can:
- Use the scrape/extract APIs as a standalone service
- Pipe structured trees into your own LLM / agent framework
- Or just let rtrvr run the full browse → understand → extract loop for you
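If you go the standalone-API route, the call shape is roughly what you’d expect from any scrape endpoint. A sketch is below; the endpoint URL, auth header, and request fields are assumptions for illustration, so check /docs/scrape-api for the actual contract.

```python
# Hypothetical call shape for a scrape/extract API. The endpoint URL,
# auth header, env var name, and body fields are assumptions; see
# /docs/scrape-api for the real request/response contract.
import os
import requests

resp = requests.post(
    "https://api.rtrvr.ai/scrape",  # assumed endpoint
    headers={"Authorization": f"Bearer {os.environ['RTRVR_API_KEY']}"},  # assumed auth
    json={"url": "https://www.amazon.com/s?k=smart+ring"},  # assumed field name
    timeout=60,
)
resp.raise_for_status()
tree = resp.json()  # structured semantic tree, ready to drop into an agent prompt
```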
Watch the Full Side-By-Side
I recorded the entire comparison so you can see the behavior on:
- The Reddit r/AI_Agents thread
- ChatGPT.com
- The Amazon “smart ring” search
Video: rtrvr.ai Scrape API Demo - Outperforming Firecrawl and Parallel
🎥 In the video embedded above, watch how each tool behaves in real time.
Try the Scrape API Yourself
If you want to see how much context rot you can eliminate in your own stack, you don’t have to write a line of glue code first.
- Explore the Scrape API in our docs: /docs/scrape-api
- Play with real pages in the Scrape API Playground: /docs/scrape-api#playground
Drop in the same kinds of URLs I used—Reddit threads, app-style UIs like chatgpt.com, Amazon product searches—and compare:
- How clean the structure is
- How many tokens you actually need
- How much simpler your agent prompts become
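One quick way to quantify the second point is to count tokens on both payloads. Here’s a minimal sketch using tiktoken (cl100k_base is an approximation; use your model’s own tokenizer if you have one):

```python
# Compare token footprints of a raw dump vs a structured tree.
# cl100k_base is an approximation; use your model's own tokenizer if available.
import json
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def count_tokens(payload) -> int:
    text = payload if isinstance(payload, str) else json.dumps(payload)
    return len(enc.encode(text))

# raw_markdown = ...   # output from a dump-style scraper
# semantic_tree = ...  # structured output from a tree-style scraper
# print(count_tokens(raw_markdown), count_tokens(semantic_tree))
```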
Once you see the structured tree, it’s very hard to go back to HTML dumps.
The Takeaway
Web scraping for AI agents isn’t about “getting data off the page” anymore.
It’s about:
Getting the right data in the right structure
into the smallest possible context
so the cheapest possible model
can deliver the highest possible quality.
Your model’s output quality is capped by your input quality.
If you’re still dumping raw HTML or Markdown into prompts, you’re paying large-model prices to babysit a broken scraping layer.
If you’re not using rtrvr for your scraping needs, you’re missing out.
