
Generic Scrapers

Active

Your agent's plain fetches get blocked or summarized away; this returns the real page as structured data.


Turn any URL into clean, LLM-ready Markdown or extracted fields, map and crawl whole sites, unlock pages that block normal fetches through an anti-bot proxy, and get parsed search-engine results. For ready-made scrapers targeting specific platforms, see Site Scrapers.

Web & Search · scraping · data-extraction · markdown · content-extraction · rag · proxy · serp · anti-bot · crawler · web

Tools (4)

Scrape

scrapeAndExtractFromUrl

Fetch a single URL and return its content as clean Markdown (or as HTML, raw text, a screenshot, or a list of links). JavaScript rendering, anti-bot handling, and proxy rotation happen automatically; the agent just supplies a URL. Ideal for reading articles, docs pages, PDFs, or any page an agent needs to reason about. Supports per-call options such as `formats`, `onlyMainContent`, `waitFor` (for dynamic pages), `includeTags` / `excludeTags`, and `actions` (click, scroll, type) that run before extraction.

Usage-based · 6.25 credits per page

Example prompts

“Scrape https://example.com/article and return the markdown”
“Get the cleaned main content of this docs page as markdown”
“Fetch this URL and give me both markdown and a full-page screenshot”
“Read the PDF at this URL and convert it to markdown”

Parameters

body · required

API Usage

curl -X POST "https://skill.askfaro.com/skills/generic-scrapers/run" \
  -H "Authorization: Bearer faro_<your_key>" \
  -H "Content-Type: application/json" \
  -d '{
  "intent": {
    "prompt": "Scrape https://example.com/article and return the markdown"
  }
}'
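
The same request can also be issued from Python. The following is a minimal sketch using only the standard library: the endpoint, headers, and body are taken from the curl example above, while the function names and the assumption that the response body is JSON are illustrative and not part of the documented API.

```python
import json
import urllib.request

# Endpoint copied from the curl example above.
ENDPOINT = "https://skill.askfaro.com/skills/generic-scrapers/run"


def build_run_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"intent": {"prompt": prompt}}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def run(prompt: str, api_key: str, timeout: float = 60.0) -> dict:
    """Send the request. Assumption: the response body is JSON."""
    request = build_run_request(prompt, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `run("Scrape https://example.com/article and return the markdown", "faro_<your_key>")` would send the same payload as the curl command.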

CLI Usage

askfaro describe generic-scrapers/scrapeAndExtractFromUrl

Install with `pip install askfaro-cli`, then run `askfaro auth login`.

Map

mapUrls

Given a starting URL, return up to N URLs from the same site without scraping their content. This is a fast, cheap way to discover the shape of a site before deciding what to scrape. Use it as a precursor to `scrapeAndExtractFromUrl` in crawling-style workflows, or to find all docs, blog, or product URLs on a domain. Supports `search` (filter URLs by keyword), `limit`, and `includeSubdomains`.

6.25 credits per call ($0.00625)

Example prompts

“List up to 50 URLs under https://docs.example.com”
“Map this site and filter for URLs containing "pricing"”
“Find all blog post URLs on example.com”

Parameters

url · string · required

The base URL to start crawling from

limit · integer · optional · default: 5000

Maximum number of links to return

search · string · optional

Search query to use for mapping. During the Alpha phase, the 'smart' part of the search is limited to 1000 results. If map finds more results than that, no limit is applied to them.

timeout · integer · optional

Timeout in milliseconds. There is no timeout by default.

sitemapOnly · boolean · optional · default: false

Only return links found in the website sitemap

ignoreSitemap · boolean · optional · default: true

Ignore the website sitemap when crawling.

includeSubdomains · boolean · optional · default: true

Include subdomains of the website
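
The parameters and defaults listed above can be captured in a small typed container. This is a sketch only: the class name `MapUrlsParams`, the `to_payload` helper, and the positive-`limit` check are hypothetical conveniences, and how the resulting dictionary is passed to the tool is not specified on this page.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class MapUrlsParams:
    """mapUrls parameters; defaults mirror the parameter list above."""

    url: str
    limit: int = 5000
    search: Optional[str] = None
    timeout: Optional[int] = None  # milliseconds; None means no timeout
    sitemapOnly: bool = False
    ignoreSitemap: bool = True
    includeSubdomains: bool = True

    def to_payload(self) -> dict:
        """Return the parameters as a dict, dropping unset optional fields."""
        # Assumption: a non-positive limit is never meaningful.
        if self.limit < 1:
            raise ValueError("limit must be a positive integer")
        return {key: value for key, value in asdict(self).items() if value is not None}


payload = MapUrlsParams(
    url="https://docs.example.com", limit=50, search="pricing"
).to_payload()
```

Here `payload` carries the three values set explicitly plus the three boolean defaults, and omits `timeout` because it was left unset.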

API Usage

curl -X POST "https://skill.askfaro.com/skills/generic-scrapers/run" \
  -H "Authorization: Bearer faro_<your_key>" \
  -H "Content-Type: application/json" \
  -d '{
  "intent": {
    "prompt": "List up to 50 URLs under https://docs.example.com"
  }
}'

CLI Usage

askfaro describe generic-scrapers/mapUrls

Install with `pip install askfaro-cli`, then run `askfaro auth login`.

SERP Search

serp_search

Run a search engine query through an anti-bot proxy and return parsed JSON results. Pass the full search URL, including `brd_json=1` (e.g. https://www.google.com/search?q=...&brd_json=1). Supports Google, Bing, Yandex, DuckDuckGo, and Baidu; set the host accordingly. The optional `country` ISO code controls the proxy exit location.

1.875 credits per call ($0.001875)

Example prompts

“Search Google and return the results as JSON”
“Run a search engine query through an anti-bot proxy and get parsed results”
“Get search results for a query without being blocked”

Parameters

url · string · required

Full search engine URL with query params. Append `brd_json=1` for parsed JSON results. Example: `https://www.google.com/search?q=site%3Areddit.com+best+espresso+machine&brd_json=1&hl=en&gl=us`. Supported engines include Google, Bing, Yandex, DuckDuckGo, and Baidu — set the host and any engine-specific params accordingly.

country · string · optional

ISO 3166-1 alpha-2 country code for the proxy exit location (e.g. `us`, `gb`, `de`). Affects geo-targeted results.
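
Because `url` must be a complete, correctly encoded search URL, it helps to build it programmatically rather than by hand. The sketch below reproduces the Google example from the `url` description; the helper name `build_serp_url` is hypothetical, and the default path and query parameter name (`/search` and `q`) match that Google example only, so other engines may need different values.

```python
from typing import Dict, Optional
from urllib.parse import urlencode, urlunsplit


def build_serp_url(
    query: str,
    host: str = "www.google.com",
    path: str = "/search",
    query_param: str = "q",
    extra: Optional[Dict[str, str]] = None,
) -> str:
    """Build a search URL that requests parsed JSON via brd_json=1."""
    params = {query_param: query, "brd_json": "1"}
    # Engine-specific parameters (e.g. hl, gl for Google) follow the required ones.
    params.update(extra or {})
    return urlunsplit(("https", host, path, urlencode(params), ""))


url = build_serp_url(
    "site:reddit.com best espresso machine",
    extra={"hl": "en", "gl": "us"},
)
# url == "https://www.google.com/search?q=site%3Areddit.com+best+espresso+machine&brd_json=1&hl=en&gl=us"
```

`urlencode` percent-encodes the colon and turns spaces into `+`, which is the encoding used in the documented example.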

API Usage

curl -X POST "https://skill.askfaro.com/skills/generic-scrapers/run" \
  -H "Authorization: Bearer faro_<your_key>" \
  -H "Content-Type: application/json" \
  -d '{
  "intent": {
    "prompt": "Search Google and return the results as JSON"
  }
}'

CLI Usage

askfaro describe generic-scrapers/serp_search

Install with `pip install askfaro-cli`, then run `askfaro auth login`.

Web Unlocker

unlock_url

Fetch a URL through Web Unlocker, which bypasses anti-bot measures, solves CAPTCHAs, and renders JavaScript. Use it when a regular HTTP fetch returns 403 or blocked content. Returns the page body (HTML/JSON/etc.) as raw text. The optional `country` ISO code controls the proxy exit location.

1.875 credits per call ($0.001875)

Example prompts

“Fetch a page that keeps returning a 403 or blocked response”
“Load a JavaScript-heavy URL and return the rendered HTML”
“Bypass anti-bot protection and get the raw contents of a URL”

Parameters

url · string · required

Target URL to fetch through Bright Data's Web Unlocker proxy, which bypasses anti-bot protections, solves CAPTCHAs, and renders JavaScript automatically. The page is fetched via HTTP GET and its body is returned as raw text (HTML/JSON/etc.).

country · string · optional

ISO 3166-1 alpha-2 country code for the proxy exit location (e.g. `us`, `gb`, `de`). Affects which IPs the unlocker rotates through.
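
One way to apply the "use it when a regular fetch returns 403 or blocked content" guidance is to try a plain fetch first and fall back to `unlock_url` only when needed. The sketch below is illustrative: both fetchers are passed in as plain callables, the function names are hypothetical, and the optional `markers` used to detect block pages are an assumption, since this page does not define what counts as blocked content beyond a 403.

```python
from typing import Callable, Tuple

# A plain fetcher returns (status_code, body); an unlocker returns the body only.
PlainFetch = Callable[[str], Tuple[int, str]]
Unlock = Callable[[str], str]


def looks_blocked(status: int, body: str, markers: Tuple[str, ...] = ()) -> bool:
    """Return True for a 403, or when the body contains a caller-supplied marker."""
    if status == 403:
        return True
    lowered = body.lower()
    return any(marker.lower() in lowered for marker in markers)


def fetch_with_fallback(
    url: str,
    plain_fetch: PlainFetch,
    unlock: Unlock,
    markers: Tuple[str, ...] = (),
) -> Tuple[str, str]:
    """Return (source, body), where source is "plain" or "unlock_url"."""
    status, body = plain_fetch(url)
    if looks_blocked(status, body, markers):
        # Only pay for the unlocker when the cheap fetch was blocked.
        return "unlock_url", unlock(url)
    return "plain", body
```

Keeping the two fetchers injectable means the fallback logic can be tested without network access, and the unlocker is charged only for pages that actually need it.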

API Usage

curl -X POST "https://skill.askfaro.com/skills/generic-scrapers/run" \
  -H "Authorization: Bearer faro_<your_key>" \
  -H "Content-Type: application/json" \
  -d '{
  "intent": {
    "prompt": "Fetch a page that keeps returning a 403 or blocked response"
  }
}'

CLI Usage

askfaro describe generic-scrapers/unlock_url

Install with `pip install askfaro-cli`, then run `askfaro auth login`.

README

Web Scraping on Faro

Three ways to get data off the web, all behind one token and one balance. Pick by what the page is:

General pages: Firecrawl

scrapeAndExtractFromUrl turns any URL into clean Markdown, HTML, text, or structured fields (JS rendered, anti-bot handled). mapUrls discovers a site's URLs; crawlUrls, extractData, and searchAndScrape cover bulk and structured jobs.

Blocked or hard pages: Bright Data

unlock_url fetches pages that return 403 or bot challenges (CAPTCHA solving, JS rendering, residential IP rotation). serp_search returns parsed search-engine results as JSON.

Specific platforms: ready-to-run scrapers

50+ purpose-built scrapers for Instagram, TikTok, Facebook, LinkedIn, YouTube, Reddit, Google Maps/Search, Amazon, eBay, Booking, Airbnb, Zillow, Indeed, Glassdoor, Trustpilot, and more. Each returns clean structured items.

Pricing is per-tool and shown before you call. Failed requests are not charged. Each call is capped at $5 to prevent a surprise charge: a request predicted to cost more is declined with its estimate, so ask for fewer results per call and paginate with the continuation token to get the rest.
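
The cap can be turned into a rough batch size. The sketch below assumes that one credit costs $0.001, which is implied by the listed prices (6.25 credits = $0.00625), and that the predicted cost of a call is simply the per-unit price times the number of units; the actual estimator may differ, and the function names are hypothetical.

```python
from decimal import Decimal

USD_PER_CREDIT = Decimal("0.001")  # implied by 6.25 credits = $0.00625
PER_CALL_CAP_USD = Decimal("5")    # per-call cap stated above


def cost_usd(credits_per_unit: str, units: int) -> Decimal:
    """Estimated dollar cost of `units` pages or calls at the given credit price."""
    return Decimal(credits_per_unit) * USD_PER_CREDIT * units


def max_units_per_call(credits_per_unit: str) -> int:
    """Largest unit count whose estimated cost stays within the per-call cap."""
    unit_cost = Decimal(credits_per_unit) * USD_PER_CREDIT
    return int(PER_CALL_CAP_USD // unit_cost)


pages = max_units_per_call("6.25")  # 800 pages at 6.25 credits per page
total = cost_usd("6.25", pages)     # exactly 5 dollars
```

Under these assumptions, a scrape job larger than about 800 pages would be declined, so it should be split into smaller requests and continued with the continuation token.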