# Generic Scrapers

> Skill `web-scraping` on Faro. 6 sub-skills.

Reads and extracts content from web pages you already have a URL for. Give it a URL and get back clean text, a JSON object of specific fields, a list of all pages on the site, or rendered output from pages that need JavaScript or interaction. Use this when you know the page you want; use Web Search when you need to find pages first.

**Category:** Web & Search  
**Tags:** scraping, data-extraction, markdown, content-extraction, rag, proxy, serp, anti-bot, crawler, web  
**Use when:** You have a URL (or a site, or a search query) and need the real page content, specific fields from it, or what a search engine actually ranks, rather than a model's guess.  
**Not for:** Platform-specific scrapers like LinkedIn, Amazon, or maps reviews (use Site Scrapers), open-web research to find URLs first (use Web Search), pages behind a personal login, or browser-automation scripting.  
**Returns:** information — Returns page content as Markdown, an extracted JSON object, a list of URLs, or parsed search results, depending on the job.

## How to run
Skills run through one gateway with your Faro token. Hand it an `intent` in plain language; Faro routes to the right sub-skill, runs it, and bills per call. Raw tools are internal plumbing and are not directly callable.

```
POST https://skill.askfaro.com/skills/web-scraping/run
Authorization: Bearer faro_<your_key>
Content-Type: application/json

{"intent":{"prompt":"Get all the text from this article"}}
```
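
The same call can be made from a script. The sketch below uses only Python's standard library and relies on nothing beyond what the HTTP example shows (the endpoint, the bearer token, and the `intent.prompt` body). The helper names `build_run_request` and `run_skill` are hypothetical, and the assumption that the gateway replies with JSON is not confirmed here; check the run reference for the actual response shape.

```python
import json
import urllib.request

# Endpoint pattern taken from the HTTP example; {skill} is filled in per call.
GATEWAY = "https://skill.askfaro.com/skills/{skill}/run"


def build_run_request(skill: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to the skill gateway."""
    body = json.dumps({"intent": {"prompt": prompt}}).encode("utf-8")
    return urllib.request.Request(
        GATEWAY.format(skill=skill),
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def run_skill(skill: str, prompt: str, api_key: str, timeout: float = 60.0):
    """Send the request and decode the reply.

    Assumption: the response body is JSON. HTTP errors surface as
    urllib.error.HTTPError and are left for the caller to handle.
    """
    req = build_run_request(skill, prompt, api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `run_skill("web-scraping", "Get all the text from this article", "faro_<your_key>")` would issue the request shown above. Keeping request construction separate from sending makes the payload easy to inspect before any billed call goes out.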

Or from the CLI:

```bash
pip install askfaro-cli && askfaro auth login
askfaro run web-scraping "Get all the text from this article"
```

Full run reference: https://askfaro.com/llms/run.md — Agent recipe: https://askfaro.com/llms/skill.md

## Example requests

- Get all the text from this article
- Scrape the product name, price, and rating from this page
- Click the "Load more" button and get the additional results
- List all the URLs on this documentation site

## Sub-skills

### Read a page

Fetches one known URL and returns its main content as clean Markdown.

**Cost:** ~6.25 credits / page (up to 250) — A standard page is the usual case; a hard anti-bot page or a long multi-page PDF costs more.

**Use when:** You have a URL and want its readable body (an article, a docs page, or a PDF) without ads or navigation.

**Details:** https://askfaro.com/llms/skills/web-scraping/read_url.md

---

### Extract fields

Pulls specific structured fields off a known page.

**Cost:** ~6.25 credits / page (up to 250) — Priced per page read; a hard anti-bot page or a long page costs more.

**Use when:** You want named data points from a page (price, title, author, specs) as a clean object, not the whole article.

**Details:** https://askfaro.com/llms/skills/web-scraping/extract_fields.md

---

### Read interactive page

Reads a page whose content appears only after a click, scroll, typed input, or late-running JavaScript.

**Cost:** ~6.25 credits / page (up to 250) — Priced per page; an interactive scrape on a hard anti-bot site costs more.

**Use when:** The content only appears after a button click, scroll, form input, or a delay (a "load more", an accept gate, a late-rendering app).

**Details:** https://askfaro.com/llms/skills/web-scraping/interactive_scrape.md

---

### Map a site

Lists the URLs that exist on a site, without fetching their content.

**Cost:** 6.25 credits / call

**Use when:** You want to discover what pages a domain has (all docs, blog, or pricing URLs) before deciding what to scrape.

**Details:** https://askfaro.com/llms/skills/web-scraping/map_site.md

---

### Search results

Runs a search-engine query and returns the ranked results as structured data.

**Cost:** 1.875 credits / search

**Use when:** You want what Google, Bing, or another engine ranks for a query, as parsed results with titles, links, and snippets.

**Details:** https://askfaro.com/llms/skills/web-scraping/search_engine.md

---

### Unlock a blocked page

Fetches a single URL that a normal request gets blocked on, returning the raw page body.

**Cost:** 1.875 credits / request

**Use when:** A plain fetch returns a 403, a CAPTCHA, or empty/blocked content and you just need the raw body.

**Details:** https://askfaro.com/llms/skills/web-scraping/unlock.md
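
One way to decide when to reach for this sub-skill is to check the result of a plain fetch first. The following is a minimal sketch of such a check, based on the three signals named above (a 403, a CAPTCHA, or an empty body). The function name `looks_blocked` and the marker strings are illustrative assumptions, not Faro's own detection logic.

```python
# Illustrative substrings that often appear on challenge pages; not exhaustive.
CAPTCHA_MARKERS = ("captcha", "verify you are human")


def looks_blocked(status: int, body: str) -> bool:
    """Heuristic: return True when a plain fetch looks blocked rather than served."""
    if status == 403:
        return True  # explicit refusal
    text = body.strip().lower()
    if not text:
        return True  # empty or whitespace-only body
    # Challenge page served with a success status
    return any(marker in text for marker in CAPTCHA_MARKERS)
```

A substring check like this can misfire on a legitimate page that merely mentions CAPTCHAs, so treat a positive result as a hint to retry through the unlock sub-skill, not as proof of blocking.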

---
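
Because each sub-skill is billed per unit, the cost of a multi-step job can be estimated before running it. The sketch below copies the per-unit figures from the sub-skill listings above. It rests on two assumptions: that "(up to 250)" is a per-page credit ceiling, and that flat-priced sub-skills never exceed their listed cost. The dictionary keys reuse the slugs from the Details URLs purely as labels, and `estimate_credits` is a hypothetical helper.

```python
# (typical, maximum) credits per unit, copied from the listings above.
# Assumption: "(up to 250)" is read as a per-page ceiling in credits.
COSTS = {
    "read_url": (6.25, 250.0),
    "extract_fields": (6.25, 250.0),
    "interactive_scrape": (6.25, 250.0),
    "map_site": (6.25, 6.25),
    "search_engine": (1.875, 1.875),
    "unlock": (1.875, 1.875),
}


def estimate_credits(plan: dict[str, int]) -> tuple[float, float]:
    """Return (typical, worst-case) credits for a plan of {sub_skill: unit_count}."""
    typical = sum(COSTS[name][0] * count for name, count in plan.items())
    worst = sum(COSTS[name][1] * count for name, count in plan.items())
    return typical, worst


# Map one site, then read 40 of its pages.
print(estimate_credits({"map_site": 1, "read_url": 40}))  # (256.25, 10006.25)
```

The wide gap between the two figures comes entirely from the per-page ceiling: if every page hit it, the same job would cost about 39 times the typical estimate.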

---
On the web: https://askfaro.com/search/web-scraping