# Scrape Do AI integration on Definable

> Scrape.do is a web scraping API offering rotating residential, data-center, and mobile proxies with headless browser support and session management to bypass anti-bot protections (e.g., Cloudflare, Akamai) and extract data at scale in formats like JSON and HTML.

## What this connects


Vendor: https://scrape.do

## Tools available

**16** tools available. First 12:

- `SCRAPE_DO_CANCEL_ASYNC_JOB` — Cancel Async Job — Tool to cancel an asynchronous scraping job. Use when you need to stop processing of pending tasks in a job. Completed tasks remain available.
- `SCRAPE_DO_CREATE_ASYNC_JOB` — Create Async Scraping Job — Tool to create an asynchronous scraping job with specified targets and options. Use when you need to scrape multiple URLs in parallel without waiting for results. Returns a job ID immediately for polling results later via the get job status action.
- `SCRAPE_DO_GET_ACCOUNT_INFO` — Get Account Information — Retrieves account information and usage statistics from Scrape.do. Rate limit: maximum 10 requests per minute. Use remaining request counts to monitor credits proactively, as different scraping operations (e.g., rendered-page requests) consume varying credit amounts and exhaustion mid-run causes failures. This action makes a GET request to the Scrape.do info endpoint to fetch:
  - Subscription status
  - Concurrent request limits and usage
  - Monthly request limits and remaining requests
  - Real-time usage statistics
- `SCRAPE_DO_GET_AMAZON_OFFERS` — Get Amazon Product Offers — Get all seller offers for any Amazon product. Retrieves every seller listing including pricing, shipping costs, seller information, and Buy Box status in structured JSON format. Use when you need to compare prices across multiple sellers or find the best deal for a specific product.
- `SCRAPE_DO_GET_AMAZON_PRODUCT` — Get Amazon product details — Extract structured product data from Amazon product detail pages (PDP). Returns comprehensive product information including title, pricing, ratings, images, best seller rankings, and technical specifications in JSON format.
- `SCRAPE_DO_GET_AMAZON_RAW_HTML` — Get Amazon raw HTML — Tool to get raw HTML from any Amazon page with ZIP code geo-targeting. Use when you need complete unprocessed HTML source from Amazon URLs with location-based targeting. Ideal for scraping pages not covered by other structured endpoints.
- `SCRAPE_DO_GET_ASYNC_ACCOUNT_INFO` — Get Async API Account Information — Tool to get account information for the Async API including concurrency limits and usage statistics. Use when you need to check available concurrency slots, active jobs, or remaining credits for Async API operations.
- `SCRAPE_DO_GET_ASYNC_JOB` — Get Async Job Details — Tool to retrieve details and status of a specific asynchronous scraping job. Use when you need to check the progress, status, or results of a previously created async job. Returns job metadata including creation time, completion time, task counts, and detailed task list.
- `SCRAPE_DO_GET_ASYNC_TASK` — Get Async Task Result — Tool to retrieve the result of a specific task within an asynchronous job. Returns the scraped content for that particular URL. Use when you need to check the status and result of a previously submitted async scraping task.
- `SCRAPE_DO_GET_PAGE` — Scrape webpage using scrape.do — Tool to scrape web pages using Scrape.do's API service. Makes a basic GET request to fetch webpage content while handling anti-bot protections and proxy rotation automatically. Does not execute JavaScript by default, so pages requiring client-side rendering (SPAs, dynamically loaded content) will return incomplete HTML; use `SCRAPE_DO_GET_RENDER_PAGE` or set `render=true` for those cases.
- `SCRAPE_DO_LIST_ASYNC_JOBS` — List Asynchronous Scraping Jobs — Tool to list all asynchronous scraping jobs. Returns paginated list of jobs with their status and metadata. Use when you need to retrieve job history or monitor job statuses. Supports pagination with up to 100 jobs per page.
- `SCRAPE_DO_PROXY_MODE` — Use Scrape.do Proxy Mode — Tool that implements Scrape.do's Proxy Mode, which routes requests through the Scrape.do proxy server. It is an alternative way to access the scraping capabilities, handling complex JavaScript-rendered pages, geolocation-based routing, device simulation, and built-in anti-bot and retry mechanisms.
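
The async tools in this list are meant to be used together: create a job, poll its status, then read the task results. A minimal polling sketch in Python follows. It assumes a hypothetical `get_job` callable that wraps `SCRAPE_DO_GET_ASYNC_JOB` and returns a dict with a `status` field; the field name and the status values are illustrative assumptions, not confirmed response fields.

```python
import time
from typing import Any, Callable, Dict

# Assumed terminal status names; the real API may use different values.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_job(
    get_job: Callable[[str], Dict[str, Any]],
    job_id: str,
    interval_s: float = 2.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll get_job(job_id) until the job reports a terminal status.

    get_job is a hypothetical wrapper around SCRAPE_DO_GET_ASYNC_JOB.
    Raises TimeoutError if the job is still running after max_polls checks.
    """
    for _ in range(max_polls):
        job = get_job(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        sleep(interval_s)
    raise TimeoutError(f"job {job_id} still running after {max_polls} polls")
```

Capping the number of polls keeps a stuck job from blocking the workflow indefinitely; a job that times out can then be stopped with `SCRAPE_DO_CANCEL_ASYNC_JOB`.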

## Auth

Auth schemes: `API_KEY`.
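
As a rough illustration of API-key auth for direct calls made outside Definable, the sketch below builds a request URL with the key passed as a query parameter. The endpoint `https://api.scrape.do/`, the `token` and `url` parameter names, and the `SCRAPE_DO_API_KEY` environment variable are assumptions here; check the vendor documentation before relying on them. The `render` flag mirrors the `render=true` option mentioned for `SCRAPE_DO_GET_PAGE`.

```python
import os
from typing import Optional
from urllib.parse import urlencode

API_BASE = "https://api.scrape.do/"  # assumed endpoint, not taken from this page


def build_request_url(
    target_url: str,
    token: Optional[str] = None,
    render: bool = False,
) -> str:
    """Return a request URL carrying the API key and the percent-encoded target."""
    # SCRAPE_DO_API_KEY is a hypothetical variable name used for this sketch.
    token = token or os.environ.get("SCRAPE_DO_API_KEY")
    if not token:
        raise ValueError("no API key: pass token= or set SCRAPE_DO_API_KEY")
    params = {"token": token, "url": target_url}
    if render:
        params["render"] = "true"  # request JavaScript rendering
    # urlencode percent-encodes the target so its own query string survives.
    return f"{API_BASE}?{urlencode(params)}"
```

Reading the key from the environment keeps it out of source control; the function only builds the URL, so it can be passed to any HTTP client.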

## How agents use Scrape Do

Inside a Definable workflow, Scrape Do is one of the tools the **Distributor specialist** can call. Example coordination patterns:

- **Researcher → Scrape Do** — the Researcher (GPT-5.5) pulls context from Scrape Do (records, threads, documents), synthesises findings, and briefs the rest of the team.
- **Writer → Distributor → Scrape Do** — the Writer (Claude Opus 4.7) drafts copy in brand voice, the Verifier passes it, then the Distributor writes the result into Scrape Do (create record, post message, draft email).
- **Designer / Engineer → Distributor → Scrape Do** — the Designer ships an asset or the Engineer ships a code change, the Distributor delivers it via Scrape Do (attach file, open PR comment, post status).

The Verifier checks every Scrape Do call. On rate limit, schema drift, or auth refresh it self-heals and retries, so the workflow completes without manual intervention.
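
The retry-on-rate-limit behaviour described above can be pictured as exponential backoff. This is an illustrative sketch only, not Definable's actual Verifier logic; `RateLimited` and `call_with_retry` are hypothetical names introduced for the example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error raised when a tool call hits a rate limit."""


def call_with_retry(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on RateLimited with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay_s * 2 ** attempt)  # waits 1s, 2s, 4s, ...
    raise AssertionError("unreachable")
```

Doubling the delay after each failure gives a per-minute limit, such as the 10 requests per minute on `SCRAPE_DO_GET_ACCOUNT_INFO`, time to reset instead of hammering the endpoint.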

## Categories

- ai web scraping — https://definable.ai/apps/category/ai-web-scraping/
- developer tools — https://definable.ai/apps/category/developer-tools/

## Related

- HTML page: https://definable.ai/apps/scrape_do/
- Same category (ai web scraping): https://definable.ai/apps/category/ai-web-scraping/
- All integrations: https://definable.ai/apps/
- Workflow (multi-agent loop): https://definable.ai/workflow/
- Apps llms.txt index: https://definable.ai/llms-apps.txt
