Connect Diffbot to Definable AI

Diffbot provides AI-powered tools to extract and structure data from web pages, transforming unstructured web content into structured, linked data.

About Diffbot

Diffbot is a productivity tool. Connect it to Definable AI with one-click OAuth2 — no API keys or custom code required.

What you can automate with Diffbot

Use Definable AI's agent platform to trigger workflows from Diffbot, process results with 50+ AI models, and sync data across 900+ connected apps.

Tools & Actions (41 available)

  • Combine Entity Profiles: Combine multiple entity profiles into a unified view using the Diffbot Knowledge Graph. Returns enhanced person or organization data by matching on identifying attributes like name, email, employer, or URL. Use this to enrich partial entity data, merge duplicate profiles, or verify entity identity.
  • Create Bulk Enhance Job: Tool to submit a bulk enhance job to enrich multiple entities asynchronously. Use when you need to process many Person or Organization records in batch. The API accepts entity descriptions and returns enriched data from the Diffbot Knowledge Graph.
  • Create Bulk Extract Job: Tool to submit a bulk extract job to process multiple URLs with Extract APIs. Use when you need to process many URLs asynchronously using any Extract API. The job will process URLs in the background and provide downloadable results.
  • Create or Update Custom API: Tool to create or update the parameters and ruleset of a Custom API. Use this when you need to define custom extraction rules for specific websites that require tailored parsing logic beyond standard Diffbot APIs. Allows defining URL patterns, CSS selectors, extraction rules, and preprocessing filters to extract structured data from websites with unique layouts.
  • Delete Custom API: Tool to delete custom API definitions for a given URL pattern. Removes custom extraction rules from your account. Use when you need to remove previously configured custom APIs.
  • Delete KG Enhance Bulkjob: Tool to delete an Enhance Bulkjob. Removes the bulk job and its results from the system. Use when cleaning up completed or failed jobs.
  • Diffbot Analyze: Automatically analyzes a web page to determine its type and extract structured data. The Analyze API intelligently classifies pages into types (article, product, discussion, image, video, organization, etc.) and extracts relevant structured data. Use this when you need to process URLs of unknown type or want automatic extraction without specifying the page type in advance.
  • Diffbot Extract Job: Tool to extract structured job posting data from job listing pages. Returns job title, company, location, salary, requirements, skills, and other job-related information. Use when you need to parse and structure data from job postings.
  • Diffbot Extract List: Tool to extract structured data from list-style pages like news indexes, product listings, and directory pages. Returns an array of items with their titles, links, and descriptions. Use when you need to extract multiple items from a page organized as a list or index.
  • Diffbot Get Event: Tool to extract event details from web pages. Use when you need structured event data such as venue, date, and description.
  • Diffbot Get Image: Tool to extract detailed information about images, including dimensions and recognition data. Use after confirming the image URL is publicly accessible.
  • Diffbot Get Product: Tool to extract product information such as specifications, prices, availability, and reviews. Use when you need this data in structured form rather than as raw page content.
  • Diffbot Knowledge Graph Search: Search the Diffbot Knowledge Graph using DQL (Diffbot Query Language). Query billions of entities including organizations, people, articles, products, and more. Use structured queries to filter by type, fields, and relationships.
  • Download Bulk Job Results: Tool to download results of a bulk enhance job with filtering options via POST request. Use this to retrieve processed results from a completed or running bulk job. Supports multiple export formats (json, jsonl, csv, xls, xlsx) and various filtering options to customize the output. HTTP 200 indicates that results are ready; HTTP 201 means the job is still executing.
  • Download KG Enhance Bulk Job Data: Tool to download results of a completed KG Enhance bulk job with filtering via POST request. Use when you need to retrieve bulk job results with specific filters, field selections, or in different export formats. Supports pagination and various filtering options.
  • Enhance Entity with Knowledge Graph: Enrich a person or organization with comprehensive data from the Diffbot Knowledge Graph. Provide identifiers like name, email, employer, or URL and receive detailed entity information including employment history, education, location, skills, and more. Use when you need to gather all publicly available knowledge about a specific person or organization from billions of web pages.
  • Get Article Data: Tool to extract information from articles, including authors, publication dates, and images. Use when you need structured metadata from a web article URL.
  • Get Bulk Enhance Data: Tool to download the results of a completed Enhance Bulkjob. Returns enriched entity records (organizations, people, etc.) with match scores and metadata. Use after creating and running an enhance bulk job to retrieve the enriched data.
  • Get Bulk Job Data: Tool to download extracted results from a completed bulk job. Use after a bulk job has finished processing to retrieve the data. Supports JSON and CSV formats.
  • Get Bulk Job Results: Tool to download the results of a completed Enhance Bulkjob. Returns enriched records from the bulk job. Use after a bulk enhance job has completed processing.
  • Get Bulk Job Status: Tool to poll the status of a specific Diffbot Knowledge Graph Enhance bulk job. Use when you need to check the progress, completion status, or details of a bulk enhancement job.
  • Get Bulk Single Result: Tool to download the result of a single job within a Diffbot bulk enhance job. Returns enriched entity data for a specific input record by its index. Use after a bulk enhance job has completed to retrieve individual results without downloading the entire dataset.
  • Get Crawl Data: Download extracted results from a completed crawl job. Returns all structured data extracted during crawl processing (articles, products, etc.). Use after a crawl job has completed to retrieve the collected data.
  • Get Diffbot Account Details: Retrieves comprehensive Diffbot account information including subscription plan details, credit balance, usage history, and account status. Returns account holder name, email, current plan, available credits, and daily usage statistics for the past 31 days. Use this to check your account's credit balance, monitor API usage patterns, verify account status, or retrieve account metadata.
  • Get Discussion Thread: Extract structured discussion threads from web pages including forums, comment sections, product reviews, Reddit discussions, and blog comments. Returns posts with author info, timestamps, content, and hierarchical relationships. Supported platforms: native comment systems, Disqus, Facebook Comments, Reddit, forum software, and more. Use this when you need to extract all comments or posts from a discussion thread, analyze user feedback or reviews, monitor forum discussions or social media threads, or gather structured conversation data with metadata.
  • Get KG Coverage Report by ID: Download Knowledge Graph coverage report by report ID. Returns detailed CSV coverage statistics showing field presence across query results. Use this after generating a coverage report from a DQL query to retrieve the statistical breakdown of field coverage.
  • Get Knowledge Graph Coverage Report: Tool to get a Knowledge Graph coverage report by report ID. Returns detailed coverage statistics for a DQL query in CSV format. Use this when you need to retrieve a previously generated coverage report to analyze query performance or data coverage.
  • Get Video Data: Tool to extract information from videos, including titles, descriptions, and embedded HTML. Use when you need structured video metadata from any web page.
  • List Bulk Jobs: Tool to list all Bulk jobs associated with a specific token. Use after authenticating to retrieve statuses of all jobs for the account.
  • List Bulk Jobs Status For Token: Tool to get the status of all bulk enhance jobs for a token. Returns list of all bulk jobs associated with your API token. Use when you need to monitor or retrieve the status of multiple bulk jobs at once.
  • List Custom APIs: Tool to retrieve all Custom APIs and their extraction rules currently defined on your Diffbot token. Use when you need to list, review, or audit custom API configurations for your account.
  • List KG Bulk Jobs: Tool to list all Knowledge Graph bulk enhance jobs for a token. Returns status, progress, and metadata for all bulk jobs associated with your API token. Use after submitting bulk enhance jobs to track their progress.
  • Manage Crawl Job: Manages Diffbot crawl jobs: pause, restart, delete, or view status. Returns list of all active crawl jobs when called without parameters. Use 'name' parameter with action flags (pause=1, restart=1, delete=1) to control specific jobs.
  • Resolve Lost ID: Tool to resolve lost IDs in the Knowledge Graph. Use when you need to map a lost identifier to its canonical counterpart for data consistency.
  • Search Crawl Job Data: Tool to query crawl job collections using DQL (Diffbot Query Language). Use when you need to search extracted data from completed crawl or bulk jobs by collection name.
  • Search Crawl Job Data: Tool to search extracted content from Diffbot Crawl or Bulk jobs using query operators. Use when you need to find specific articles, products, or other extracted data within a collection.
  • Start Bulk Job: Tool to start a Bulk Extract job. Use when processing large numbers of URLs asynchronously. The Diffbot Bulk API uses GET requests with query parameters to create jobs.
  • Start Crawl Job: Initiates a Diffbot crawl job that spiders a website starting from seed URLs and processes discovered pages with a specified Extract API. The crawler follows links within the domain, collects structured data (articles, products, etc.), and stores results for download. Use this to systematically extract data from entire websites or sections. Requires Diffbot Plus plan or higher.
  • Stop Bulk Job: Tool to pause (stop) a running Bulk job. Pausing halts further processing of URLs while preserving existing progress. To resume, use the appropriate resume action. Specify the exact job name (case-sensitive) as provided when the job was created.
  • Stop KG Bulk Job By ID: Tool to stop an active Knowledge Graph Enhance bulk job by its ID. Halts processing of a running KG bulk job immediately. Use when you need to stop a specific KG bulk job using its bulkjobId.
  • Stop KG Enhance Bulk Job: Tool to stop an active KG Enhance bulk job. Halts processing of a running bulk job immediately. Use when you need to terminate a KG (Knowledge Graph) Enhance bulk job that is currently in progress.
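
Two of the actions above describe concrete request and response conventions: Manage Crawl Job takes a 'name' parameter with an action flag (pause=1, restart=1, delete=1), and Download Bulk Job Results signals readiness through HTTP 200 versus 201. The Python sketch below shows one way those conventions could be handled in a workflow step. The function names are hypothetical and not part of Diffbot or Definable AI, and passing 'name' alone to view a single job is an assumption; only the parameter names, flag values, and status codes come from the descriptions above.

```python
from urllib.parse import urlencode

# Flag names taken from the Manage Crawl Job description above.
CRAWL_ACTIONS = ("pause", "restart", "delete")


def crawl_job_params(name=None, action=None):
    """Build parameters for a Manage Crawl Job call (hypothetical helper)."""
    if action is None:
        # No flag: an empty parameter set lists all active crawl jobs.
        # Sending only 'name' to view one job is an assumption.
        return {"name": name} if name else {}
    if action not in CRAWL_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    if not name:
        raise ValueError("an action flag needs the job name")
    # Each action is expressed as a flag set to 1, e.g. pause=1.
    return {"name": name, action: 1}


def bulk_download_state(http_status):
    """Interpret the status code from Download Bulk Job Results."""
    if http_status == 200:
        return "ready"      # results can be read from the response
    if http_status == 201:
        return "running"    # job is still executing; retry later
    return "error"          # any other code is treated as a failure here


if __name__ == "__main__":
    params = crawl_job_params("news-crawl", "pause")
    print(urlencode(params))           # name=news-crawl&pause=1
    print(bulk_download_state(201))    # running
```

Keeping both helpers free of network calls makes them easy to unit test; the actual request is left to whichever action runner the workflow uses.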

How to connect Diffbot

  1. Sign in to Definable AI and go to Apps
  2. Search for Diffbot and click Connect
  3. Authorize via OAuth2 — takes under 30 seconds
  4. Use Diffbot actions in your AI agents and workflows