Web Crawl Tool

Last updated: Jan 2026

Overview

The Web Crawl tool retrieves content from specific URLs or crawls multiple pages from a website by following links. Ideal for analyzing competitors, gathering product information, or monitoring website content.

Powered by Tavily

Web crawling is powered by Tavily's crawl API, which is optimized for AI applications. No API key configuration is required; the tool works out of the box.

How It Works

When you provide a URL in your prompt and enable the Web Crawl tool, the AI can visit that page and optionally follow links to gather more content.

1. Enable the Web Crawl Tool

Add the Web Crawl tool to your LLM step from the tool configuration panel.

2. Provide the Target URL

Include the URL you want to crawl in your prompt. Specify if you want to follow links or just analyze the single page.

3. AI Processes the Content

The AI receives the page content and uses it to complete your task.

Example Prompt
Visit https://example.com/products and extract information about
all their product offerings. Include prices, features, and availability.
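
Under the hood, this maps to a single crawl request. The sketch below shows a roughly equivalent direct call using Tavily's Python SDK (tavily-python). The API key and the response fields (`results`, `url`, `raw_content`) are assumptions based on Tavily's public SDK; none of this code is needed inside the workflow itself.

```python
# Minimal sketch of the crawl behind the example prompt.
# Assumes Tavily's Python SDK; the workflow tool needs no key or code.
from tavily import TavilyClient

client = TavilyClient(api_key="tvly-YOUR-KEY")  # hypothetical key

response = client.crawl(
    "https://example.com/products",
    instructions="Extract product offerings: prices, features, availability.",
)

# Each result is one crawled page; raw_content is what the AI reads.
for page in response["results"]:
    print(page["url"], len(page["raw_content"]), "chars")
```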

Parameters

Configure these parameters in the tool settings panel to control how the crawler navigates and extracts content from websites.

URL Settings

| Parameter | Description | Default |
|---|---|---|
| url | Override the base URL to crawl. Leave empty to use the LLM-provided URL. | - |
| instructions | Instructions for the crawler (what to look for, focus areas). Helps guide which pages are most relevant. | - |
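
For example, `instructions` acts like a natural-language brief for the crawler. A minimal sketch, assuming these panel settings pass straight through to Tavily's crawl API:

```python
from tavily import TavilyClient

client = TavilyClient(api_key="tvly-YOUR-KEY")  # assumed direct SDK use

response = client.crawl(
    "https://example.com",  # `url` override: takes precedence over the LLM-provided URL
    instructions="Focus on pricing and plan-comparison pages.",
)
```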

Crawl Limits

Control how far and wide the crawler goes from the starting URL.

| Parameter | Description | Default |
|---|---|---|
| max_depth | Maximum depth to crawl from the starting URL (1-10). Depth 1 means only the starting page and its direct links. | 2 |
| max_breadth | Maximum number of pages per level (1-50). Controls how many links to follow at each depth level. | 10 |
| limit | Maximum total pages to crawl (1-100). Hard limit across all depth levels. | 10 |
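
A worked example of how the three limits interact: with max_depth=2 and max_breadth=10, the crawler could in principle reach the starting page, up to 10 direct links, and up to 100 second-level pages, but limit=10 caps the whole crawl at 10 pages. A sketch under the same pass-through assumption as above:

```python
from tavily import TavilyClient

client = TavilyClient(api_key="tvly-YOUR-KEY")  # assumed direct SDK use

response = client.crawl(
    "https://docs.example.com",
    max_depth=2,     # start page, its links, and their links
    max_breadth=10,  # follow at most 10 links per level
    limit=10,        # hard cap across all levels
)

# Depth 2 x breadth 10 could reach ~100 pages; `limit` stops at 10.
print(len(response["results"]))  # <= 10
```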

Path Filters

Include or exclude specific paths to focus the crawl on relevant content.

| Parameter | Description | Example |
|---|---|---|
| select_paths | Only crawl pages matching these paths | /docs, /api |
| exclude_paths | Skip pages matching these paths | /blog, /archive |

Domain Filters

Control which domains the crawler can access.

| Parameter | Description | Default |
|---|---|---|
| select_domains | Only crawl within these domains | - |
| exclude_domains | Exclude these domains from crawling | - |
| allow_external | Follow links to external domains | false |
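
Path and domain filters combine naturally. The sketch below keeps a crawl inside a docs section on one domain; it uses the plain path prefixes from the examples above and assumes, as before, that the panel settings map directly onto Tavily's crawl parameters:

```python
from tavily import TavilyClient

client = TavilyClient(api_key="tvly-YOUR-KEY")  # assumed direct SDK use

response = client.crawl(
    "https://example.com/docs",
    select_paths=["/docs", "/api"],       # only follow links under these paths
    exclude_paths=["/blog", "/archive"],  # skip these sections entirely
    select_domains=["example.com"],       # stay on this domain
    allow_external=False,                 # never follow off-site links
)
```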

Content Options

Control how content is extracted from crawled pages.

| Parameter | Description | Default |
|---|---|---|
| extract_depth | Depth of content extraction per page. "basic" extracts main content; "advanced" includes more detail. | basic |
| format | Output format: "markdown" preserves formatting, "text" returns plain text. | markdown |
| include_images | Include images from crawled pages | false |
| include_favicon | Include favicon URLs for crawled pages | false |
| categories | Filter pages by content categories | - |
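
For pages where "basic" extraction drops useful detail (tables, long lists), switching to "advanced" and keeping markdown output preserves more structure. A hedged sketch with the same pass-through assumption:

```python
from tavily import TavilyClient

client = TavilyClient(api_key="tvly-YOUR-KEY")  # assumed direct SDK use

response = client.crawl(
    "https://example.com/docs",
    extract_depth="advanced",  # richer per-page extraction than "basic"
    format="markdown",         # keep headings, lists, and links
    include_images=True,       # also return image URLs from each page
)
```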

Efficient Crawling

Use path and domain filters to focus your crawl on relevant sections. This improves speed, reduces costs, and produces more relevant results.

Crawl Types

The Web Crawl tool supports two crawling modes depending on your needs.

Single Page Crawl

Retrieve content from a single URL. Fast and efficient for analyzing specific pages like a product page, blog post, or documentation page.

"Analyze the content at https://example.com/pricing"

Multi-Page Crawl

Follow links from a starting URL to gather content from multiple related pages. Useful for comprehensive site analysis or documentation gathering.

"Crawl the documentation at https://docs.example.com and summarize the API reference"

Rate Limiting

Web crawling is rate-limited to prevent abuse. For large-scale crawling needs, break your task into multiple workflow runs or focus on specific sections of a site.

Use Cases

Common scenarios where Web Crawl excels:

| Use Case | Description |
|---|---|
| Competitor Analysis | Analyze competitor websites for pricing, features, and positioning |
| Product Research | Gather product specifications, reviews, and availability from e-commerce sites |
| Documentation Synthesis | Crawl technical documentation to create summaries or answer questions |
| Content Monitoring | Track websites for news, pricing updates, or other content changes |
| Data Collection | Gather structured data from websites for analysis or reporting |

Best Practices

Follow these guidelines for effective web crawling:

  • Provide specific URLs when you know the exact pages you need
  • Limit crawl scope to relevant sections of a site
  • Use single page crawl for focused analysis
  • Specify what information you want to extract from the pages
  • Consider using Content Extract for cleaner results on single pages

When to Use Web Crawl vs. Other Tools

| Tool | Best For |
|---|---|
| Web Search | Finding pages on a topic when you don't have specific URLs |
| Web Crawl | Gathering content from known URLs, following links across a site |
| Content Extract | Getting clean, structured content from a single specific page |

Key Takeaways

  • Web Crawl visits URLs and optionally follows links
  • No configuration required - works automatically
  • Best for site analysis, competitor research, and documentation
  • Rate-limited to prevent abuse - scope crawls appropriately
  • Use Content Extract for cleaner single-page results