Content Extract Tool | ORCFLO Documentation

Overview

The Content Extract tool pulls clean, structured content from web pages, removing navigation, ads, and other clutter to return just the main content. It's optimized for AI processing, making it ideal for article analysis, summarization, and content parsing.

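Content extraction is powered by Tavily's extract API, which uses intelligent parsing to identify main content. No configuration is required: the default settings work out of the box, and the optional parameters described under Parameters let you adjust the output.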

How It Works

When you provide a URL in your prompt and enable the Content Extract tool, the AI fetches and parses the page to extract the main content in a clean format.

Enable the Content Extract Tool

Add the Content Extract tool to your LLM step from the tool configuration panel.

Provide the URL

Include the URL of the page you want to extract content from in your prompt.

AI Receives Clean Content

The AI receives the extracted content (clean text without HTML, scripts, or ads) and uses it to complete your task.

Example Prompt

Extract the main article content from this URL:
https://blog.example.com/ai-trends-2026

Summarize the key points and identify any statistics mentioned.

Parameters

Configure these parameters in the tool settings panel to customize content extraction.

Parameter	Description	Default
`urls`	Override the URL(s) to extract from. Comma-separated for multiple URLs. Leave empty to use the LLM-provided URLs.	-
`extract_depth`	Basic extracts main content. Advanced includes more details and context.	basic
`format`	Output format: "markdown" preserves formatting, "text" returns plain text.	markdown
`include_images`	Include images from the extracted content with URLs and alt text.	false
`include_favicon`	Include favicon URL for the source website.	false
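
As a rough illustration of how these settings fit together, the sketch below models the table as a small Python dataclass with the documented defaults. The class name `ContentExtractSettings` and the `validate` method are hypothetical and not part of ORCFLO. The lowercase value `"advanced"` is an assumption inferred from the `basic` default.

```python
from dataclasses import dataclass, asdict

# Allowed values inferred from the parameter table; "advanced" in lowercase
# is an assumption based on the documented default "basic".
EXTRACT_DEPTHS = {"basic", "advanced"}
FORMATS = {"markdown", "text"}


@dataclass
class ContentExtractSettings:
    """Hypothetical container mirroring the tool settings panel."""

    urls: str = ""                 # empty means: use the LLM-provided URLs
    extract_depth: str = "basic"
    format: str = "markdown"
    include_images: bool = False
    include_favicon: bool = False

    def validate(self) -> None:
        """Raise ValueError if a setting is outside the documented options."""
        if self.extract_depth not in EXTRACT_DEPTHS:
            raise ValueError(f"extract_depth must be one of {sorted(EXTRACT_DEPTHS)}")
        if self.format not in FORMATS:
            raise ValueError(f"format must be one of {sorted(FORMATS)}")


defaults = ContentExtractSettings()
defaults.validate()
print(asdict(defaults))
```

Checking values before saving a configuration catches typos such as `format="html"` early, rather than at extraction time.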

Multiple URLs

You can extract content from multiple URLs in a single request by listing them comma-separated in the `urls` parameter, or by asking the AI to extract from multiple pages.
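
If you assemble the `urls` value programmatically, a small helper can keep the list clean. The sketch below is one possible approach: `build_urls_param` is a hypothetical name, and joining with a bare comma (no surrounding spaces) is an assumption, since the exact separator handling is not specified here.

```python
from urllib.parse import urlparse


def build_urls_param(urls: list[str]) -> str:
    """Join URLs into one comma-separated string, dropping duplicates."""
    seen: set[str] = set()
    cleaned: list[str] = []
    for raw in urls:
        url = raw.strip()
        parts = urlparse(url)
        # Only absolute http(s) URLs make sense as extraction targets.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not an absolute http(s) URL: {raw!r}")
        # A comma inside a URL would be ambiguous in a comma-separated list.
        if "," in url:
            raise ValueError(f"URL contains a comma: {raw!r}")
        if url not in seen:
            seen.add(url)
            cleaned.append(url)
    return ",".join(cleaned)


print(build_urls_param([
    "https://blog.example.com/ai-trends-2026",
    " https://blog.example.com/another-post ",
    "https://blog.example.com/ai-trends-2026",  # duplicate, skipped
]))
```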

Features

Content Extract provides several advantages over raw web crawling:

Feature	Description
Clean Text Output	Returns readable text without HTML tags, scripts, or styling
Article Focused	Automatically identifies and extracts the main content area, ignoring navigation and sidebars
Metadata Included	Returns title, author, and publish date when available on the page
Image Handling	Includes image URLs and alt text from the main content when `include_images` is enabled
Consistent Format	Output is structured consistently regardless of the source site's design

Example Output

json

{
  "title": "AI Trends to Watch in 2026",
  "author": "Jane Smith",
  "published_date": "2026-01-05",
  "content": "Artificial intelligence continues to evolve rapidly, with several key trends emerging for 2026. First, multimodal AI systems that combine text, image, and audio understanding are becoming mainstream...",
  "images": [
    {
      "url": "https://blog.example.com/images/ai-trends.jpg",
      "alt": "AI technology illustration"
    }
  ]
}
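
If you post-process a result like this in your own code, it helps to treat the metadata fields as optional, since title, author, and publish date are returned only when available. The following is a minimal sketch that assumes the output has the shape shown in the example; the `summarize_result` helper and the shortened sample are illustrative only.

```python
import json

# Shortened stand-in for the example output; the field names follow that example.
sample = """
{
  "title": "AI Trends to Watch in 2026",
  "author": "Jane Smith",
  "published_date": "2026-01-05",
  "content": "Artificial intelligence continues to evolve rapidly.",
  "images": [
    {"url": "https://blog.example.com/images/ai-trends.jpg",
     "alt": "AI technology illustration"}
  ]
}
"""


def summarize_result(raw: str) -> dict:
    """Pull out the key fields, tolerating missing metadata and images."""
    data = json.loads(raw)
    return {
        "title": data.get("title"),
        "author": data.get("author"),
        "published_date": data.get("published_date"),
        "word_count": len(data.get("content", "").split()),
        "image_urls": [img["url"] for img in data.get("images", []) if "url" in img],
    }


print(summarize_result(sample))
```

Using `dict.get` means a page without an author or images yields `None` or an empty list instead of raising `KeyError`.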

Use Cases

Content Extract works best for these scenarios:

News Article Summarization: Extract and summarize news articles from various sources
Blog Content Analysis: Analyze blog posts for key points, sentiment, or themes
Documentation Parsing: Extract content from documentation pages for reference
Product Page Extraction: Pull product details, descriptions, and specifications
Research Paper Analysis: Extract text from research papers or academic articles
Recipe Extraction: Pull recipes from food blogs, stripping ads and navigation

Example Workflow: Content Research

Search for Relevant Articles

Use Web Search to find articles on your topic

Extract Full Content

Use Content Extract to get clean content from the best results

Analyze and Synthesize

Have the AI synthesize findings across all extracted articles

Deliver Report

Output a comprehensive report with citations
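
The four steps above can be sketched as a simple pipeline. This is only an outline of the data flow, not how ORCFLO runs workflows internally: the `search`, `extract`, and `synthesize` callables are hypothetical stand-ins for the Web Search tool, the Content Extract tool, and the LLM step.

```python
from typing import Callable


def research_report(
    topic: str,
    search: Callable[[str], list[str]],
    extract: Callable[[str], dict],
    synthesize: Callable[[str, list[dict]], str],
    top_n: int = 3,
) -> str:
    """Run search -> extract -> synthesize and append a citation list."""
    urls = search(topic)[:top_n]                # 1. find candidate articles
    articles = [extract(url) for url in urls]   # 2. clean content per URL
    findings = synthesize(topic, articles)      # 3. combine across articles
    sources = "\n".join(                        # 4. numbered citations
        f"[{i}] {a['title']} ({a['url']})" for i, a in enumerate(articles, start=1)
    )
    return f"{findings}\n\nSources:\n{sources}"


# Stand-in callables so the sketch runs without network access.
def fake_search(topic: str) -> list[str]:
    return ["https://blog.example.com/a", "https://blog.example.com/b"]


def fake_extract(url: str) -> dict:
    return {"url": url, "title": f"Article at {url}", "content": "placeholder text"}


def fake_synthesize(topic: str, articles: list[dict]) -> str:
    return f"{len(articles)} articles reviewed on {topic}."


print(research_report("AI trends", fake_search, fake_extract, fake_synthesize))
```

Keeping the source URL alongside each extracted article is what makes the citation step possible at the end.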

Best Practices

Follow these guidelines for effective content extraction:

Provide the exact URL of the content you want to extract
Use Content Extract instead of Web Crawl when you need clean, focused content
Ask the AI to cite the source when using extracted content
Specify what aspects of the content you're interested in
Combine with Web Search for comprehensive research workflows

Content Extract vs. Web Crawl

Content Extract	Web Crawl
Extracts only the URLs you specify	Can follow links to multiple pages
Returns clean, structured content	Returns raw page content
Removes ads, navigation, clutter	Includes all page content
Best for articles, blog posts	Best for site-wide analysis

Key Takeaways

Content Extract provides clean, readable content from any URL
Automatically removes ads, navigation, and clutter
No configuration required; the default settings work out of the box
Best for articles, blog posts, and documentation
Combine with Web Search for powerful research workflows