What is MCP web scraping?

MCP web scraping uses the Model Context Protocol -- an open standard from Anthropic -- to let AI assistants like Claude connect to scraping tools over JSON-RPC. Instead of writing REST integrations, you register an MCP server and Claude discovers and invokes its scraping tools the same way it uses built-in capabilities.

What MCP web scrapers are available in 2026?

Several MCP servers provide scraping capabilities, but CrawlForge leads with 20 tools -- about 4x more than alternatives. Other servers focus on narrower use cases (basic scraping, specific platforms), while CrawlForge covers fetching, extraction, search, deep research, stealth, and change tracking in one server.

How does Claude discover and call CrawlForge tools?

When configured in Claude Code or Claude Desktop, the MCP server registers its tools with name, description, and input schema. Claude then calls tools through `tools/call` messages with JSON-RPC, passing arguments and receiving structured responses -- no custom client code required.

How do I optimize credits with the 20 CrawlForge tools?

Always start with the cheapest tool that works: fetch_url and extract_text are 1 credit, scrape_structured and extract_content are 2, advanced tools are 3-5, and deep_research is 10. Cache results, batch URLs, and chain operations rather than calling deep_research when fetch_url would suffice.

How many credits does the free tier give me?

CrawlForge's free tier includes 1,000 one-time credits with access to all 20 tools and no credit card required. That covers roughly 1,000 basic fetches, 500 structured scrapes, or 100 deep_research queries -- enough to build and validate a full MCP scraping workflow.

The Complete Guide to MCP Web Scraping: Everything Developers Need to Know

The Model Context Protocol (MCP) has fundamentally changed how AI assistants interact with the web. This comprehensive guide covers everything developers need to know about MCP web scraping - from foundational concepts to advanced techniques.

Part 1: Understanding MCP

What is the Model Context Protocol?

MCP (Model Context Protocol) is an open standard developed by Anthropic that allows AI assistants like Claude to connect to external tools and data sources. Think of it as a universal adapter that lets AI models use specialized tools.

┌─────────────┐      ┌───────────────┐      ┌──────────────┐
│   Claude    │ ←──→ │  MCP Server   │ ←──→ │  External    │
│  (AI Model) │      │ (CrawlForge)  │      │  Resources   │
└─────────────┘      └───────────────┘      └──────────────┘
                            ↑
                     MCP Protocol
                  (JSON-RPC over stdio)

Why MCP Matters for Web Scraping

Before MCP, AI assistants couldn't reliably access web data:

Approach	Problems
Training data	Outdated, knowledge cutoff
RAG (Retrieval)	Limited to indexed documents
Function calling	Requires custom implementation
Browser plugins	Inconsistent, security concerns

MCP solves these by providing:

Standardized interface - One protocol for all tools
Real-time data - Fresh information from any source
Tool composability - Combine multiple tools seamlessly
Security model - Controlled access to external resources

How MCP Works

MCP uses a client-server architecture with JSON-RPC:

1. Server Discovery

Json

{
  "mcpServers": {
    "crawlforge": {
      "command": "npx",
      "args": ["crawlforge-mcp-server"]
    }
  }
}

2. Tool Registration

Json

{
  "tools": [
    {
      "name": "fetch_url",
      "description": "Fetch content from a URL",
      "inputSchema": {
        "type": "object",
        "properties": {
          "url": { "type": "string", "format": "uri" }
        },
        "required": ["url"]
      }
    }
  ]
}

3. Tool Invocation

Json

{
  "method": "tools/call",
  "params": {
    "name": "fetch_url",
    "arguments": {
      "url": "https://example.com"
    }
  }
}

4. Response

Json

{
  "result": {
    "content": "<html>...",
    "status": 200,
    "headers": {...}
  }
}

Part 2: The MCP Web Scraping Ecosystem

MCP Scraping Servers

Several MCP servers provide web scraping capabilities:

Server	Tools	Focus
CrawlForge	20	Comprehensive scraping, research, stealth
Firecrawl	~5	Basic scraping and crawling
Browser MCP	~3	Browser automation
Fetch MCP	1	Simple HTTP requests

Why CrawlForge Leads

CrawlForge was built specifically for MCP with the widest tool coverage:

CrawlForge: ████████████████████ 20 tools
Firecrawl:  █████ 5 tools
Browser:    ███ 3 tools
Fetch:      █ 1 tool

Part 3: CrawlForge's 20 Tools Explained

Basic Scraping (1-2 credits)

1. fetch_url (1 credit)

The foundation of web scraping - fetches raw HTML from any URL.

Typescript

// Usage:
"Fetch https://example.com"

// Returns:
{
  "html": "<html>...",
  "status": 200,
  "headers": {...},
  "timing": { "total": 523 }
}

When to use: Starting point for any scraping task. Always try this first.

2. extract_text (1 credit)

Extracts clean text content, removing HTML tags, scripts, and styles.

Typescript

// Usage:
"Extract text from https://example.com/article"

// Returns:
{
  "text": "Article headline\n\nFirst paragraph...",
  "wordCount": 1247,
  "readingTime": 5
}

When to use: Blog posts, articles, documentation where you need readable text.

3. extract_links (1 credit)

Discovers all links on a page with optional filtering.

Typescript

// Usage:
"Extract all links from https://example.com"

// With filtering:
{
  "url": "https://example.com",
  "filter_external": true  // Internal links only
}

// Returns:
{
  "links": [
    { "href": "/about", "text": "About Us" },
    { "href": "/products", "text": "Products" }
  ],
  "total": 45
}

When to use: Site exploration, finding pages to scrape, building sitemaps.

4. extract_metadata (1 credit)

Pulls SEO metadata: title, description, Open Graph, JSON-LD.

Typescript

// Usage:
"Get metadata from https://example.com"

// Returns:
{
  "title": "Example Site - Homepage",
  "description": "Welcome to Example...",
  "openGraph": {
    "title": "Example Site",
    "image": "https://example.com/og.png"
  },
  "jsonLd": [...]
}

When to use: SEO analysis, content previews, structured data extraction.

Structured Extraction (2-3 credits)

5. scrape_structured (2 credits)

Extracts specific data using CSS selectors.

Typescript

// Usage:
{
  "url": "https://example.com/products",
  "selectors": {
    "title": "h1.product-title",
    "price": "span.price",
    "description": ".product-description"
  }
}

// Returns:
{
  "data": {
    "title": "Product Name",
    "price": "$99.99",
    "description": "Product description..."
  }
}

When to use: E-commerce scraping, structured data, known page layouts.

6. extract_content (2 credits)

Intelligent article extraction (like Readability).

Typescript

// Usage:
"Extract the main content from https://blog.example.com/post"

// Returns:
{
  "title": "Blog Post Title",
  "author": "John Smith",
  "publishedDate": "2026-01-15",
  "content": "Clean article text...",
  "images": ["..."],
  "readingTime": 7
}

When to use: News articles, blog posts, editorial content.

7. map_site (2 credits)

Discovers site structure and generates sitemaps.

Typescript

// Usage:
{
  "url": "https://example.com",
  "max_urls": 1000,
  "include_sitemap": true
}

// Returns:
{
  "pages": [
    { "url": "/", "title": "Home", "depth": 0 },
    { "url": "/about", "title": "About", "depth": 1 },
    ...
  ],
  "structure": {
    "/": ["/about", "/products", "/blog"],
    "/products": ["/products/1", "/products/2"]
  }
}

When to use: Site audits, crawl planning, content discovery.

8. analyze_content (3 credits)

NLP analysis: language, sentiment, topics, entities.

Typescript

// Usage:
"Analyze this content: [text]"

// Returns:
{
  "language": "en",
  "sentiment": { "score": 0.7, "label": "positive" },
  "topics": ["technology", "AI", "automation"],
  "entities": [
    { "text": "OpenAI", "type": "organization" },
    { "text": "GPT-4", "type": "product" }
  ],
  "readability": { "grade": 12, "score": 45 }
}

When to use: Content analysis, sentiment tracking, topic extraction.

Advanced Scraping (4-5 credits)

9. process_document (2 credits)

Handles PDFs and documents.

Typescript

// Usage:
{
  "source": "https://example.com/report.pdf",
  "sourceType": "pdf_url"
}

// Returns:
{
  "text": "Extracted PDF text...",
  "pages": 15,
  "metadata": {
    "author": "...",
    "created": "..."
  }
}

When to use: Research papers, reports, documentation PDFs.

10. summarize_content (4 credits)

AI-powered summarization.

Typescript

// Usage:
"Summarize this article: [long text]"

// Returns:
{
  "summary": "Concise summary...",
  "keyPoints": [
    "Point 1",
    "Point 2",
    "Point 3"
  ],
  "wordReduction": "85%"
}

When to use: Long documents, research synthesis, content digests.

11. crawl_deep (4 credits)

Multi-page crawling with configurable depth.

Typescript

// Usage:
{
  "url": "https://example.com",
  "max_depth": 3,
  "max_pages": 100,
  "include_patterns": ["/blog/*"],
  "exclude_patterns": ["/admin/*"]
}

// Returns:
{
  "pages": [...],  // All crawled pages
  "stats": {
    "total": 87,
    "successful": 85,
    "failed": 2
  }
}

When to use: Full site scraping, content aggregation, archiving.

12. batch_scrape (5 credits)

Parallel scraping of multiple URLs.

Typescript

// Usage:
{
  "urls": [
    "https://example1.com",
    "https://example2.com",
    // ... up to 50 URLs
  ],
  "maxConcurrency": 10
}

// Returns:
{
  "results": [
    { "url": "...", "success": true, "data": {...} },
    ...
  ],
  "stats": { "successful": 48, "failed": 2 }
}

When to use: Multiple known URLs, competitor monitoring, price tracking.

13. scrape_with_actions (5 credits)

Browser automation with actions.

Typescript

// Usage:
{
  "url": "https://example.com/app",
  "actions": [
    { "type": "wait", "selector": ".content" },
    { "type": "click", "selector": "#load-more" },
    { "type": "wait", "timeout": 2000 },
    { "type": "scroll", "selector": "body" },
    { "type": "screenshot" }
  ]
}

// Returns:
{
  "finalContent": "...",
  "screenshots": ["base64..."],
  "actionsExecuted": 5
}

When to use: SPAs, infinite scroll, dynamic content, login required.

14. search_web (5 credits)

Google search integration.

Typescript

// Usage:
{
  "query": "web scraping best practices 2026",
  "limit": 10,
  "site": "github.com"  // Optional site filter
}

// Returns:
{
  "results": [
    {
      "title": "...",
      "url": "...",
      "snippet": "..."
    },
    ...
  ]
}

When to use: Discovery, finding sources, research starting point.

Specialized Tools (3-10 credits)

15. stealth_mode (5 credits)

Anti-detection bypass (detailed in Stealth Mode Guide).

Typescript

// Usage:
{
  "operation": "create_context",
  "stealthConfig": {
    "level": "advanced",
    "hideWebDriver": true,
    "randomizeFingerprint": true,
    "simulateHumanBehavior": true
  }
}

When to use: Protected sites, Cloudflare bypass, anti-bot evasion.

16. track_changes (3 credits)

Content monitoring and change detection.

Typescript

// Usage:
{
  "url": "https://example.com/pricing",
  "operation": "create_baseline",
  "monitoringOptions": {
    "interval": 86400000,  // Daily
    "notificationThreshold": "moderate"
  }
}

// Returns (on change):
{
  "changes": [
    {
      "type": "text_change",
      "path": ".pricing-tier-1",
      "before": "$19/mo",
      "after": "$29/mo",
      "significance": "major"
    }
  ]
}

When to use: Price monitoring, competitor tracking, content updates.

17. localization (2 credits)

Geo-targeted scraping.

Typescript

// Usage:
{
  "operation": "configure_country",
  "countryCode": "GB",
  "language": "en-GB"
}

// Then scrape to get UK-specific content/pricing

When to use: Regional pricing, localized content, geo-restricted data.

18. extract_structured (3 credits)

LLM-powered schema-driven extraction with CSS selector fallback.

Typescript

// Usage:
{
  "url": "https://example.com/product/123",
  "schema": {
    "type": "object",
    "properties": {
      "title": { "type": "string" },
      "price": { "type": "number" }
    },
    "required": ["title"]
  },
  "prompt": "Extract the product name and price"
}

When to use: When you want typed output matching a schema without writing selectors.

19. generate_llms_txt (5 credits)

Analyze a site and emit standard-compliant llms.txt and llms-full.txt files.

Typescript

// Usage:
{
  "url": "https://example.com",
  "format": "both",
  "complianceLevel": "standard",
  "outputOptions": {
    "organizationName": "Example Inc.",
    "contactEmail": "ai@example.com"
  }
}

When to use: Publishing AI interaction guidelines for your website.

20. deep_research (10 credits)

Comprehensive multi-source research (detailed in Deep Research Guide).

Typescript

// Usage:
{
  "topic": "quantum computing commercialization",
  "maxUrls": 50,
  "enableSourceVerification": true,
  "enableConflictDetection": true
}

// Returns:
{
  "synthesis": "Comprehensive analysis...",
  "sources": [...],
  "conflicts": [...],
  "citations": [...]
}

When to use: Research projects, due diligence, market analysis.

Part 4: Integration Guide

Claude Code Setup

Bash

# 1. Install CrawlForge MCP server
npm install -g crawlforge-mcp-server

# 2. Run setup wizard
npx crawlforge-setup

# 3. Add to Claude Code
claude
> /mcp add crawlforge npx crawlforge-mcp-server

# 4. Verify
> /mcp list
# Should show: crawlforge (20 tools)

Claude Desktop Setup

Edit your Claude Desktop config file:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json Windows: %APPDATA%\Claude\claude_desktop_config.json

Json

{
  "mcpServers": {
    "crawlforge": {
      "command": "npx",
      "args": ["crawlforge-mcp-server"],
      "env": {
        "CRAWLFORGE_API_KEY": "cf_live_your_key_here"
      }
    }
  }
}

Custom Application Integration

Typescript

import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

const transport = new StdioClientTransport({
  command: "npx",
  args: ["crawlforge-mcp-server"],
  env: {
    CRAWLFORGE_API_KEY: process.env.CRAWLFORGE_API_KEY
  }
});

const client = new Client({
  name: "my-app",
  version: "1.0.0"
});

await client.connect(transport);

// List available tools
const tools = await client.listTools();
console.log(`Available tools: ${tools.tools.length}`);

// Call a tool
const result = await client.callTool({
  name: "fetch_url",
  arguments: {
    url: "https://example.com"
  }
});

Part 5: Best Practices

Credit Optimization

Goal	Expensive	Efficient
Check if page exists	deep_research (10)	fetch_url (1)
Get article text	scrape_with_actions (5)	extract_content (2)
Find competitor URLs	search_web × 10 (50)	extract_links (1)
Scrape 20 product pages	fetch_url × 20 (20)	batch_scrape (5)

Error Handling

Typescript

// Always handle failures gracefully
try {
  const result = await fetchUrl(url);
  if (result.status >= 400) {
    // Try with stealth mode
    return await stealthMode(url);
  }
  return result;
} catch (error) {
  // Log and retry with exponential backoff
  await sleep(retryDelay * attempt);
  return retry(url, attempt + 1);
}

Rate Limiting

Respect target sites:

Typescript

// Good: Reasonable delays
for (const url of urls) {
  await scrape(url);
  await sleep(1000 + Math.random() * 2000);  // 1-3s delay
}

// Better: Use batch_scrape with built-in rate limiting
await batchScrape(urls, { delayBetweenRequests: 1500 });

Caching

Don't scrape the same URL twice:

Typescript

const cache = new Map<string, ScrapedContent>();

async function smartScrape(url: string) {
  if (cache.has(url)) {
    return cache.get(url);
  }

  const result = await fetchUrl(url);
  cache.set(url, result);
  return result;
}

Part 6: The Future of MCP Scraping

Emerging Trends

AI-Native Extraction - LLMs directly parsing unstructured HTML
Self-Healing Scrapers - AI adapts to site changes automatically
Semantic Search - Natural language queries across scraped data
Cross-Site Analysis - AI connecting information across sources

CrawlForge Roadmap

Coming in 2026:

Real-time monitoring - Instant change notifications
AI schema generation - Automatic extraction templates
Cross-tool workflows - Chain tools intelligently
Enhanced privacy - Zero-knowledge scraping options

Getting Started

Ready to start MCP web scraping? Here's your path:

Free Tier (Perfect for Getting Started)

1,000 one-time trial credits
All 20 tools available
No credit card required

Bash

# Quick start
npm install -g crawlforge-mcp-server
npx crawlforge-setup
# Visit: https://crawlforge.dev/signup

What You Can Do with 1,000 Credits

Use Case	Tools	Credits	Monthly Capacity
Basic scraping	fetch_url	1	1,000 pages
Article extraction	extract_content	2	500 articles
Site mapping	map_site	2	500 sites
Batch jobs	batch_scrape	5	200 batches (10K URLs)
Research projects	deep_research	10	100 topics

Summary

MCP has revolutionized web scraping for AI applications. Key takeaways:

MCP is the standard - All major AI assistants support it
CrawlForge leads with 20 tools - 4x more than alternatives
Start simple - Use fetch_url (1 credit) before advanced tools
Combine tools - Chain operations for powerful workflows
Be ethical - Respect robots.txt and rate limits

Related Resources:

Get Started Free | View Pricing | Read Docs

Part 1: Understanding MCP

What is the Model Context Protocol?

┌─────────────┐      ┌───────────────┐      ┌──────────────┐
│   Claude    │ ←──→ │  MCP Server   │ ←──→ │  External    │
│  (AI Model) │      │ (CrawlForge)  │      │  Resources   │
└─────────────┘      └───────────────┘      └──────────────┘
                            ↑
                     MCP Protocol
                  (JSON-RPC over stdio)

Why MCP Matters for Web Scraping

Before MCP, AI assistants couldn't reliably access web data:

Approach	Problems
Training data	Outdated, knowledge cutoff
RAG (Retrieval)	Limited to indexed documents
Function calling	Requires custom implementation
Browser plugins	Inconsistent, security concerns

MCP solves these by providing:

Standardized interface - One protocol for all tools
Real-time data - Fresh information from any source
Tool composability - Combine multiple tools seamlessly
Security model - Controlled access to external resources

How MCP Works

MCP uses a client-server architecture with JSON-RPC:

1. Server Discovery

Json

{
  "mcpServers": {
    "crawlforge": {
      "command": "npx",
      "args": ["crawlforge-mcp-server"]
    }
  }
}

2. Tool Registration

Json

{
  "tools": [
    {
      "name": "fetch_url",
      "description": "Fetch content from a URL",
      "inputSchema": {
        "type": "object",
        "properties": {
          "url": { "type": "string", "format": "uri" }
        },
        "required": ["url"]
      }
    }
  ]
}

3. Tool Invocation

Json

{
  "method": "tools/call",
  "params": {
    "name": "fetch_url",
    "arguments": {
      "url": "https://example.com"
    }
  }
}

4. Response

Json

{
  "result": {
    "content": "<html>...",
    "status": 200,
    "headers": {...}
  }
}

Part 2: The MCP Web Scraping Ecosystem

MCP Scraping Servers

Several MCP servers provide web scraping capabilities:

Server	Tools	Focus
CrawlForge	20	Comprehensive scraping, research, stealth
Firecrawl	~5	Basic scraping and crawling
Browser MCP	~3	Browser automation
Fetch MCP	1	Simple HTTP requests

Why CrawlForge Leads

CrawlForge was built specifically for MCP with the widest tool coverage:

CrawlForge: ████████████████████ 20 tools
Firecrawl:  █████ 5 tools
Browser:    ███ 3 tools
Fetch:      █ 1 tool

Part 3: CrawlForge's 20 Tools Explained

Basic Scraping (1-2 credits)

1. fetch_url (1 credit)

The foundation of web scraping - fetches raw HTML from any URL.

Typescript

// Usage:
"Fetch https://example.com"

// Returns:
{
  "html": "<html>...",
  "status": 200,
  "headers": {...},
  "timing": { "total": 523 }
}

When to use: Starting point for any scraping task. Always try this first.

2. extract_text (1 credit)

Extracts clean text content, removing HTML tags, scripts, and styles.

Typescript

// Usage:
"Extract text from https://example.com/article"

// Returns:
{
  "text": "Article headline\n\nFirst paragraph...",
  "wordCount": 1247,
  "readingTime": 5
}

When to use: Blog posts, articles, documentation where you need readable text.

3. extract_links (1 credit)

Discovers all links on a page with optional filtering.

Typescript

// Usage:
"Extract all links from https://example.com"

// With filtering:
{
  "url": "https://example.com",
  "filter_external": true  // Internal links only
}

// Returns:
{
  "links": [
    { "href": "/about", "text": "About Us" },
    { "href": "/products", "text": "Products" }
  ],
  "total": 45
}

When to use: Site exploration, finding pages to scrape, building sitemaps.

4. extract_metadata (1 credit)

Pulls SEO metadata: title, description, Open Graph, JSON-LD.

Typescript

// Usage:
"Get metadata from https://example.com"

// Returns:
{
  "title": "Example Site - Homepage",
  "description": "Welcome to Example...",
  "openGraph": {
    "title": "Example Site",
    "image": "https://example.com/og.png"
  },
  "jsonLd": [...]
}

When to use: SEO analysis, content previews, structured data extraction.

Structured Extraction (2-3 credits)

5. scrape_structured (2 credits)

Extracts specific data using CSS selectors.

Typescript

// Usage:
{
  "url": "https://example.com/products",
  "selectors": {
    "title": "h1.product-title",
    "price": "span.price",
    "description": ".product-description"
  }
}

// Returns:
{
  "data": {
    "title": "Product Name",
    "price": "$99.99",
    "description": "Product description..."
  }
}

When to use: E-commerce scraping, structured data, known page layouts.

6. extract_content (2 credits)

Intelligent article extraction (like Readability).

Typescript

// Usage:
"Extract the main content from https://blog.example.com/post"

// Returns:
{
  "title": "Blog Post Title",
  "author": "John Smith",
  "publishedDate": "2026-01-15",
  "content": "Clean article text...",
  "images": ["..."],
  "readingTime": 7
}

When to use: News articles, blog posts, editorial content.

7. map_site (2 credits)

Discovers site structure and generates sitemaps.

Typescript

// Usage:
{
  "url": "https://example.com",
  "max_urls": 1000,
  "include_sitemap": true
}

// Returns:
{
  "pages": [
    { "url": "/", "title": "Home", "depth": 0 },
    { "url": "/about", "title": "About", "depth": 1 },
    ...
  ],
  "structure": {
    "/": ["/about", "/products", "/blog"],
    "/products": ["/products/1", "/products/2"]
  }
}

When to use: Site audits, crawl planning, content discovery.

8. analyze_content (3 credits)

NLP analysis: language, sentiment, topics, entities.

Typescript

// Usage:
"Analyze this content: [text]"

// Returns:
{
  "language": "en",
  "sentiment": { "score": 0.7, "label": "positive" },
  "topics": ["technology", "AI", "automation"],
  "entities": [
    { "text": "OpenAI", "type": "organization" },
    { "text": "GPT-4", "type": "product" }
  ],
  "readability": { "grade": 12, "score": 45 }
}

When to use: Content analysis, sentiment tracking, topic extraction.

Advanced Scraping (4-5 credits)

9. process_document (2 credits)

Handles PDFs and documents.

Typescript

// Usage:
{
  "source": "https://example.com/report.pdf",
  "sourceType": "pdf_url"
}

// Returns:
{
  "text": "Extracted PDF text...",
  "pages": 15,
  "metadata": {
    "author": "...",
    "created": "..."
  }
}

When to use: Research papers, reports, documentation PDFs.

10. summarize_content (4 credits)

AI-powered summarization.

Typescript

// Usage:
"Summarize this article: [long text]"

// Returns:
{
  "summary": "Concise summary...",
  "keyPoints": [
    "Point 1",
    "Point 2",
    "Point 3"
  ],
  "wordReduction": "85%"
}

When to use: Long documents, research synthesis, content digests.

11. crawl_deep (4 credits)

Multi-page crawling with configurable depth.

Typescript

// Usage:
{
  "url": "https://example.com",
  "max_depth": 3,
  "max_pages": 100,
  "include_patterns": ["/blog/*"],
  "exclude_patterns": ["/admin/*"]
}

// Returns:
{
  "pages": [...],  // All crawled pages
  "stats": {
    "total": 87,
    "successful": 85,
    "failed": 2
  }
}

When to use: Full site scraping, content aggregation, archiving.

12. batch_scrape (5 credits)

Parallel scraping of multiple URLs.

Typescript

// Usage:
{
  "urls": [
    "https://example1.com",
    "https://example2.com",
    // ... up to 50 URLs
  ],
  "maxConcurrency": 10
}

// Returns:
{
  "results": [
    { "url": "...", "success": true, "data": {...} },
    ...
  ],
  "stats": { "successful": 48, "failed": 2 }
}

When to use: Multiple known URLs, competitor monitoring, price tracking.

13. scrape_with_actions (5 credits)

Browser automation with actions.

Typescript

// Usage:
{
  "url": "https://example.com/app",
  "actions": [
    { "type": "wait", "selector": ".content" },
    { "type": "click", "selector": "#load-more" },
    { "type": "wait", "timeout": 2000 },
    { "type": "scroll", "selector": "body" },
    { "type": "screenshot" }
  ]
}

// Returns:
{
  "finalContent": "...",
  "screenshots": ["base64..."],
  "actionsExecuted": 5
}

When to use: SPAs, infinite scroll, dynamic content, login required.

14. search_web (5 credits)

Google search integration.

Typescript

// Usage:
{
  "query": "web scraping best practices 2026",
  "limit": 10,
  "site": "github.com"  // Optional site filter
}

// Returns:
{
  "results": [
    {
      "title": "...",
      "url": "...",
      "snippet": "..."
    },
    ...
  ]
}

When to use: Discovery, finding sources, research starting point.

Specialized Tools (3-10 credits)

15. stealth_mode (5 credits)

Anti-detection bypass (detailed in Stealth Mode Guide).

Typescript

// Usage:
{
  "operation": "create_context",
  "stealthConfig": {
    "level": "advanced",
    "hideWebDriver": true,
    "randomizeFingerprint": true,
    "simulateHumanBehavior": true
  }
}

When to use: Protected sites, Cloudflare bypass, anti-bot evasion.

16. track_changes (3 credits)

Content monitoring and change detection.

Typescript

// Usage:
{
  "url": "https://example.com/pricing",
  "operation": "create_baseline",
  "monitoringOptions": {
    "interval": 86400000,  // Daily
    "notificationThreshold": "moderate"
  }
}

// Returns (on change):
{
  "changes": [
    {
      "type": "text_change",
      "path": ".pricing-tier-1",
      "before": "$19/mo",
      "after": "$29/mo",
      "significance": "major"
    }
  ]
}

When to use: Price monitoring, competitor tracking, content updates.

17. localization (2 credits)

Geo-targeted scraping.

Typescript

// Usage:
{
  "operation": "configure_country",
  "countryCode": "GB",
  "language": "en-GB"
}

// Then scrape to get UK-specific content/pricing

When to use: Regional pricing, localized content, geo-restricted data.

18. extract_structured (3 credits)

LLM-powered schema-driven extraction with CSS selector fallback.

Typescript

// Usage:
{
  "url": "https://example.com/product/123",
  "schema": {
    "type": "object",
    "properties": {
      "title": { "type": "string" },
      "price": { "type": "number" }
    },
    "required": ["title"]
  },
  "prompt": "Extract the product name and price"
}

When to use: When you want typed output matching a schema without writing selectors.

19. generate_llms_txt (5 credits)

Analyze a site and emit standard-compliant llms.txt and llms-full.txt files.

Typescript

// Usage:
{
  "url": "https://example.com",
  "format": "both",
  "complianceLevel": "standard",
  "outputOptions": {
    "organizationName": "Example Inc.",
    "contactEmail": "ai@example.com"
  }
}

When to use: Publishing AI interaction guidelines for your website.

20. deep_research (10 credits)

Comprehensive multi-source research (detailed in Deep Research Guide).

Typescript

// Usage:
{
  "topic": "quantum computing commercialization",
  "maxUrls": 50,
  "enableSourceVerification": true,
  "enableConflictDetection": true
}

// Returns:
{
  "synthesis": "Comprehensive analysis...",
  "sources": [...],
  "conflicts": [...],
  "citations": [...]
}

When to use: Research projects, due diligence, market analysis.

Part 4: Integration Guide

Claude Code Setup

Bash

# 1. Install CrawlForge MCP server
npm install -g crawlforge-mcp-server

# 2. Run setup wizard
npx crawlforge-setup

# 3. Add to Claude Code
claude
> /mcp add crawlforge npx crawlforge-mcp-server

# 4. Verify
> /mcp list
# Should show: crawlforge (20 tools)

Claude Desktop Setup

Edit your Claude Desktop config file:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json Windows: %APPDATA%\Claude\claude_desktop_config.json

Json

{
  "mcpServers": {
    "crawlforge": {
      "command": "npx",
      "args": ["crawlforge-mcp-server"],
      "env": {
        "CRAWLFORGE_API_KEY": "cf_live_your_key_here"
      }
    }
  }
}

Custom Application Integration

Typescript

import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

const transport = new StdioClientTransport({
  command: "npx",
  args: ["crawlforge-mcp-server"],
  env: {
    CRAWLFORGE_API_KEY: process.env.CRAWLFORGE_API_KEY
  }
});

const client = new Client({
  name: "my-app",
  version: "1.0.0"
});

await client.connect(transport);

// List available tools
const tools = await client.listTools();
console.log(`Available tools: ${tools.tools.length}`);

// Call a tool
const result = await client.callTool({
  name: "fetch_url",
  arguments: {
    url: "https://example.com"
  }
});

Part 5: Best Practices

Credit Optimization

Goal	Expensive	Efficient
Check if page exists	deep_research (10)	fetch_url (1)
Get article text	scrape_with_actions (5)	extract_content (2)
Find competitor URLs	search_web × 10 (50)	extract_links (1)
Scrape 20 product pages	fetch_url × 20 (20)	batch_scrape (5)

Error Handling

Typescript

// Always handle failures gracefully
try {
  const result = await fetchUrl(url);
  if (result.status >= 400) {
    // Try with stealth mode
    return await stealthMode(url);
  }
  return result;
} catch (error) {
  // Log and retry with exponential backoff
  await sleep(retryDelay * attempt);
  return retry(url, attempt + 1);
}

Rate Limiting

Respect target sites:

Typescript

// Good: Reasonable delays
for (const url of urls) {
  await scrape(url);
  await sleep(1000 + Math.random() * 2000);  // 1-3s delay
}

// Better: Use batch_scrape with built-in rate limiting
await batchScrape(urls, { delayBetweenRequests: 1500 });

Caching

Don't scrape the same URL twice:

Typescript

const cache = new Map<string, ScrapedContent>();

async function smartScrape(url: string) {
  if (cache.has(url)) {
    return cache.get(url);
  }

  const result = await fetchUrl(url);
  cache.set(url, result);
  return result;
}

Part 6: The Future of MCP Scraping

Emerging Trends

AI-Native Extraction - LLMs directly parsing unstructured HTML
Self-Healing Scrapers - AI adapts to site changes automatically
Semantic Search - Natural language queries across scraped data
Cross-Site Analysis - AI connecting information across sources

CrawlForge Roadmap

Coming in 2026:

Real-time monitoring - Instant change notifications
AI schema generation - Automatic extraction templates
Cross-tool workflows - Chain tools intelligently
Enhanced privacy - Zero-knowledge scraping options

Getting Started

Ready to start MCP web scraping? Here's your path:

Free Tier (Perfect for Getting Started)

1,000 one-time trial credits
All 20 tools available
No credit card required

Bash

# Quick start
npm install -g crawlforge-mcp-server
npx crawlforge-setup
# Visit: https://crawlforge.dev/signup

What You Can Do with 1,000 Credits

Use Case	Tools	Credits	Monthly Capacity
Basic scraping	fetch_url	1	1,000 pages
Article extraction	extract_content	2	500 articles
Site mapping	map_site	2	500 sites
Batch jobs	batch_scrape	5	200 batches (10K URLs)
Research projects	deep_research	10	100 topics

Summary

MCP has revolutionized web scraping for AI applications. Key takeaways:

MCP is the standard - All major AI assistants support it
CrawlForge leads with 20 tools - 4x more than alternatives
Start simple - Use fetch_url (1 credit) before advanced tools
Combine tools - Chain operations for powerful workflows
Be ethical - Respect robots.txt and rate limits

Related Resources:

Get Started Free | View Pricing | Read Docs

On this page

Part 1: Understanding MCP

What is the Model Context Protocol?

Why MCP Matters for Web Scraping

How MCP Works

Part 2: The MCP Web Scraping Ecosystem

MCP Scraping Servers

Why CrawlForge Leads

Part 3: CrawlForge's 20 Tools Explained

Basic Scraping (1-2 credits)

1. fetch_url (1 credit)

2. extract_text (1 credit)

3. extract_links (1 credit)

4. extract_metadata (1 credit)

Structured Extraction (2-3 credits)

5. scrape_structured (2 credits)

6. extract_content (2 credits)

7. map_site (2 credits)

8. analyze_content (3 credits)

Advanced Scraping (4-5 credits)

9. process_document (2 credits)

10. summarize_content (4 credits)

11. crawl_deep (4 credits)

12. batch_scrape (5 credits)

13. scrape_with_actions (5 credits)

14. search_web (5 credits)

Specialized Tools (3-10 credits)

15. stealth_mode (5 credits)

16. track_changes (3 credits)

17. localization (2 credits)

18. extract_structured (3 credits)

19. generate_llms_txt (5 credits)

20. deep_research (10 credits)

Part 4: Integration Guide

Claude Code Setup

Claude Desktop Setup

Custom Application Integration

Part 5: Best Practices

Credit Optimization

Error Handling

Rate Limiting

Caching

Part 6: The Future of MCP Scraping

Emerging Trends

CrawlForge Roadmap

Getting Started

Free Tier (Perfect for Getting Started)

What You Can Do with 1,000 Credits

Summary

Tags

About the Author

CrawlForge Team

Frequently Asked Questions

Related Articles

Web Scraping: Python vs MCP in 2026

Best Web Scraping Tools in 2026: The Definitive Guide

CrawlForge vs Firecrawl: Which MCP Web Scraper Is Right for You?

On this page

Part 1: Understanding MCP

What is the Model Context Protocol?

Why MCP Matters for Web Scraping

How MCP Works

Part 2: The MCP Web Scraping Ecosystem

MCP Scraping Servers

Why CrawlForge Leads

Part 3: CrawlForge's 20 Tools Explained

Basic Scraping (1-2 credits)

1. fetch_url (1 credit)

2. extract_text (1 credit)

3. extract_links (1 credit)

4. extract_metadata (1 credit)

Structured Extraction (2-3 credits)

5. scrape_structured (2 credits)

6. extract_content (2 credits)

7. map_site (2 credits)

8. analyze_content (3 credits)

Advanced Scraping (4-5 credits)

9. process_document (2 credits)

10. summarize_content (4 credits)

11. crawl_deep (4 credits)