Half the scraping requests we see at CrawlForge target the same ten sites: Amazon, LinkedIn, GitHub, YouTube, Reddit, Hacker News, Stack Overflow, npm, Product Hunt, and Twitter/X. We got tired of watching people write the same CSS selectors over and over -- and watching those selectors break the next time the site updated its layout. So we did the work once, packaged it as scrape_template, and now you pay 1 credit and get structured JSON.
Table of Contents
- What Is scrape_template?
- The 10 Supported Sites
- Quick Start: Scrape an Amazon Product
- LinkedIn Profiles (With Legal Notes)
- GitHub Repos for AI Training Data
- The Other Seven Templates
- scrape_template vs scrape_structured vs extract_with_llm
- Limitations
What Is scrape_template?
scrape_template is a single CrawlForge tool with ten pre-built site schemas. You pick the template, pass a URL, and get back structured JSON matching that site's natural shape. No CSS selectors. No HTML parsing. No schema definition.
The trade-off: you only get the ten sites we maintain. If you need something else, use scrape_structured (CSS-first) or extract_with_llm (LLM-first). For the everyday "I want product data from Amazon" requests, scrape_template is the shortest path.
It costs 1 credit per scrape -- the same as a basic fetch_url -- because we have already done the schema work upstream.
The 10 Supported Sites
| Template | Returns | Best for | Example URL pattern |
|---|---|---|---|
| amazon | Title, price, rating, review count, images, ASIN, availability | Price monitoring, product research | `/dp/<ASIN>` |
| linkedin | Name, headline, current role, experience, skills, education | Lead enrichment | `/in/<handle>` |
| github | Stars, forks, languages, README, license, topics, last commit | Repo analysis, AI training data | `/<owner>/<repo>` |
| youtube | Title, channel, views, duration, transcript, description | Content research | `/watch?v=<id>` |
| reddit | Post title, score, top comments, subreddit, awards | Community signals | `/r/<sub>/comments/<id>` |
| hackernews | Title, points, URL, author, comments tree | Tech trend tracking | `/item?id=<id>` |
| stackoverflow | Question, accepted answer, vote counts, tags | Developer Q&A mining | `/questions/<id>` |
| npm | Package metadata, weekly downloads, versions, maintainers | Dependency analysis | `/package/<name>` |
| producthunt | Product, tagline, upvotes, makers, hunter | Launch monitoring | `/posts/<slug>` |
| tweet | Text, author, engagement, replies, quote tweets | Social listening | `/<user>/status/<id>` |
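If your input is a mixed list of URLs, you first need to decide which template each one belongs to. The sketch below routes a URL using the path shapes from the table's last column. The domains, the `pick_template` helper, and the exact regexes are our own assumptions for illustration, not part of CrawlForge; a `None` result means the URL should go to scrape_structured or extract_with_llm instead.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Hypothetical routing table: (site domain, regex over "path?query", template).
# Path shapes follow the "Example URL pattern" column above; the domains are
# assumptions, not something scrape_template itself exposes.
ROUTES = [
    ("amazon.com", r"^/(?:.*/)?dp/[A-Z0-9]{10}", "amazon"),
    ("linkedin.com", r"^/in/[^/?]+", "linkedin"),
    ("github.com", r"^/[^/?]+/[^/?]+/?(?:\?.*)?$", "github"),
    ("youtube.com", r"^/watch\?(?:.*&)?v=[\w-]+", "youtube"),
    ("reddit.com", r"^/r/[^/]+/comments/[^/]+", "reddit"),
    ("news.ycombinator.com", r"^/item\?(?:.*&)?id=\d+", "hackernews"),
    ("stackoverflow.com", r"^/questions/\d+", "stackoverflow"),
    ("npmjs.com", r"^/package/.+", "npm"),
    ("producthunt.com", r"^/posts/[^/]+", "producthunt"),
    ("x.com", r"^/[^/]+/status/\d+", "tweet"),
    ("twitter.com", r"^/[^/]+/status/\d+", "tweet"),
]


def pick_template(url: str) -> Optional[str]:
    """Return the template name for a URL, or None if no template fits."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    target = parts.path + ("?" + parts.query if parts.query else "")
    for domain, pattern, template in ROUTES:
        # Match the domain itself or any subdomain (www., old., ...).
        on_domain = host == domain or host.endswith("." + domain)
        if on_domain and re.match(pattern, target):
            return template
    return None


print(pick_template("https://www.amazon.com/dp/B0CHX1W1XY"))  # amazon
print(pick_template("https://example.com/some/page"))        # None
```

Matching on both domain and path keeps a look-alike host such as `notamazon.com` from being routed to the amazon template.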
Quick Start: Scrape an Amazon Product
Output:
From an MCP client like Claude Code:
"Use scrape_template with the amazon template to get the current price and rating for ASIN B0CHX1W1XY."
Claude picks the tool, formats the call, and returns the data. One credit.
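Under the hood, an MCP client such as Claude Code turns that prompt into a JSON-RPC `tools/call` request. The sketch below builds one by hand. The envelope (`jsonrpc`, `method`, `params.name`, `params.arguments`) follows the Model Context Protocol; the argument names `template` and `url` and the amazon.com URL form are assumptions, so check the tool's actual input schema (returned by `tools/list`) before relying on them.

```python
import json


def build_template_call(template: str, url: str, request_id: int = 1) -> dict:
    """Build an MCP JSON-RPC 'tools/call' request for scrape_template.

    The envelope follows the MCP spec; the "template" and "url" argument
    names are assumptions about the tool's input schema.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "scrape_template",
            "arguments": {"template": template, "url": url},
        },
    }


request = build_template_call("amazon", "https://www.amazon.com/dp/B0CHX1W1XY")
print(json.dumps(request, indent=2))
```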
LinkedIn Profiles (With Legal Notes)
Output:
A note on LinkedIn scraping. LinkedIn's terms of service restrict automated access. In hiQ Labs v. LinkedIn (9th Circuit, 2022), the court held that scraping publicly accessible profile data likely does not violate the federal Computer Fraud and Abuse Act, but that ruling did not settle LinkedIn's contract claims: commercial use, login-required scraping, and aggressive frequency can still trigger legal action and ToS bans. Use scrape_template linkedin for public, low-frequency, non-resold data only.
GitHub Repos for AI Training Data
Output:
This template is heavily used for AI training-data pipelines -- pulling READMEs at scale across thousands of repos. Pair it with batch_scrape to process a CSV of repo URLs.
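Before handing a CSV to batch_scrape, it helps to clean it: drop blank rows, collapse duplicates that differ only by a trailing slash, and split the list into batches. A minimal sketch, assuming the CSV has a `url` column. The payload keys (`template`, `urls`) and the batch size of 50 are hypothetical, not documented batch_scrape parameters.

```python
import csv
import io
from typing import Iterator, List


def read_repo_urls(csv_text: str, column: str = "url") -> List[str]:
    """Read repo URLs from one CSV column, skipping blanks and duplicates."""
    seen = set()
    urls = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Normalize trailing slashes so ".../repo" and ".../repo/" dedupe.
        url = (row.get(column) or "").strip().rstrip("/")
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_batches(urls: List[str], batch_size: int = 50) -> List[dict]:
    # Hypothetical batch_scrape arguments: the "template" and "urls" keys and
    # the default batch size are illustrative, not documented values.
    return [{"template": "github", "urls": batch} for batch in chunked(urls, batch_size)]


sample = "url\nhttps://github.com/python/cpython\nhttps://github.com/python/cpython/\n\n"
print(build_batches(read_repo_urls(sample)))
```

Deduplicating before batching matters at this scale: every duplicate URL that slips through is a credit spent on data you already have.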
The Other Seven Templates
YouTube -- title, channel, views, full transcript when available:
Reddit -- post + comment tree:
Hacker News -- structured story with comments tree:
Stack Overflow -- question, accepted answer, top alternatives:
npm -- package metadata + weekly downloads:
Product Hunt -- product, makers, upvotes:
Twitter/X -- single tweet with engagement and replies:
All return JSON. All cost 1 credit. All maintained centrally -- when LinkedIn or Amazon updates its layout, we update the template.
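The reddit and hackernews templates return nested comment trees, and most downstream jobs (sentiment scoring, thread depth stats) want them flattened. A minimal sketch, assuming each comment is a dict with an optional `children` list; the real field names in the template output may differ, and `walk_comments` is a hypothetical helper.

```python
from typing import Any, Dict, Iterator, List, Tuple

Comment = Dict[str, Any]


def walk_comments(comments: List[Comment]) -> Iterator[Tuple[int, Comment]]:
    """Yield (depth, comment) pairs in depth-first reading order.

    Assumes each comment is a dict with an optional "children" list. Uses an
    explicit stack so very deep threads cannot hit Python's recursion limit.
    """
    # Push in reverse so the first comment is popped first.
    stack = [(0, c) for c in reversed(comments)]
    while stack:
        depth, comment = stack.pop()
        yield depth, comment
        children = comment.get("children") or []
        stack.extend((depth + 1, child) for child in reversed(children))


thread = [
    {"author": "a", "children": [{"author": "b", "children": [{"author": "c"}]}]},
    {"author": "d"},
]
for depth, comment in walk_comments(thread):
    print("  " * depth + comment["author"])
```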
scrape_template vs scrape_structured vs extract_with_llm
A decision tree:
- Is your target one of the 10 supported sites?
  - Yes -> use scrape_template (1 credit, maintained for you)
  - No -> do you know the CSS selectors, and are they stable?
    - Yes -> use scrape_structured (2 credits, you maintain selectors)
    - No -> use extract_with_llm (3 credits, schema-based, layout-resilient)
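If you route jobs programmatically, the same tree fits in a few lines. A minimal sketch; `choose_tool` and `estimate_credits` are hypothetical helpers, and the estimate assumes a flat per-page cost taken from the tree, with no add-ons such as stealth_mode.

```python
from typing import NamedTuple


class ToolChoice(NamedTuple):
    tool: str
    credits_per_page: int


def choose_tool(supported_site: bool, stable_selectors: bool) -> ToolChoice:
    """Encode the decision tree; per-page credit costs are the ones it lists."""
    if supported_site:
        return ToolChoice("scrape_template", 1)    # maintained for you
    if stable_selectors:
        return ToolChoice("scrape_structured", 2)  # you maintain the selectors
    return ToolChoice("extract_with_llm", 3)       # schema-based, layout-resilient


def estimate_credits(choice: ToolChoice, pages: int) -> int:
    # Flat per-page cost only; add-ons such as stealth_mode are not counted.
    return choice.credits_per_page * pages


choice = choose_tool(supported_site=False, stable_selectors=False)
print(choice.tool, estimate_credits(choice, 1000))  # extract_with_llm 3000
```

The order of the checks mirrors the tree: a supported site wins even if you also have stable selectors, because the template is both cheaper and maintained for you.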
Quick comparison:
| | scrape_template | scrape_structured | extract_with_llm |
|---|---|---|---|
| Credits | 1 | 2 | 3 |
| Coverage | 10 specific sites | Any site you can write selectors for | Any site |
| Maintenance | We maintain | You maintain | LLM adapts |
| Speed | Fast (cached schemas) | Fast | Slower (LLM call) |
| Best for | Popular sites, high volume | Specific known structure | Unknown or shifting structure |
Limitations
- Only 10 sites. If you need Etsy, eBay, TikTok, or others, you are waiting on the roadmap or rolling your own with scrape_structured or extract_with_llm. Request templates on Discord.
- Public data only. No template requires login. For private profiles, gated repos, and protected tweets, you get only what is publicly visible.
- Layout changes happen. When a site ships a redesign, we usually have the template patched within 24 hours.
- Rate limits apply. Heavy-volume LinkedIn or Amazon scraping should pair scrape_template with stealth_mode (5 credits) and respect each site's robots.txt.
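Respecting robots.txt and keeping frequency low can both be handled client-side before a URL ever reaches scrape_template. A minimal sketch using Python's standard `urllib.robotparser` plus a fixed-interval throttle. The robots.txt body, the user-agent string, and the `Throttle` class are hypothetical; CrawlForge's own rate limiting and stealth_mode behavior are not modeled here.

```python
import time
from typing import Optional
from urllib import robotparser

# Hypothetical robots.txt body for illustration; in practice fetch the
# site's real file (RobotFileParser.set_url() + read()) and cache it.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def load_rules(robots_txt: str) -> robotparser.RobotFileParser:
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class Throttle:
    """Enforce a minimum gap (in seconds) between consecutive requests."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last: Optional[float] = None

    def wait(self) -> float:
        """Sleep if the previous request was too recent; return seconds slept."""
        slept = 0.0
        if self._last is not None:
            slept = max(0.0, self._last + self.min_interval - time.monotonic())
            if slept > 0:
                time.sleep(slept)
        self._last = time.monotonic()
        return slept


rules = load_rules(SAMPLE_ROBOTS)
agent = "my-scraper"  # hypothetical user-agent string
print(rules.can_fetch(agent, "https://example.com/private/page"))  # False
print(rules.can_fetch(agent, "https://example.com/public/page"))   # True
throttle = Throttle(rules.crawl_delay(agent) or 1.0)  # 2 seconds here
```

Call `throttle.wait()` before each request to the same site; using `time.monotonic()` keeps the gap correct even if the system clock is adjusted mid-run.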
Ready to skip the selectors? Start free with 1,000 credits -- enough for 1,000 template scrapes. New here? Read the v4.2.2 launch post for context, or the e-commerce extraction guide for a real-world workflow built around these templates.