Web Scraping & Crawling

Frameworks and tools for web scraping, crawling, and data extraction

Web Scraping & Crawling: comparison of firecrawl, crawl4ai, Scrapling, scrapy, and crawlee
firecrawl: The API to search, scrape, and interact with the web at scale. 🔥
crawl4ai: 🚀🤖 Open-source, LLM-friendly web crawler and scraper. Don't be shy, join the Discord: https://discord.gg/jP8KfhDhyN
Scrapling: 🕷️ An adaptive web scraping framework that handles everything from a single request to a full-scale crawl!
scrapy: A fast, high-level web crawling and scraping framework for Python.
crawlee: A web scraping and browser automation library for Node.js for building reliable crawlers in JavaScript and TypeScript. Extracts data for AI, LLMs, RAG, or GPTs, and downloads HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP, in both headful and headless mode, with proxy rotation.
Popularity
Stars: 136,002 | 69,058 | 65,281 | 62,364 | 23,863
Global Rank: #62 | #248 | #276 | #300 | #1639
Weekly Activity (Jun 16 – Jun 22)
New Stars: +90 | +17 | +44 | +11 | +8
Pushes3501421
Issues Closed: 0 | 0 | 0 | 0 | 0
Community
Forks: 7,896 | 7,054 | 6,434 | 11,669 | 1,443
Contributors1578421726133
Open Issues3971099617175
Project Info
Owner: firecrawl | unclecode | D4Vinci | scrapy | apify
License: AGPL-3.0 | Apache-2.0 | BSD-3-Clause | BSD-3-Clause | Apache-2.0
Language: TypeScript | Python | Python | Python | TypeScript
Created: Apr 2024 | May 2024 | Oct 2024 | Feb 2010 | Aug 2016