Web Crawling and RSS Reading Made Easy

Tired of building yet another RSS client or web crawler?

Don’t worry – Crawler Buddy is here to save the day! This project crawls web pages for you and returns the results as easy-to-digest JSON.

Key Features:

  • No more reliance on external tools: Forget about yt-dlp or Beautiful Soup for link metadata extraction.
  • Standardized metadata: Get consistent fields like title, description, date_published, and more.
  • Bot protection? No problem: Access RSS feeds—even on sites with tricky bot protection—without custom HTTP wrappers.
  • Automatic feed detection: It can automatically discover RSS feed URLs for websites and YouTube channels in many cases.
  • Simplified data handling: Skip parsing RSS files. Just consume easy-to-use JSON.
  • Unified interface: Access all metadata from a single, simple interface.
  • Containerized Docker environment: Isolate problems from your host OS for seamless operation.
  • Scalability: Whether you’re running a single server or multiple, Crawler Buddy fits your needs.
  • UTF-8 encoding: Say goodbye to encoding issues: everything is returned as UTF-8.
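
To give a feel for what automatic feed detection involves, here is a minimal sketch of one common technique: scanning a page's `<link rel="alternate">` tags for RSS or Atom feeds. This is not taken from Crawler Buddy's code. The names `FeedLinkParser` and `discover_feeds` are made up for illustration, and the project may well use a different approach.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types that conventionally mark a feed in a <link> tag.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects feed URLs advertised via <link rel="alternate"> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rel_values = attr.get("rel", "").lower().split()
        is_feed = attr.get("type", "").lower() in FEED_TYPES
        if "alternate" in rel_values and is_feed and attr.get("href"):
            # Resolve relative hrefs against the page URL.
            self.feeds.append(urljoin(self.base_url, attr["href"]))


def discover_feeds(html, base_url):
    """Return the feed URLs advertised in an HTML document (hypothetical helper)."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds
```

Calling `discover_feeds(page_html, page_url)` returns a list of absolute feed URLs, or an empty list when the page advertises none. Sites that do not declare their feeds this way (and YouTube channels in particular) need other heuristics, which is the kind of work a tool like Crawler Buddy takes off your hands.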

Available Crawlers:

  • RequestsCrawler: Python requests
  • CrawleeScript: Crawlee with BeautifulSoup
  • PlaywrightScript: Crawlee with Playwright
  • SeleniumUndetected: Undetected Selenium
  • SeleniumChromeHeadless: Selenium in headless mode
  • SeleniumChromeFull: Full Selenium mode
  • StealthRequestsCrawler: Stealthy requests
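
With several crawlers to choose from, one plausible usage pattern is to try the lightweight ones first and fall back to full browser automation only when needed. The sketch below is hypothetical: the crawler names come from the list above, but the ordering, the `crawl_with_fallback` function, and the `fetch_with` callable are assumptions for illustration, not part of Crawler Buddy's documented interface.

```python
# Assumed ordering, roughly from cheapest to most expensive to run.
CRAWLER_ORDER = [
    "RequestsCrawler",
    "StealthRequestsCrawler",
    "CrawleeScript",
    "PlaywrightScript",
    "SeleniumChromeHeadless",
    "SeleniumUndetected",
    "SeleniumChromeFull",
]


def crawl_with_fallback(url, fetch_with, order=CRAWLER_ORDER):
    """Try each crawler in turn and return (crawler_name, result) for the first success.

    `fetch_with(name, url)` is a caller-supplied stand-in that runs the named
    crawler and returns its result, returns None on failure, or raises.
    """
    for name in order:
        try:
            result = fetch_with(name, url)
        except Exception:
            # Treat any error as "this crawler failed" and move on.
            continue
        if result is not None:
            return name, result
    return None
```

The caller supplies `fetch_with`, so the same loop works whether the crawlers are reached over HTTP, through a queue, or in-process.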

Want to learn more?
Check out the official repository: Crawler Buddy GitHub
