[GitHub Trending] NanmiCoder/MediaCrawler

5.4 relevance

Social media crawler, tangentially related to data engineering but not a core interest.

Open Source github.com

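Xiaohongshu note and comment crawler, Douyin video and comment crawler, Kuaishou video and comment crawler, Bilibili video and comment crawler, Weibo post and comment crawler, Baidu Tieba post and comment-reply crawler, Zhihu Q&A article and comment crawler - NanmiCoder/MediaCrawler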

Summary

MediaCrawler is an open-source tool for scraping public data from Chinese social media platforms including Xiaohongshu, Douyin, Kuaishou, Bilibili, Weibo, Tieba, and Zhihu. It uses Playwright browser automation to preserve a logged-in browser session, which avoids having to reverse-engineer each platform's JavaScript. Supported features include keyword search, crawling of specified posts, comment extraction, and IP proxy pools. A Pro version adds AI agent-based content extraction, resume-on-interrupt, and multi-account support, and removes the Playwright dependency for simpler deployment.

Author

NanmiCoder