[GitHub Trending] opendatalab/MinerU
7.3 relevance
Score Breakdown
technical depth 7
novelty 7
actionability 8
community 7
strategic 6
personal 9
Scored daily by a customisable AI persona to surface the most relevant engineering leadership news.
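The card does not say how the headline relevance score is derived from the breakdown. As an arithmetic observation only, the unweighted mean of the six sub-scores reproduces 7.3. The persona's actual weighting is an unknown and may differ.

```python
# Sub-scores copied from the card's Score Breakdown.
scores = {
    "technical depth": 7,
    "novelty": 7,
    "actionability": 8,
    "community": 7,
    "strategic": 6,
    "personal": 9,
}

# Assumption: equal weights. The real scorer's weighting is not documented.
overall = round(sum(scores.values()) / len(scores), 1)
print(overall)  # 7.3 (44 / 6 = 7.33...)
```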
Transforms PDFs into LLM-ready formats, directly enabling agentic data pipelines.
Summary
MinerU is an open-source document parser for LLM, RAG, and agent workflows. It converts PDF and Office documents into structured Markdown or JSON using a dual VLM+OCR engine covering 109 languages. It integrates natively with LangChain, Dify, and MCP Server, and offers three inference backends (pipeline, vlm-engine, hybrid-engine) that run on CPU, GPU, and Chinese domestic AI accelerators. The 3.4 release upgraded the pipeline backend's OCR to PP-OCRv6, delivering 11% higher accuracy and 2x faster processing.
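Structured Markdown is usually split into chunks before it goes into a RAG index. Below is a minimal sketch that assumes MinerU has already produced a Markdown string. It splits the text at ATX headings (`#` to `######`) and keeps each chunk's heading trail as metadata. It ignores `#` lines inside backtick code fences. `chunk_markdown` and `Chunk` are hypothetical names, not part of MinerU's API. The sketch also ignores any extra layout structure in MinerU's JSON output.

```python
import re
from dataclasses import dataclass

# ATX heading: 1-6 '#', whitespace, title, optional closing '#' run.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*?)(?:\s+#+)?\s*$")


@dataclass
class Chunk:
    path: list[str]  # heading trail, e.g. ["Intro", "Setup"]
    text: str        # body text under that heading (heading line excluded)


def chunk_markdown(md: str) -> list[Chunk]:
    """Split Markdown into heading-scoped chunks (hypothetical helper)."""
    chunks: list[Chunk] = []
    path: list[str] = []
    buf: list[str] = []
    in_fence = False

    def flush() -> None:
        # Emit the buffered body as a chunk if it has any non-blank text.
        body = "\n".join(buf).strip()
        if body:
            chunks.append(Chunk(path=list(path), text=body))
        buf.clear()

    for line in md.splitlines():
        if line.lstrip().startswith("```"):
            # Toggle fence state; fenced lines stay in the current chunk.
            in_fence = not in_fence
            buf.append(line)
            continue
        m = None if in_fence else HEADING_RE.match(line)
        if m:
            flush()
            level = len(m.group(1))
            del path[level - 1:]  # drop same-or-deeper headings
            path.append(m.group(2))
        else:
            buf.append(line)
    flush()
    return chunks


md = "# Intro\nHello\n## Setup\nRun it\n```\n# not a heading\n```\n# Next\nBye"
chunks = chunk_markdown(md)
for c in chunks:
    print(" > ".join(c.path), "|", c.text.splitlines()[0])
```

Keeping the heading trail lets a retriever attach section context (for example, "Intro > Setup") to each chunk without re-parsing the document.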
Author
opendatalab