[GitHub Trending] run-llama/liteparse
Open-source document parser, directly applicable to data engineering tasks.
LiteParse is an open-source, locally run PDF parser from the LlamaIndex team. It extracts text with spatial bounding boxes via PDFium, supports pluggable OCR (Tesseract or a custom HTTP server), and generates PNG screenshots for LLM agents. Output is either structured JSON or layout-preserved text. It runs cross-platform, with bindings for Rust, Node.js, Python, and WASM, and is positioned as a lightweight alternative to cloud-based document parsers for simple documents; complex cases are deferred to its sibling, LlamaParse.
- Adopt LiteParse as the default local PDF parser in your LLM agent stack, using its bounding boxes and screenshots to give agents layout context, and escalate to LlamaParse only when a document exceeds what local parsing can handle.
For AI agent orchestration and data engineering pipelines, LiteParse provides a fast, local, open-source tool to feed structured PDF data into LLM workflows without cloud dependencies or API costs, fitting directly into RAG and data extraction chains.
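The local-first, escalate-when-needed strategy recommended above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not LiteParse's or LlamaParse's actual API: `TextItem`, `ParseResult`, `parse_with_fallback`, and the two stub parsers are hypothetical names, and the "too few text items" check is only one plausible signal that local parsing fell short. In a real pipeline the stubs would be replaced by thin wrappers around whichever bindings you use.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class TextItem:
    """One extracted text span with its bounding box (hypothetical shape)."""
    text: str
    bbox: Tuple[float, float, float, float]  # assumed order: x0, y0, x1, y1


@dataclass
class ParseResult:
    items: List[TextItem]
    source: str  # which parser produced the result: "local" or "cloud"


# A parser is modelled as any callable that maps a file path to a ParseResult.
Parser = Callable[[str], ParseResult]


def parse_with_fallback(
    path: str,
    local_parser: Parser,
    cloud_parser: Parser,
    min_items: int = 1,
) -> ParseResult:
    """Try the local parser first; escalate only if it fails or returns too little."""
    try:
        result = local_parser(path)
    except Exception:
        # Treat any local failure as a reason to escalate.
        result = None

    if result is not None and len(result.items) >= min_items:
        return result
    return cloud_parser(path)


# Stub parsers standing in for real wrappers (illustrative only).
def fake_local(path: str) -> ParseResult:
    if path.endswith("scan.pdf"):
        # Pretend the local parser found no text in a scanned document.
        return ParseResult(items=[], source="local")
    return ParseResult(
        items=[TextItem("Hello", (0.0, 0.0, 50.0, 12.0))], source="local"
    )


def fake_cloud(path: str) -> ParseResult:
    return ParseResult(
        items=[TextItem("Recovered text", (0.0, 0.0, 90.0, 12.0))], source="cloud"
    )


if __name__ == "__main__":
    for name in ("simple.pdf", "scan.pdf"):
        result = parse_with_fallback(name, fake_local, fake_cloud)
        print(name, "->", result.source, len(result.items))
```

Keeping both parsers behind the same callable signature means the escalation rule (here, a minimum item count) can be tuned or swapped for a richer quality check without touching the rest of the pipeline.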
run-llama