Document intelligence framework for Python - Extract text, metadata, and structured data from PDFs, images, Office documents, and more. Built on Pandoc, PDFium, and Tesseract.

Topics: async, document-intelligence, mcp, metadata-extraction, ocr, pandoc, pdf-extraction, pdfium, python, rag, table-extraction, tesseract, text-extraction
1 open issue needs help. Last updated: Sep 13, 2025

Open Issues Need Help

Labels: bug, help wanted, good first issue

Python