Document Processing
GutenOCR Mascot

GutenOCR: A Grounded Vision-Language Front-End for Documents

GutenOCR introduces vision-language models for grounded OCR, offering precise text transcription and geometric grounding …

Document Processing
Statistics of the PubMed-OCR dataset including number of articles, pages, words, and bounding boxes.

PubMed-OCR: PMC Open Access OCR Annotations

A large-scale dataset of 209K+ articles with OCR and layout bounding boxes, enabling layout-aware modeling and document …

Document Processing
Diagram showing page stream segmentation workflow: an input stream of pages is processed through binary classification of page pairs to predict document breaks, producing segmented output documents

LLMs for Page Stream Segmentation

Enhanced TabMe benchmark for page stream segmentation, creating TabMe++, showing fine-tuned decoder-based LLMs …

Document Processing
Stream accuracy versus relative throughput for Mistral-7B and XGBoost models

LLMs for Insurance Document Automation

LLM applications for insurance document automation using parameter-efficient fine-tuning and analysis of calibration …