Document Processing
A colored molecule with annotations, representing the diverse drawing styles found in scientific papers that OCSR models must handle.

MolParser-7M and WildMol Datasets for Robust Chemical Structure Recognition

MolParser-7M is a 7.7M-pair dataset for molecule-to-text conversion, featuring real-world images and complex structures …

LLMs for Insurance Document Automation

LLM applications for insurance document automation using parameter-efficient fine-tuning and analysis of calibration …

LLMs for Page Stream Segmentation

Enhanced TABME benchmark for page stream segmentation, creating TABME++, showing fine-tuned decoder-based LLMs …