
GutenOCR: A Grounded Vision-Language Front-End for Documents
GutenOCR is a family of vision-language models designed to serve as a ‘grounded OCR front-end’, providing high-quality text transcription and explicit geometric grounding.

GutenOCR is a family of vision-language models designed to serve as a ‘grounded OCR front-end’, providing high-quality text transcription and explicit geometric grounding.

A high-performance, GPU-accelerated PyTorch testbed for ML-MD algorithms featuring JIT-compiled analytical Jacobian force kernels achieving 3-10x speedup over autograd, robust Langevin dynamics with Velocity-Verlet integration, and modular architecture designed as ground-truth validation for novel machine learning approaches in molecular dynamics.

We create TabMe++, an enhanced page stream segmentation benchmark with commercial-grade OCR, and show that parameter-efficiently fine-tuned decoder-based LLMs like Mistral-7B achieve 80% straight-through processing rates, dramatically outperforming encoder-based models.
MOSES introduces a comprehensive benchmarking platform for molecular generative models, offering standardized datasets, evaluation metrics, and baselines.

ChemBERTa-3 provides a unified, scalable infrastructure for pretraining and benchmarking chemical foundation models, addressing reproducibility gaps in previous studies like MoLFormer through standardized scaffold splitting and open-source tooling.

ChemDFM-R is a 14B-parameter chemical reasoning model that integrates a 101B-token dataset of atomized chemical knowledge. Using a novel mix-sourced distillation strategy and domain-specific reinforcement learning, it achieves state-of-the-art performance on chemical benchmarks.

This work investigates the scaling hypothesis for molecular transformers, training RoBERTa models on 77M SMILES from PubChem. It compares Masked Language Modeling (MLM) against Multi-Task Regression (MTR) pretraining, finding that MTR yields better downstream performance but is computationally heavier.

This paper introduces ChemBERTa, a RoBERTa-based model pretrained on 77M SMILES strings. It systematically evaluates the impact of pretraining dataset size, tokenization strategies, and input representations (SMILES vs. SELFIES) on downstream MoleculeNet tasks, finding that performance scales positively with data size.

MERMaid leverages fine-tuned vision models and VLM reasoning to mine chemical reaction data from PDF figures and tables, handling context inference and coreference resolution to build high-fidelity knowledge graphs with 87% end-to-end accuracy.

ChemReco automates the recognition of hand-drawn chemical structures using a synthetic data pipeline and an EfficientNet+Transformer architecture, achieving 96.90% accuracy on C-H-O molecules.

Comprehensive evaluation of 8 optical chemical structure recognition tools using a newly curated dataset of 2,702 patent images. Proposes ChemIC, a ResNet-50 classifier to route images to specialized tools based on content type, demonstrating that no single tool excels at all tasks.

DECIMER.ai addresses the lack of open tools for Optical Chemical Structure Recognition (OCSR) by providing a comprehensive, deep-learning-based workflow. It features a novel data generation pipeline (RanDepict), a web application, and models for segmentation and recognition that rival or exceed proprietary solutions.