Computer Vision

Diagram showing graph traversal chain-of-thought parsing of a molecular structure image into atom and bond predictions

GTR-CoT: Graph Traversal Chain-of-Thought for Molecules

A 2025 vision-language model for optical chemical structure recognition (OCSR). It uses graph traversal chain-of-thought reasoning and a two-stage SFT plus GRPO training scheme to handle both printed molecules (including chemical abbreviations like Ph and Et) and hand-drawn structures, achieving strong performance on the new MolRec-Bench benchmark.

Computational Chemistry

OCSU: Optical Chemical Structure Understanding (2025)

Proposes the ‘Optical Chemical Structure Understanding’ (OCSU) task to translate molecular images into multi-level descriptions (motifs, IUPAC, SMILES). Introduces the Vis-CheBI20 dataset and two paradigms: DoubleCheck (OCSR-based) and Mol-VL (OCSR-free).

Machine Learning Fundamentals

Comparison of planar CNN (translation only) versus spherical CNN (SO(3)-equivariant) showing how filters rotate on the sphere

Spherical CNNs: Rotation-Equivariant Networks on the Sphere

Cohen et al. introduce Spherical CNNs that achieve SO(3)-equivariance by defining cross-correlation on the sphere and rotation group, computed efficiently via generalized FFT algorithms from non-commutative harmonic analysis.

Document Processing

Conceptual diagram of page stream segmentation sorting pages into documents

The Evolution of Page Stream Segmentation: Rules to LLMs

We trace the history of Page Stream Segmentation (PSS) through three eras (Heuristic, Encoder, and Decoder) and explain how privacy-preserving, localized LLMs enable true semantic processing.

Document Processing

Statistics of the PubMed-OCR dataset including number of articles, pages, words, and bounding boxes.

PubMed-OCR: PMC Open Access OCR Annotations

PubMed-OCR provides 1.5M pages of scientific articles with comprehensive OCR annotations and bounding boxes to support layout-aware modeling and document analysis.

Computational Chemistry

ChemDFM-X architecture showing five modalities (2D graphs, 3D conformations, images, MS2 spectra, IR spectra) feeding through separate encoders into unified LLM decoder

ChemDFM-X: Multimodal Foundation Model for Chemistry

ChemDFM-X is a multimodal chemical foundation model that integrates five non-text modalities (2D graphs, 3D conformations, images, MS2 spectra, IR spectra) into a single LLM decoder. It overcomes data scarcity by generating a 7.6M instruction-tuning dataset through approximate calculations and model predictions, establishing strong baseline performance across multiple modalities.

Computational Chemistry

Comparative analysis of image-to-sequence OCSR methods

Image-to-Sequence OCSR: A Comparative Analysis

Deep dive into 24 image-to-sequence OCSR methods (2019-2025), comparing encoder-decoder architectures, molecular string representations, training scale, and hardware requirements.

Computational Chemistry

Diagram showing text, molecular structures, and reactions feeding into a multimodal index and search system that outputs passages with context

Multimodal Search in Chemical Documents and Reactions

This paper presents a multimodal search system that facilitates passage-level retrieval of chemical reactions and molecular structures by linking diagrams, text, and reaction records extracted from scientific PDFs.

Computational Chemistry

Overview of the OCSAug pipeline showing DDPM training, masked RePaint augmentation, and OCSR fine-tuning phases.

OCSAug: Diffusion-Based Augmentation for Hand-Drawn OCSR

OCSAug uses Denoising Diffusion Probabilistic Models (DDPMs) and the RePaint algorithm with custom masking to generate synthetic hand-drawn chemical structure images, improving OCSR performance by 1.918x to 3.820x on the DECIMER benchmark.

Computational Chemistry

AtomLenz learns atom-level detection from hand-drawn molecular images with weak supervision

AtomLenz: Atom-Level OCSR with Limited Supervision

Introduces AtomLenz, an OCSR tool that combines object detection with a molecular graph constructor. Features a novel weakly supervised training scheme (ProbKT*) to learn atom-level localization from SMILES-only data, achieving state-of-the-art results on hand-drawn images.

Computational Chemistry

Precision and recall comparison of 8 OCSR tools on patent images

Benchmarking Eight OCSR Tools on Patent Images (2024)

Comprehensive evaluation of 8 optical chemical structure recognition tools using a newly curated dataset of 2,702 patent images. Proposes ChemIC, a ResNet-50 classifier to route images to specialized tools based on content type, demonstrating that no single tool excels at all tasks.

Computational Chemistry

Overview of the ChemReco pipeline showing synthetic data generation and EfficientNet+Transformer architecture for hand-drawn chemical structure recognition

ChemReco: Hand-Drawn Chemical Structure Recognition

ChemReco automates the recognition of hand-drawn chemical structures using a synthetic data pipeline and an EfficientNet+Transformer architecture, achieving 96.90% accuracy on C-H-O molecules.