Optical Chemical Structure Recognition
AtomLenz learns atom-level detection from hand-drawn molecular images with weak supervision

AtomLenz: Atom-Level OCSR with Limited Supervision

Introduces AtomLenz, an OCSR tool that combines object detection with a molecular graph constructor. Features a novel weakly supervised training scheme (ProbKT*) to learn atom-level localization from SMILES-only data, achieving state-of-the-art results on hand-drawn images.

Optical Chemical Structure Recognition
Overview of the MolScribe encoder-decoder architecture predicting atoms with coordinates and bonds from a molecular image.

MolScribe: Robust Image-to-Graph Molecular Recognition

MolScribe reformulates molecular recognition as an image-to-graph generation task, explicitly predicting atom coordinates and bonds to better handle stereochemistry and abbreviated structures compared to image-to-SMILES baselines.

Optical Chemical Structure Recognition

Online Handwritten Chemical Formula Structure Analysis

A three-level grammatical framework (formula, molecule, text) for parsing online handwritten chemical formulas, generating semantic graphs that capture both connectivity and layout using context-free grammars and HMMs.

Optical Chemical Structure Recognition
Diagram showing MolNexTR's dual-stream architecture: a molecular image feeds into parallel ConvNext and Vision Transformer encoders, producing a SMILES string.

MolNexTR: A Dual-Stream Molecular Image Recognition

MolNexTR proposes a dual-stream architecture combining ConvNext and Vision Transformers to improve molecular image recognition (OCSR). It achieves 81-97% accuracy across diverse benchmarks utilizing simultaneous local and global feature extraction alongside specialized image contamination augmentations.

Molecular Representations
Log-scale plot showing exponential growth of alkane isomer counts from C1 to C40

The Number of Isomeric Hydrocarbons of the Methane Series

A foundational 1931 paper that derives exact recursive formulas for counting alkane structural isomers, correcting historical errors and establishing the first systematic enumeration up to C₄₀.