Notes on recognizing molecular structures from images, covering 35 years of methods: from rule-based vectorization to vision-language models.
A substantial fraction of chemical knowledge is recorded as 2D diagrams in journals, patents, and textbooks. Optical Chemical Structure Recognition (OCSR) is the task of extracting machine-readable molecular representations from those images: strings like SMILES (a compact text encoding of molecular structure) and InChI (a standardized identifier for chemical substances), or molecular graphs that encode atoms as nodes and bonds as edges. For a longer introduction to the field and its motivations, see the What is OCSR? post.
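To make the difference between the string and graph outputs concrete, here is a minimal sketch that writes ethanol both ways. The `MolGraph` class is a hypothetical toy container written for this illustration, not the API of any cheminformatics library; real pipelines would typically use a toolkit for parsing and validation.

```python
from dataclasses import dataclass, field


@dataclass
class MolGraph:
    """Toy molecular graph (hypothetical): atoms are nodes, bonds are edges."""
    atoms: list = field(default_factory=list)  # element symbol per node index
    bonds: dict = field(default_factory=dict)  # (i, j) with i < j -> bond order

    def add_atom(self, element: str) -> int:
        """Append an atom and return its node index."""
        self.atoms.append(element)
        return len(self.atoms) - 1

    def add_bond(self, i: int, j: int, order: int = 1) -> None:
        """Store an undirected bond under a canonical (low, high) key."""
        self.bonds[(min(i, j), max(i, j))] = order

    def neighbors(self, i: int) -> list:
        """Return the sorted indices of atoms bonded to atom i."""
        return sorted(b if a == i else a for (a, b) in self.bonds if i in (a, b))


# String representations of ethanol (hydrogens are implicit in the SMILES).
ethanol_smiles = "CCO"
ethanol_inchi = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"

# The same molecule as a graph over its heavy atoms.
ethanol = MolGraph()
c1 = ethanol.add_atom("C")
c2 = ethanol.add_atom("C")
o = ethanol.add_atom("O")
ethanol.add_bond(c1, c2)
ethanol.add_bond(c2, o)

print(ethanol.atoms)          # element symbols in node order
print(ethanol.neighbors(c2))  # the middle carbon touches both other atoms
```

An image-to-sequence model would be trained to emit something like `ethanol_smiles` directly, while an image-to-graph model would predict the node and edge sets and serialize them afterwards.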
These notes trace the field from its origins in the early 1990s through to current vision-language approaches. Three broad eras give the collection its shape. The rule-based pioneers (1990s to mid-2010s), including tools like OSRA, MolVec, CLiDE, and Imago, vectorized images and applied hand-coded rules to classify bonds and atoms; they were brittle because every edge case had to be encoded explicitly. The deep learning transition, which took hold between roughly 2015 and 2020, replaced those hand-coded rules with models that learned recognition patterns from large synthetic datasets. That line of work yielded both image-to-sequence architectures (DECIMER, Img2Mol, Image2SMILES) and image-to-graph architectures (MolGrapher, MolScribe), several of which were published after 2020. The current vision-language era (2021 onward), with models like MolParser, GTR-Mol-VLM, and Subgrapher, builds on large pretrained vision-language models to improve generalization across diverse diagram styles and chemical notation conventions.
Beyond the core recognition systems, the collection includes review papers, benchmark and competition write-ups (TREC-Chem 2011, CLEF-IP 2012), and notes on specialized sub-tasks: hand-drawn structure recognition, Markush structure detection, and component-level problems like ring and bond parsing.
For orientation, the two survey papers are the best starting points: rajan-ocsr-review-2020 covers the rule-based era and benchmarks the transition period, while musazade-ocsr-review-2022 picks up the thread with deep learning methods.
Deep dive into 24 image-to-sequence OCSR methods (2019-2025), comparing encoder-decoder architectures, molecular string representations, training scale, and hardware requirements.
OCSAug: Diffusion-Based Augmentation for Hand-Drawn OCSR
OCSAug uses Denoising Diffusion Probabilistic Models (DDPM) and the RePaint algorithm with custom masking to generate synthetic hand-drawn chemical structure images, improving OCSR performance on hand-drawn benchmarks such as the DECIMER hand-drawn dataset.
AtomLenz: Atom-Level OCSR with Limited Supervision
Introduces AtomLenz, an OCSR tool that combines object detection with a molecular graph constructor. Features a novel weakly supervised training scheme (ProbKT*) to learn atom-level localization from SMILES-only data, achieving state-of-the-art results on hand-drawn images.
ChemReco: Hand-Drawn Chemical Structure Recognition
ChemReco automates the recognition of hand-drawn chemical structures using a synthetic data pipeline and an EfficientNet+Transformer architecture, achieving 96.90% accuracy on C-H-O molecules.
Comprehensive evaluation of 8 optical chemical structure recognition tools using a newly curated dataset of 2,702 patent images. Proposes ChemIC, a ResNet-50 classifier to route images to specialized tools based on content type, demonstrating that no single tool excels at all tasks.
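The routing idea can be sketched as a classifier that picks which recognizer handles each image. Everything below is an assumption made for illustration: the label names, the stub tools, and the filename-based `toy_classify` function stand in for a trained image classifier such as the ResNet-50 described above, and none of it reflects ChemIC's actual interface.

```python
from typing import Callable, Dict

# A recognizer takes an image path and returns some text output.
Recognizer = Callable[[str], str]


def make_router(
    classify: Callable[[str], str],
    tools: Dict[str, Recognizer],
    fallback: Recognizer,
) -> Recognizer:
    """Build a function that sends each image to the tool for its predicted class."""

    def route(image_path: str) -> str:
        label = classify(image_path)
        tool = tools.get(label, fallback)  # unknown labels go to the fallback
        return tool(image_path)

    return route


# Hypothetical stand-in for a trained classifier: keys off the filename only.
def toy_classify(image_path: str) -> str:
    if "reaction" in image_path:
        return "reaction"
    if "multi" in image_path:
        return "multiple_structures"
    return "single_structure"


# Hypothetical stub tools; real ones would run an OCSR model on the image.
tools = {
    "single_structure": lambda p: f"single-structure tool <- {p}",
    "reaction": lambda p: f"reaction tool <- {p}",
}
fallback = lambda p: f"general-purpose tool <- {p}"

route = make_router(toy_classify, tools, fallback)
print(route("fig_reaction_01.png"))
print(route("fig_multi_02.png"))
```

Keeping the classifier and the tool table separate means a new specialized tool can be added by registering one more entry, without retraining or touching the other recognizers.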
DECIMER.ai: Optical Chemical Structure Recognition
DECIMER.ai addresses the lack of open tools for Optical Chemical Structure Recognition (OCSR) by providing a comprehensive, deep-learning-based workflow. It features a novel data generation pipeline (RanDepict), a web application, and models for segmentation and recognition that rival or exceed proprietary solutions.
Proposes a new architecture (DGAT) to resolve global context loss in chemical structure recognition. Introduces Cascaded Global Feature Enhancement and Sparse Differential Global-Local Attention, achieving robust results (84.0% BLEU-4) and handling complex chiral structures implicitly.
Enhanced DECIMER for Hand-Drawn Structure Recognition
This paper presents an enhanced deep learning architecture for Optical Chemical Structure Recognition (OCSR) specifically optimized for hand-drawn inputs. By pairing an EfficientNetV2 encoder with a Transformer decoder and training on over 150 million synthetic images, the model achieves state-of-the-art accuracy on real-world hand-drawn benchmarks.
Image2InChI: SwinTransformer for Molecular Recognition
Proposes Image2InChI, an OCSR model that pairs an improved SwinTransformer encoder with a novel attention-based feature fusion network, achieving 99.8% InChI accuracy on the BMS dataset.
This paper introduces a novel multi-modal approach for extracting chemical Markush structures from patents, combining a Vision-Text-Layout encoder with a specialized chemical vision encoder. It addresses the lack of training data with a robust synthetic generation pipeline and introduces M2S, a new real-world benchmark.
MMSSC-Net introduces a multi-stage cognitive approach for OCSR, utilizing a SwinV2 encoder and GPT-2 decoder to recognize atomic and bond sequences. It achieves high accuracy (94%+) on benchmark datasets by effectively handling varying image resolutions and noise.
MolGrapher introduces a novel three-stage pipeline (keypoint detection, supergraph construction, GNN classification) for recognizing chemical structures from images. It achieves state-of-the-art results by treating molecules as graphs, and introduces the USPTO-30K benchmark.
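The middle stage of such a pipeline can be sketched in a few lines: starting from detected atom keypoints, over-propose candidate bonds, then let a classifier decide which candidates are real. This is a simplified illustration under stated assumptions, not MolGrapher's implementation. The distance-threshold proposal rule, the `max_dist` parameter, and the `toy_bond_classifier` stand-in for the GNN are all hypothetical.

```python
import math
from itertools import combinations


def build_supergraph(keypoints, max_dist):
    """Propose a candidate bond for every pair of keypoints within max_dist.

    Returns (atom_nodes, candidate_bonds); each candidate bond is an (i, j)
    pair of atom indices and would become its own node for classification.
    """
    atom_nodes = list(range(len(keypoints)))
    candidate_bonds = [
        (i, j)
        for i, j in combinations(atom_nodes, 2)
        if math.dist(keypoints[i], keypoints[j]) <= max_dist
    ]
    return atom_nodes, candidate_bonds


def toy_bond_classifier(keypoints, bond, typical_length=1.0, tol=0.1):
    """Stand-in for the GNN: accept candidates close to a typical bond length."""
    i, j = bond
    length = math.dist(keypoints[i], keypoints[j])
    return "single" if abs(length - typical_length) <= tol else "none"


# Four hypothetical keypoints: a three-atom chain with a branch on the middle atom.
keypoints = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (1.0, 1.0)]

# A generous threshold deliberately over-proposes (includes two diagonals).
atoms, candidates = build_supergraph(keypoints, max_dist=1.5)

# Classification prunes the supergraph down to the predicted molecular graph.
bonds = [b for b in candidates if toy_bond_classifier(keypoints, b) != "none"]

print(candidates)
print(bonds)
```

The design point is that proposal is cheap and permissive while classification is selective: missing a true bond at the proposal stage cannot be repaired later, whereas a spurious candidate can still be rejected as "none".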