Overview
Optical Chemical Structure Recognition (OCSR) aims to automatically extract machine-readable molecular representations (e.g., SMILES, InChI, MOL files) from images of chemical structures. Methods have evolved from early rule-based systems to modern deep learning approaches.
This note groups OCSR methods by their underlying approach (deep learning, traditional machine learning, and rule-based) so that individual papers can be placed within the wider landscape of techniques.
Review & Survey Papers
Surveys and systematization-of-knowledge papers that organize and synthesize the OCSR literature.
| Year | Paper | Notes | Focus |
|---|---|---|---|
| 2020 | A review of optical chemical structure recognition tools | Rajan et al. 2020 | Survey of 30 years of OCSR development (1990-2019); benchmark of three open-source tools (OSRA, Imago, MolVec) on four datasets |
| 2022 | Review of techniques and models used in optical chemical structure recognition | Musazade et al. 2022 | Systematization of OCSR evolution from rule-based systems to modern deep learning; identifies paradigm shift to image captioning and critiques evaluation metrics |
| 2024 | Comparing software tools for optical chemical structure recognition | Krasnov et al. 2024 | Benchmark of 8 open-access tools on 2,702 manually curated patent images; proposes ChemIC classifier for hybrid routing approach |
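
One way to picture the "hybrid routing" idea in the last row is a classifier that labels each image and a dispatcher that hands it to whichever tool is registered for that label. The sketch below is only an illustration of that pattern: the function `route`, the label names, and the stand-in tools are hypothetical and do not reproduce the ChemIC classifier or any benchmarked tool.

```python
def route(image, classify, handlers, fallback):
    """Classify an image, then pass it to the handler registered for its label.

    classify : callable returning a label for the image (hypothetical classifier)
    handlers : dict mapping label -> callable that processes the image
    fallback : callable used when no handler is registered for the label
    """
    label = classify(image)
    handler = handlers.get(label, fallback)
    return label, handler(image)


# Toy stand-ins: the labels and "tools" here are invented for the example.
handlers = {
    "single_structure": lambda img: f"structure tool <- {img}",
    "reaction": lambda img: f"reaction tool <- {img}",
}
skip = lambda img: None  # images with no registered tool are skipped

print(route("fig1.png", lambda img: "single_structure", handlers, skip))
print(route("fig2.png", lambda img: "no_structure", handlers, skip))
```

Keeping the classifier and the per-label tools behind plain callables means either side can be swapped without touching the dispatch logic.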
Deep Learning Methods
End-to-end neural network architectures that learn to map images directly to molecular representations.
Note on Paper Types: Papers listed below are primarily Method ($\Psi_{\text{Method}}$) papers focused on novel architectures and performance improvements. Some also have secondary Resource ($\Psi_{\text{Resource}}$) contributions through released tools or datasets. See the AI and Physical Sciences paper taxonomy for classification details.
Image-to-Sequence Paradigm
Treating chemical structure recognition as an image captioning task, these methods use encoder-decoder architectures (often with attention mechanisms) to generate sequential molecular representations like SMILES directly from pixels.
→ See Image-to-Sequence OCSR: A Comparative Analysis for detailed comparison of architectures, output formats, training data, and hardware requirements across these methods.
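As a rough illustration of the sequence side of this paradigm, the sketch below splits SMILES strings into tokens and maps them to integer ids, which is the kind of target sequence a captioning-style decoder would be trained to emit. The regular expression, the special tokens, and all function names are assumptions made for this example; they are not taken from any specific method, and production tokenizers are typically more elaborate.

```python
import re

# Simplified token pattern (an assumption for illustration): bracket atoms,
# the two-letter halogens Cl and Br, two-digit ring closures, then any
# remaining single character (atoms, bonds, branches, ring digits).
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")

SPECIALS = ["<pad>", "<sos>", "<eos>"]


def tokenize_smiles(smiles):
    """Split a SMILES string into a list of tokens."""
    return SMILES_TOKEN.findall(smiles)


def build_vocab(corpus):
    """Map every token seen in the corpus (plus special tokens) to an id."""
    tokens = sorted({tok for s in corpus for tok in tokenize_smiles(s)})
    return {tok: idx for idx, tok in enumerate(SPECIALS + tokens)}


def encode(smiles, vocab):
    """Turn a SMILES string into the id sequence a decoder would predict."""
    ids = [vocab[tok] for tok in tokenize_smiles(smiles)]
    return [vocab["<sos>"]] + ids + [vocab["<eos>"]]


vocab = build_vocab(["CCO", "CC(=O)Cl"])
print(tokenize_smiles("CC(=O)Cl"))  # ['C', 'C', '(', '=', 'O', ')', 'Cl']
print(encode("CCO", vocab))         # [1, 6, 6, 8, 2]
```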
Image-to-Graph Paradigm
Methods that explicitly construct molecular graphs as intermediate representations, identifying atoms as vertices and bonds as edges before converting to standard molecular formats.
→ See Image-to-Graph OCSR: A Comparative Analysis for detailed comparison of graph construction paradigms, architectures, coordinate prediction strategies, and benchmark performance across these methods.
| Year | Paper | Notes | Architecture |
|---|---|---|---|
| 2020 | ChemGrapher: Optical Graph Recognition of Chemical Compounds by Deep Learning | ChemGrapher Notes | U-Net-based semantic segmentation + graph building algorithm + classification CNNs |
| 2022 | ABC-Net: A divide-and-conquer based deep learning architecture for SMILES recognition | ABC-Net Notes | U-Net-style FCN with keypoint detection heatmaps + multi-task property prediction |
| 2022 | Image-to-Graph Transformers for Chemical Structure Recognition | Image-to-Graph Transformers Notes | ResNet-34 encoder + Transformer encoder + Graph-Aware Transformer (GRAT) decoder |
| 2022 | MolMiner: You Only Look Once for Chemical Structure Recognition | MolMiner Notes | MobileNetV2 segmentation + YOLOv5 object detection + EasyOCR + graph construction |
| 2023 | MolGrapher: Graph-based Visual Recognition of Chemical Structures | MolGrapher Notes | ResNet-18 keypoint detector + supergraph construction + GNN classifier |
| 2023 | MolScribe: Robust Molecular Structure Recognition with Image-To-Graph Generation | MolScribe Notes | Swin Transformer encoder + Transformer decoder with explicit atom coordinates and bond prediction |
| 2024 | Atom-Level Optical Chemical Structure Recognition with Limited Supervision | AtomLenz Notes | Faster R-CNN object detection + graph constructor with weakly supervised training (ProbKT*) |
| 2024 | MolNexTR: a generalized deep learning model for molecular image recognition | MolNexTR Notes | Dual-stream (ConvNext + ViT) encoder + Transformer decoder with graph generation |
| 2025 | MarkushGrapher: Joint Visual and Textual Recognition of Markush Structures | MarkushGrapher Notes | UDOP VTL encoder + MolScribe OCSR encoder + T5 decoder with CXSMILES + substituent table |
| 2025 | MolMole: Molecule Mining from Scientific Literature | MolMole Notes | ViDetect (DINO) + ViReact (RxnScribe) + ViMore (detection-based) unified page-level pipeline |
| 2025 | OCSU: Optical Chemical Structure Understanding for Molecule-centric Scientific Discovery | OCSU Notes | DoubleCheck: MolScribe + attentive feature enhancement with local ambiguous atom refinement |
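
The graph intermediate that defines this paradigm (atoms as vertices, bonds as edges) can be sketched as a small data structure. The example below is a minimal sketch under stated assumptions: the `Atom` and `MolGraph` classes, the default-valence table, and the formula helper are hypothetical names introduced here, and the 2D coordinates simply stand in for positions a detector might predict. None of this reflects the internals of the systems listed above.

```python
from collections import Counter
from dataclasses import dataclass, field

# Typical neutral valences for a few elements (illustrative, not exhaustive).
DEFAULT_VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1, "Br": 1}


@dataclass
class Atom:
    symbol: str
    x: float  # image-space coordinates, e.g. from a keypoint detector
    y: float


@dataclass
class MolGraph:
    atoms: list = field(default_factory=list)
    bonds: dict = field(default_factory=dict)  # (i, j) with i < j -> bond order

    def add_atom(self, symbol, x, y):
        self.atoms.append(Atom(symbol, x, y))
        return len(self.atoms) - 1

    def add_bond(self, i, j, order=1):
        self.bonds[(min(i, j), max(i, j))] = order

    def bond_order_sum(self, i):
        return sum(o for (a, b), o in self.bonds.items() if i in (a, b))

    def implicit_hydrogens(self, i):
        # Fill the remaining valence with hydrogens, never going below zero.
        free = DEFAULT_VALENCE[self.atoms[i].symbol] - self.bond_order_sum(i)
        return max(free, 0)

    def formula(self):
        counts = Counter(a.symbol for a in self.atoms)
        counts["H"] += sum(self.implicit_hydrogens(i) for i in range(len(self.atoms)))
        # Carbon first, hydrogen second, then the remaining elements alphabetically.
        order = [s for s in ("C", "H") if counts[s]]
        order += sorted(s for s in counts if s not in ("C", "H") and counts[s])
        return "".join(s + (str(counts[s]) if counts[s] > 1 else "") for s in order)


# Ethanol drawn as a zigzag: C-C-O
g = MolGraph()
c1 = g.add_atom("C", 0.0, 0.0)
c2 = g.add_atom("C", 1.0, 0.5)
o = g.add_atom("O", 2.0, 0.0)
g.add_bond(c1, c2)
g.add_bond(c2, o)
print(g.formula())  # C2H6O
```

Once atoms and bonds are held explicitly like this, simple chemistry checks (valence, formula) can be run before the graph is serialized to SMILES or a MOL file.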
Image-to-Fingerprint Paradigm
Methods that bypass molecular graph reconstruction entirely, generating molecular fingerprints directly from images through functional group recognition and spatial analysis. These approaches prioritize retrieval and similarity search over exact structure reconstruction.
| Year | Paper | Notes | Architecture |
|---|---|---|---|
| 2025 | SubGrapher: visual fingerprinting of chemical structures | SubGrapher Notes | Dual Mask-RCNN instance segmentation (1,534 groups + 27 backbones) + substructure-graph + SVMF fingerprint |
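
To make the retrieval use case concrete, the sketch below ranks a small library against a query by Tanimoto similarity over sets of detected substructure labels. This is a generic set-based fingerprint chosen for illustration, not the SVMF fingerprint from the table; the function names, labels, and library entries are all hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two collections of labels."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty fingerprints are treated as identical by convention here.
    return len(a & b) / len(union) if union else 1.0


def rank_by_similarity(query, library):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical substructure labels as they might come out of a detector.
library = {
    "entry_a": {"carboxyl", "phenyl", "ester"},
    "entry_b": {"hydroxyl"},
}
query = {"carboxyl", "phenyl"}
for name, score in rank_by_similarity(query, library):
    print(name, round(score, 3))
```

Because the comparison only needs the detected labels, a query image can be matched without ever reconstructing its full molecular graph.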
Image Classification and Filtering
Methods that classify chemical structure images for preprocessing purposes, such as detecting Markush structures or other problematic inputs that should be filtered before full OCSR processing.
| Year | Paper | Notes | Architecture |
|---|---|---|---|
| 2023 | One Strike, You’re Out: Detecting Markush Structures in Low Signal-to-Noise Ratio Images | Jurriaans et al. Notes | Patch-based pipeline with Inception V3 or ResNet18 for binary classification |
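
A patch-based binary classifier of the kind named in the table can be sketched as: cut the image into patches, score each patch, and combine the patch scores into one image-level decision. The aggregation rule used below (flag the image if any single patch crosses the threshold) is an assumption suggested by the paper's title, not a confirmed detail, and `patch_score` is a hypothetical stand-in for a CNN such as Inception V3 or ResNet18.

```python
def iter_patches(image, size, stride):
    """Yield square patches from a 2D image given as a list of rows."""
    height, width = len(image), len(image[0])
    for top in range(0, max(height - size, 0) + 1, stride):
        for left in range(0, max(width - size, 0) + 1, stride):
            yield [row[left:left + size] for row in image[top:top + size]]


def classify_image(image, patch_score, size=2, stride=2, threshold=0.5):
    """Flag the image as positive if any patch scores at or above threshold."""
    return any(patch_score(p) >= threshold
               for p in iter_patches(image, size, stride))


def ink_fraction(patch):
    """Toy scorer (stand-in for a CNN): fraction of non-zero pixels."""
    total = sum(len(row) for row in patch)
    return sum(1 for row in patch for px in row if px) / total


image = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
print(classify_image(image, ink_fraction))  # True
```

Working on patches keeps the input to the network small and at native resolution, which matters when the discriminating detail occupies only a small part of a noisy image.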
Traditional Machine Learning Methods
Hybrid approaches combining classical machine learning algorithms (neural networks, SVMs, CRFs) with domain-specific heuristics and image processing. These methods (primarily from 1992-2014) used ML for specific subtasks like character recognition or symbol classification while relying on rule-based systems for chemical structure interpretation.
Rule-Based Methods
Classic approaches using heuristics, image processing, and domain-specific rules. While some systems use traditional OCR engines (which may contain ML components), the chemical structure recognition itself is purely algorithmic.
Note: The chemoCR systems use SVM-based OCR but employ rule-based topology-preserving vectorization for core structure reconstruction, placing them primarily in this category.
Core Methods
TREC 2011 Chemistry Track
The TREC 2011 Chemistry Track provided a standardized benchmark for comparing OCSR systems, introducing the novel Image-to-Structure task alongside Prior Art and Technology Survey tasks. Papers from this evaluation are grouped here.
CLEF-IP 2012 Chemical Structure Recognition Task
The CLEF-IP 2012 benchmarking lab introduced three specific IR tasks in the intellectual property domain: claims-based passage retrieval, flowchart recognition, and chemical structure recognition. The chemical structure recognition task included both segmentation (identifying bounding boxes) and recognition (converting to MOL format) subtasks, with a particular focus on challenging Markush structures common in patents.
| System | Paper | Notes |
|---|---|---|
| MolRec | MolRec at CLEF 2012 - Overview and Analysis of Results | MolRec at CLEF 2012 Notes |
| OSRA | Optical Structure Recognition Application entry to CLEF-IP 2012 | OSRA at CLEF-IP 2012 Notes |