The most recent wave of OCSR methods builds on large pretrained vision-language models, using their broad visual understanding to generalize across diverse chemical diagram styles and notation conventions. GTR-CoT introduces graph-traversal chain-of-thought reasoning to guide recognition. MolParser uses an end-to-end architecture with Extended SMILES. MolNexTR combines ConvNeXt and ViT in a dual-stream encoder. SubGrapher takes a retrieval-oriented approach through visual fingerprinting of functional groups. This group also includes MolParser-7M, the largest open-source OCSR dataset to date with 7.7M image-SMILES pairs, and OCSU, which extends the task beyond structure prediction to multi-level molecular description.

| Year | Paper | Key Idea |
|---|---|---|
| 2024 | MolNexTR: A Dual-Stream Molecular Image Recognition | Dual-stream ConvNeXt + ViT encoder for robust OCSR |
| 2025 | GTR-CoT: Graph Traversal Chain-of-Thought for Molecules | Graph traversal chain-of-thought for printed and hand-drawn OCSR |
| 2025 | MolParser: End-to-End Molecular Structure Recognition | End-to-end learning with Extended SMILES representation |
| 2025 | MolParser-7M & WildMol: Large-Scale OCSR Datasets | Largest open-source OCSR dataset with 7.7M image-SMILES pairs |
| 2025 | OCSU: Optical Chemical Structure Understanding | Multi-level molecular description beyond structure prediction |
| 2025 | SubGrapher: Visual Fingerprinting of Chemical Structures | Molecular fingerprints from images via functional group segmentation |
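
To make the retrieval-oriented idea concrete, here is a minimal sketch of fingerprint-based lookup. It is not SubGrapher's implementation: the functional-group labels, the tiny reference database, and the helper names `tanimoto` and `rank` are all hypothetical, and the image segmentation step is replaced by a hard-coded list of detected groups. The sketch only illustrates how a count-based fingerprint, once extracted from an image, could be compared against known molecules with Tanimoto similarity.

```python
from collections import Counter


def tanimoto(a: Counter, b: Counter) -> float:
    """Tanimoto similarity for count fingerprints (sum of mins over sum of maxes)."""
    keys = set(a) | set(b)
    inter = sum(min(a[k], b[k]) for k in keys)
    union = sum(max(a[k], b[k]) for k in keys)
    # Two empty fingerprints are treated as identical (an assumption of this sketch).
    return 1.0 if union == 0 else inter / union


def rank(query: Counter, database: dict) -> list:
    """Return (name, similarity) pairs sorted from most to least similar."""
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical reference database: functional-group counts per molecule.
DATABASE = {
    "aspirin": Counter({"benzene": 1, "carboxylic_acid": 1, "ester": 1}),
    "benzoic_acid": Counter({"benzene": 1, "carboxylic_acid": 1}),
    "paracetamol": Counter({"benzene": 1, "amide": 1, "hydroxyl": 1}),
}

# Stand-in for the output of a functional-group detector run on an image.
detected_groups = ["benzene", "carboxylic_acid", "ester"]
query = Counter(detected_groups)

for name, score in rank(query, DATABASE):
    print(f"{name}: {score:.3f}")
```

Because the comparison operates on detected substructures rather than a full predicted graph, a lookup of this kind can still return useful neighbors when some groups are missed, which is one reason a fingerprint route suits retrieval better than exact structure matching.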





