The most recent wave of OCSR methods draws on large pretrained vision and vision-language models, using their broad visual understanding to generalize across diverse chemical diagram styles and notation conventions. GTR-CoT introduces graph-traversal chain-of-thought reasoning to guide recognition of both printed and hand-drawn structures. MolParser uses an end-to-end architecture that outputs Extended SMILES. MolNexTR combines ConvNeXt and ViT in a dual-stream encoder. SubGrapher takes a retrieval-oriented approach, building visual fingerprints from segmented functional groups. This group also includes MolParser-7M, currently the largest open-source OCSR dataset with 7.7M image-SMILES pairs, and OCSU, which extends the task beyond structure prediction to multi-level molecular description.
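
To make "graph-traversal reasoning" concrete, the sketch below walks a small molecular graph depth-first and emits one textual step per atom and per bond, the kind of ordered trace a model could be asked to produce before committing to a final structure. This is an illustration under stated assumptions, not GTR-CoT's actual output format: the step strings, the helper names `neighbors` and `traversal_steps`, and the dictionary-based graph encoding are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical graph encoding: atom index -> element symbol,
# and (atom_a, atom_b) -> bond order.
Atoms = Dict[int, str]
Bonds = Dict[Tuple[int, int], int]


def neighbors(bonds: Bonds, atom: int) -> List[Tuple[int, int]]:
    """Return (neighbor, bond_order) pairs for `atom`, sorted by neighbor index."""
    result = []
    for (a, b), order in bonds.items():
        if a == atom:
            result.append((b, order))
        elif b == atom:
            result.append((a, order))
    return sorted(result)


def traversal_steps(atoms: Atoms, bonds: Bonds, start: int = 0) -> List[str]:
    """Depth-first walk that records every atom once and every bond once.

    A bond leading to an already-visited atom (a ring closure) is recorded
    but not followed, so each bond appears exactly one time in the trace.
    """
    steps: List[str] = []
    seen_atoms: Set[int] = set()
    seen_bonds: Set[Tuple[int, int]] = set()

    def visit(atom: int) -> None:
        seen_atoms.add(atom)
        steps.append(f"atom {atom}: {atoms[atom]}")
        for nbr, order in neighbors(bonds, atom):
            key = (min(atom, nbr), max(atom, nbr))  # undirected bond identity
            if key in seen_bonds:
                continue
            seen_bonds.add(key)
            steps.append(f"bond {atom}-{nbr}: order {order}")
            if nbr not in seen_atoms:
                visit(nbr)

    visit(start)
    return steps


if __name__ == "__main__":
    # Acetaldehyde heavy atoms: C-C=O
    for step in traversal_steps({0: "C", 1: "C", 2: "O"}, {(0, 1): 1, (1, 2): 2}):
        print(step)
```

For acetaldehyde the trace reads atom 0, bond 0-1, atom 1, bond 1-2, atom 2. Recording ring-closure bonds without re-entering visited atoms keeps the trace finite for cyclic molecules while still covering every bond.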

| Year | Paper | Key Idea |
|---|---|---|
| 2024 | MolNexTR: A Dual-Stream Molecular Image Recognition | Dual-stream ConvNeXt + ViT encoder for robust OCSR |
| 2025 | GTR-CoT: Graph Traversal Chain-of-Thought for Molecules | Graph traversal chain-of-thought for printed and hand-drawn OCSR |
| 2025 | MolParser: End-to-End Molecular Structure Recognition | End-to-end learning with Extended SMILES representation |
| 2025 | MolParser-7M & WildMol: Large-Scale OCSR Datasets | Largest open-source OCSR dataset with 7.7M image-SMILES pairs |
| 2025 | OCSU: Optical Chemical Structure Understanding | Multi-level molecular description beyond structure prediction |
| 2025 | SubGrapher: Visual Fingerprinting of Chemical Structures | Molecular fingerprints from images via functional group segmentation |
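
The retrieval-oriented idea behind SubGrapher can be illustrated with a toy example: represent each molecule by the set of functional groups detected in its image, then rank database entries by Tanimoto similarity to the query. This is a minimal sketch assuming a set-of-labels fingerprint; SubGrapher's actual fingerprint construction and segmentation model are not reproduced here, and the group labels, the three-entry database, and the helpers `tanimoto` and `rank` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical fingerprint: the set of functional-group labels found in an image.
# A set of labels is equivalent to a bit vector over a fixed group vocabulary.
Fingerprint = FrozenSet[str]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def rank(query: Fingerprint, database: Dict[str, Fingerprint], top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k (name, similarity) pairs, best first; ties broken by name."""
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


if __name__ == "__main__":
    # Simplified, hand-written group sets for illustration only.
    database = {
        "aspirin": frozenset({"carboxylic_acid", "ester", "benzene"}),
        "paracetamol": frozenset({"amide", "phenol", "benzene"}),
        "ibuprofen": frozenset({"carboxylic_acid", "benzene"}),
    }
    # Pretend a segmentation step detected these groups in a query image.
    query = frozenset({"carboxylic_acid", "benzene"})
    for name, score in rank(query, database):
        print(f"{name}: {score:.2f}")
```

With this query, ibuprofen scores 1.00, aspirin 0.67, and paracetamol 0.25. The appeal of such a retrieval framing is that it can return useful near matches even when exact atom-level reconstruction fails.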