Image-to-graph methods bypass string representations entirely, predicting the molecular graph (atoms as nodes, bonds as edges) directly from the input image. This family includes segmentation-based approaches like ChemGrapher and Staker et al.’s U-Net pipeline, keypoint-detection architectures like ABC-Net, and joint atom-bond-coordinate predictors like MolScribe. By reasoning about spatial structure rather than linearizing it, these models tend to handle stereochemistry and abbreviated groups more naturally than sequence-based alternatives. Full-pipeline systems like MolMiner and MolMole extend the approach to page-level chemical extraction from documents.

| Year | Paper | Key Idea |
|------|-------|----------|
| 2020 | ChemGrapher: Deep Learning for Chemical Graph OCSR | Semantic segmentation and classification CNNs for chemical graphs |
| 2022 | ABC-Net: Keypoint-Based Molecular Image Recognition | Keypoint estimation to detect atom and bond centers |
| 2022 | Image-to-Graph Transformers for Chemical Structures | Direct image-to-graph conversion with abbreviated symbol support |
| 2022 | MolMiner: Deep Learning OCSR with YOLOv5 Detection | YOLOv5 and MobileNetV2 for document-level molecular extraction |
| 2023 | MolGrapher: Graph-based Chemical Structure Recognition | Graph-based deep learning outperforming image captioning methods |
| 2023 | MolScribe: Robust Image-to-Graph Molecular Recognition | Joint prediction of atoms, bonds, and coordinates for OCSR |
| 2025 | MolMole: Unified Vision Pipeline for Molecule Mining | Unified framework for detection, reaction parsing, and OCSR |
| 2026 | AdaptMol: Domain Adaptation for Molecular OCSR (2026) | MMD-based domain adaptation and self-training for hand-drawn OCSR |
| 2026 | GraSP: Graph Recognition via Subgraph Prediction (2026) | Sequential subgraph prediction framework for image-to-graph OCSR |
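
To make the output format concrete, the molecular graph these methods predict (atoms as nodes, bonds as edges, optionally with image coordinates) can be sketched as a small data structure. This is only an illustrative sketch: the class and field names (`Atom`, `Bond`, `MolGraph`, `neighbors`, `bond_order_sum`) are hypothetical and are not taken from any of the listed papers or their code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Atom:
    """A predicted node: element symbol plus its position in the image."""
    symbol: str
    x: float
    y: float


@dataclass
class Bond:
    """A predicted edge between two atom indices, with a bond order."""
    begin: int
    end: int
    order: int = 1


@dataclass
class MolGraph:
    """Hypothetical container for an image-to-graph prediction."""
    atoms: List[Atom] = field(default_factory=list)
    bonds: List[Bond] = field(default_factory=list)

    def neighbors(self, idx: int) -> List[int]:
        """Return the sorted indices of atoms bonded to atom `idx`."""
        out = []
        for b in self.bonds:
            if b.begin == idx:
                out.append(b.end)
            elif b.end == idx:
                out.append(b.begin)
        return sorted(out)

    def bond_order_sum(self, idx: int) -> int:
        """Sum of bond orders at atom `idx` (a simple valence sanity check)."""
        return sum(b.order for b in self.bonds if idx in (b.begin, b.end))


# Heavy-atom graph of ethanol (C-C-O), with made-up image coordinates.
ethanol = MolGraph(
    atoms=[Atom("C", 0.0, 0.0), Atom("C", 1.0, 0.5), Atom("O", 2.0, 0.0)],
    bonds=[Bond(0, 1), Bond(1, 2)],
)
print(ethanol.neighbors(1))
print(ethanol.bond_order_sum(1))
```

Because the prediction keeps explicit atom positions and bond records instead of a linearized string, each predicted atom and bond can be traced back to a location in the input image, which is the spatial grounding that the graph-based family relies on.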