Image-to-Graph Models

Image-to-graph methods bypass string representations entirely, predicting the molecular graph (atoms as nodes, bonds as edges) directly from the input image. This family includes segmentation-based approaches like ChemGrapher and Staker et al.’s U-Net pipeline, keypoint-detection architectures like ABC-Net, and joint atom-bond-coordinate predictors like MolScribe. By reasoning about spatial structure rather than linearizing it, these models tend to handle stereochemistry and abbreviated groups more naturally than sequence-based alternatives. Full-pipeline systems like MolMiner and MolMole extend the approach to page-level chemical extraction from documents.
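To make the output format concrete, here is a minimal sketch of the kind of structure an image-to-graph model predicts: atoms as nodes carrying image coordinates, bonds as edges carrying an order and a stereo label. The class names, field names, and stereo labels are illustrative assumptions and are not taken from any of the systems listed here.

```python
from dataclasses import dataclass, field


@dataclass
class Atom:
    symbol: str  # element symbol or abbreviation label, e.g. "C", "O", "Ph"
    x: float     # image-space coordinates of the atom
    y: float


@dataclass
class Bond:
    begin: int            # index into MolGraph.atoms
    end: int              # index into MolGraph.atoms
    order: int = 1        # 1 = single, 2 = double, 3 = triple
    stereo: str = "none"  # hypothetical labels: "none", "wedge", "dash"


@dataclass
class MolGraph:
    atoms: list = field(default_factory=list)
    bonds: list = field(default_factory=list)

    def add_atom(self, symbol, x, y):
        """Append an atom and return its index."""
        self.atoms.append(Atom(symbol, x, y))
        return len(self.atoms) - 1

    def add_bond(self, begin, end, order=1, stereo="none"):
        """Append a bond between two existing atom indices."""
        self.bonds.append(Bond(begin, end, order, stereo))

    def neighbors(self, idx):
        """Return the indices of atoms bonded to atom idx, in bond order."""
        out = []
        for bond in self.bonds:
            if bond.begin == idx:
                out.append(bond.end)
            elif bond.end == idx:
                out.append(bond.begin)
        return out


# Ethanol heavy atoms drawn as a zigzag: C-C-O
ethanol = MolGraph()
c1 = ethanol.add_atom("C", 0.0, 0.0)
c2 = ethanol.add_atom("C", 1.0, 0.5)
o1 = ethanol.add_atom("O", 2.0, 0.0)
ethanol.add_bond(c1, c2)
ethanol.add_bond(c2, o1)
```

Because coordinates and stereo labels live on the graph itself, wedge and dash information does not have to be recovered from a linearized string.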

Year	Paper	Key Idea
2020	ChemGrapher: Deep Learning for Chemical Graph OCSR	Semantic segmentation and classification CNNs for chemical graphs
2022	ABC-Net: Keypoint-Based Molecular Image Recognition	Keypoint estimation to detect atom and bond centers
2022	Image-to-Graph Transformers for Chemical Structures	Direct image-to-graph conversion with abbreviated symbol support
2022	MolMiner: Deep Learning OCSR with YOLOv5 Detection	YOLOv5 and MobileNetV2 for document-level molecular extraction
2023	MolGrapher: Graph-based Chemical Structure Recognition	Graph-based deep learning outperforming image captioning methods
2023	MolScribe: Robust Image-to-Graph Molecular Recognition	Joint prediction of atoms, bonds, and coordinates for OCSR
2025	MolMole: Unified Vision Pipeline for Molecule Mining	Unified framework for detection, reaction parsing, and OCSR
2026	AdaptMol: Domain Adaptation for Molecular OCSR (2026)	MMD-based domain adaptation and self-training for hand-drawn OCSR
2026	GraSP: Graph Recognition via Subgraph Prediction (2026)	Sequential subgraph prediction framework for image-to-graph OCSR

Computational Chemistry

AdaptMol domain adaptation pipeline showing encoder-decoder with MMD alignment between labeled source and unlabeled target domain images

AdaptMol: Domain Adaptation for Molecular OCSR (2026)

AdaptMol combines an end-to-end graph reconstruction model with unsupervised domain adaptation via class-conditional maximum mean discrepancy (MMD) on bond features and SMILES-validated self-training. It achieves 82.6% accuracy on hand-drawn molecules (10.7 points above the prior best) while maintaining state-of-the-art results on four literature benchmarks, using only 4,080 real hand-drawn images for adaptation.
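As a rough illustration of class-conditional MMD, the sketch below computes a biased squared-MMD estimate with a Gaussian kernel separately for each bond class and averages the results. The kernel choice, the bandwidth `sigma`, the equal weighting of classes, and the use of plain lists for features are all assumptions made for this example; the summary above does not specify how AdaptMol implements these details.

```python
import math


def rbf(a, b, sigma=1.0):
    """Gaussian kernel between two feature vectors."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def mean_kernel(xs, ys, sigma):
    """Average kernel value over all pairs (x, y)."""
    return sum(rbf(x, y, sigma) for x in xs for y in ys) / (len(xs) * len(ys))


def mmd2(xs, ys, sigma=1.0):
    """Biased estimate of squared MMD between two samples."""
    return (mean_kernel(xs, xs, sigma)
            + mean_kernel(ys, ys, sigma)
            - 2.0 * mean_kernel(xs, ys, sigma))


def class_conditional_mmd2(src, src_labels, tgt, tgt_labels, sigma=1.0):
    """Average squared MMD over the classes present in both domains.

    Target labels would be pseudo-labels in an unsupervised setting
    (assumption for this sketch).
    """
    shared = sorted(set(src_labels) & set(tgt_labels))
    if not shared:
        return 0.0
    total = 0.0
    for cls in shared:
        s = [f for f, l in zip(src, src_labels) if l == cls]
        t = [f for f, l in zip(tgt, tgt_labels) if l == cls]
        total += mmd2(s, t, sigma)
    return total / len(shared)


# Toy bond features: the target domain is shifted for "double" bonds only.
src = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]]
src_labels = ["single", "single", "double", "double"]
tgt = [[0.0, 0.0], [0.1, 0.0], [3.0, 3.0], [3.1, 3.0]]
tgt_labels = ["single", "single", "double", "double"]

gap = class_conditional_mmd2(src, src_labels, tgt, tgt_labels)
```

Minimizing a term like `gap` during training pulls same-class bond features from the two domains together, which is the general idea behind MMD-based alignment.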

GraSP feed-forward architecture showing GNN, FiLM-conditioned CNN, and MLP classification head

GraSP: Graph Recognition via Subgraph Prediction (2026)

GraSP introduces a general framework for recognizing graphs in images by framing it as sequential subgraph prediction with a binary classifier. A GNN conditions a CNN via FiLM layers to predict whether a candidate graph is a subgraph of the target. Applied to OCSR on QM9, GraSP achieves 67.5% accuracy with no domain-specific modifications.
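FiLM (feature-wise linear modulation) scales and shifts each feature channel with a per-channel pair of parameters. The sketch below shows only that operation on nested lists. In a setup like the one described above, `gamma` and `beta` would be produced from the GNN's embedding of the candidate graph; here they are hard-coded, and the function name and data layout are assumptions for illustration.

```python
def film(feature_maps, gamma, beta):
    """Apply feature-wise linear modulation.

    feature_maps: list of C channels, each an H x W nested list.
    gamma, beta:  lists of C per-channel scale and shift values.
    """
    if not (len(feature_maps) == len(gamma) == len(beta)):
        raise ValueError("need one gamma and one beta per channel")
    return [
        [[g * value + b for value in row] for row in channel]
        for channel, g, b in zip(feature_maps, gamma, beta)
    ]


# Two 2x2 channels of CNN features (toy values).
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
]

# Channel 0 is doubled; channel 1 is zeroed out and replaced by a constant.
modulated = film(features, gamma=[2.0, 0.0], beta=[0.0, 1.0])
```

The conditioning signal can amplify, suppress, or offset individual channels, so the same image features can be read differently for each candidate graph.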

Pipeline diagram showing keypoint detection, supergraph construction, and GNN classification for molecular structure recognition

MolGrapher: Graph-based Chemical Structure Recognition

MolGrapher introduces a three-stage pipeline (keypoint detection, supergraph construction, GNN classification) for recognizing chemical structures from images. It achieves 91.5% accuracy on USPTO by treating molecules as graphs, and introduces the USPTO-30K benchmark.
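The middle stage can be pictured as follows: given detected atom keypoints, propose every plausible bond as a candidate edge, then leave it to a classifier to decide which candidates are real. This sketch covers only candidate generation and uses a simple distance cutoff; the cutoff rule, the `max_dist` parameter, and the function name are assumptions for this example, not MolGrapher's actual construction heuristic.

```python
import math
from itertools import combinations


def build_supergraph(keypoints, max_dist):
    """Return candidate edges (i, j) between keypoints closer than max_dist.

    The result is a superset of the true bonds; a downstream classifier
    (a GNN in the pipeline described above) would prune the false ones.
    """
    edges = []
    for (i, p), (j, q) in combinations(enumerate(keypoints), 2):
        if math.dist(p, q) <= max_dist:
            edges.append((i, j))
    return edges


# Four detected atom positions (toy coordinates).
keypoints = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (1.0, 1.0)]
candidates = build_supergraph(keypoints, max_dist=1.5)
```

With these toy coordinates the diagonal pairs (0, 3) and (2, 3) pass the cutoff even though a real molecule might not bond them, which is exactly the kind of spurious candidate the classification stage is meant to reject.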

Overview of the MolMole pipeline showing ViDetect, ViReact, and ViMore processing document pages to extract molecules and reactions.

MolMole: Unified Vision Pipeline for Molecule Mining

MolMole unifies molecule detection, reaction parsing, and structure recognition into a single vision-based pipeline. It processes full documents without external layout parsers and achieves top performance on a newly introduced 550-page benchmark.

Overview of the MolScribe encoder-decoder architecture predicting atoms with coordinates and bonds from a molecular image.

MolScribe: Robust Image-to-Graph Molecular Recognition

MolScribe reformulates molecular recognition as an image-to-graph generation task, explicitly predicting atom coordinates and bonds to better handle stereochemistry and abbreviated structures compared to image-to-SMILES baselines.

ABC-Net detects atom and bond keypoints to reconstruct molecular graphs from images

ABC-Net: Keypoint-Based Molecular Image Recognition

ABC-Net reformulates molecular image recognition as a keypoint detection problem. By predicting atom/bond centers and properties via a single Fully Convolutional Network, it achieves >94% accuracy with high data efficiency.
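Keypoint detectors of this kind typically output a heatmap per keypoint type, and centers are read off as local maxima above a confidence threshold. The sketch below shows that decoding step on a small nested-list heatmap. The 3x3 neighborhood, the threshold value, and the function name are assumptions for illustration; the summary above does not describe ABC-Net's exact decoding procedure.

```python
def heatmap_peaks(heatmap, threshold=0.5):
    """Return (row, col) positions that are strict local maxima above threshold.

    A cell counts as a peak if its score is at least `threshold` and
    strictly greater than every cell in its 3x3 neighborhood.
    """
    height, width = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(height):
        for c in range(width):
            value = heatmap[r][c]
            if value < threshold:
                continue
            neighbors = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(height, r + 2))
                for cc in range(max(0, c - 1), min(width, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbors):
                peaks.append((r, c))
    return peaks


# Toy atom-center heatmap with two clear peaks.
atom_heatmap = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.1],
    [0.1, 0.3, 0.2, 0.7],
    [0.0, 0.1, 0.4, 0.6],
]
atom_centers = heatmap_peaks(atom_heatmap)
```

The cell scoring 0.6 clears the threshold but sits next to the 0.7 peak, so it is suppressed rather than reported as a second atom.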

Diagram showing a pixelated chemical image passing through a multi-layer encoder to produce a molecular graph with nodes and edges.

Image-to-Graph Transformers for Chemical Structures

This paper proposes an end-to-end deep learning architecture that translates chemical images directly into molecular graphs using a ResNet-Transformer encoder and a graph-aware decoder. It addresses the limitations of SMILES-based approaches by effectively handling non-atomic symbols (abbreviations) and varying drawing styles found in scientific literature.

MolMiner: Deep Learning OCSR with YOLOv5 Detection

MolMiner replaces traditional rule-based vectorization with a deep learning object detection pipeline (YOLOv5) to extract chemical structures from PDFs. It outperforms open-source baselines on four benchmarks and introduces a new real-world dataset of 3,040 images.

ChemGrapher pipeline overview showing segmentation and classification stages

ChemGrapher: Deep Learning for Chemical Graph OCSR

ChemGrapher replaces rule-based chemical OCR with a deep learning pipeline. Semantic segmentation identifies atom and bond candidates, and specialized classification networks then resolve stereochemistry and bond multiplicity. The result is lower error rates than OSRA across all tested styles.