Computational Chemistry
Pipeline diagram showing keypoint detection, supergraph construction, and GNN classification for molecular structure recognition

MolGrapher: Graph-based Chemical Structure Recognition

MolGrapher introduces a three-stage pipeline (keypoint detection, supergraph construction, GNN classification) for recognizing chemical structures from images. It achieves 91.5% accuracy on USPTO by treating molecules as graphs, and introduces the USPTO-30K benchmark.

Computational Chemistry
Overview of the MolMole pipeline showing ViDetect, ViReact, and ViMore processing document pages to extract molecules and reactions.

MolMole: Unified Vision Pipeline for Molecule Mining

MolMole unifies molecule detection, reaction parsing, and structure recognition into a single vision-based pipeline, achieving top performance on a newly introduced 550-page benchmark by processing full documents without external layout parsers.

Computational Chemistry
Overview of the MolScribe encoder-decoder architecture predicting atoms with coordinates and bonds from a molecular image.

MolScribe: Robust Image-to-Graph Molecular Recognition

MolScribe reformulates molecular recognition as an image-to-graph generation task, explicitly predicting atom coordinates and bonds to better handle stereochemistry and abbreviated structures compared to image-to-SMILES baselines.

Computational Chemistry
ABC-Net detects atom and bond keypoints to reconstruct molecular graphs from images

ABC-Net: Keypoint-Based Molecular Image Recognition

ABC-Net reformulates molecular image recognition as a keypoint detection problem. By predicting atom/bond centers and properties via a single Fully Convolutional Network, it achieves >94% accuracy with high data efficiency.

Computational Chemistry
Handwritten chemical structure recognition with RCGD and SSML

Handwritten Chemical Structure Recognition with RCGD

Proposes a Random Conditional Guided Decoder (RCGD) and a Structure-Specific Markup Language (SSML) to handle the ambiguity and complexity of handwritten chemical structure recognition, validated on a new benchmark dataset (EDU-CHEMC) with 50,000 handwritten images.

Computational Chemistry

MolMiner: Deep Learning OCSR with YOLOv5 Detection

MolMiner replaces traditional rule-based vectorization with a deep learning object detection pipeline (YOLOv5) to extract chemical structures from PDFs. It outperforms open-source baselines on four benchmarks and introduces a new real-world dataset of 3,040 images.

Computational Chemistry
4-chlorofluorobenzene molecular structure diagram for SwinOCSR

SwinOCSR: End-to-End Chemical OCR with Swin Transformers

Proposes an end-to-end architecture replacing standard CNN backbones with Swin Transformer to capture global image context. Introduces Multi-label Focal Loss to handle severe token imbalance in chemical datasets.

Computational Chemistry

A Review of Optical Chemical Structure Recognition Tools

This paper reviews three decades of OCSR development, transitioning from rule-based heuristics to early deep learning approaches. It includes a benchmark study comparing the performance of three open-source tools (OSRA, Imago, MolVec) on four diverse datasets.

Computational Chemistry
Diagram of the chemoCR pipeline converting a bitmap chemical structure into a connection table

Chemical Structure Reconstruction with chemoCR (2011)

Describes chemoCR, a system that converts bitmap chemical diagrams into connection tables using a pipeline of texture-based vectorization, OCR, and a rule-based expert system, achieving 65.6% perfect recall on the TREC 2011 task.

Computational Chemistry
Pipeline diagram of ChemReader chemical structure recognition from image to connection table

ChemReader Image-to-Structure OCR at TREC 2011 Chemical IR

ChemReader achieved 93% accuracy on the TREC 2011 Image-to-Structure task, with detailed error analysis revealing the need for improved chemical intelligence in bond recognition and node merging algorithms.

Computational Chemistry
Overview of CLEF-IP 2012 tasks including patent passage retrieval, flowchart recognition, and chemical structure extraction

CLEF-IP 2012: Patent and Chemical Structure Benchmark

A resource paper detailing the CLEF-IP 2012 benchmarking lab. It introduces specific IR tasks for patent processing along with ground-truth datasets.

Computational Chemistry

MolRec at CLEF 2012: Rule-Based Structure Recognition

Describes the MolRec system’s performance in the CLEF 2012 Chemical Structure Recognition task, detailing its rule-based vectorization engine and analyzing failure modes like touching characters and complex bond types.