Computational Chemistry
Precision and recall comparison of 8 OCSR tools on patent images

Benchmarking Eight OCSR Tools on Patent Images (2024)

Comprehensive evaluation of 8 optical chemical structure recognition tools using a newly curated dataset of 2,702 patent images. Proposes ChemIC, a ResNet-50 classifier that routes images to specialized tools based on content type; the results show that no single tool excels at all tasks.
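The routing idea behind ChemIC can be sketched as "classify, then dispatch". The sketch below assumes a hypothetical four-way label set and stand-in tool functions; ChemIC's actual classes and the tools it routes to may differ, and the raw logits would in practice come from the ResNet-50 classifier.

```python
import math
from typing import Callable, Dict, List, Optional

# Hypothetical label set; ChemIC's actual classes may differ.
CLASSES = ["single_structure", "multiple_structures", "reaction", "no_structure"]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over raw classifier scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(logits: List[float],
          tools: Dict[str, Callable[[str], str]],
          image_path: str) -> Optional[str]:
    """Pick the most probable class and dispatch the image to its tool.

    Returns None when the predicted class has no registered tool
    (e.g. images that contain no chemical structure).
    """
    probs = softmax(logits)
    label = CLASSES[max(range(len(probs)), key=probs.__getitem__)]
    tool = tools.get(label)
    return tool(image_path) if tool is not None else None


# Placeholder tools standing in for real OCSR engines (hypothetical).
tools = {
    "single_structure": lambda p: f"single-structure OCSR on {p}",
    "multiple_structures": lambda p: f"segment, then OCSR on {p}",
    "reaction": lambda p: f"reaction parser on {p}",
}

print(route([0.2, 3.1, 0.5, -1.0], tools, "patent_fig1.png"))
# -> segment, then OCSR on patent_fig1.png
```

Keeping the dispatch table separate from the classifier means a better specialized tool can be swapped in for one content type without retraining the router.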

Computational Chemistry
Overview of the ChemReco pipeline showing synthetic data generation and EfficientNet+Transformer architecture for hand-drawn chemical structure recognition

ChemReco: Hand-Drawn Chemical Structure Recognition

ChemReco automates the recognition of hand-drawn chemical structures using a synthetic data pipeline and an EfficientNet+Transformer architecture, achieving 96.90% accuracy on C-H-O molecules.

Computational Chemistry
ChemVLM architecture showing molecular structure and text inputs flowing through vision encoder and language model into multimodal LLM for chemical reasoning

ChemVLM: A Multimodal Large Language Model for Chemistry

A 2025 AAAI paper introducing ChemVLM, a 26B-parameter multimodal LLM specialized for chemistry. By pairing a vision encoder with a language model and training on curated chemistry data, it achieves state-of-the-art performance on chemical OCR, chemical reasoning benchmarks, and molecular understanding tasks.

Computational Chemistry
Overview of the DECIMER.ai platform combining segmentation, classification, and image-to-SMILES recognition

DECIMER.ai: Optical Chemical Structure Recognition

DECIMER.ai addresses the lack of open tools for Optical Chemical Structure Recognition (OCSR) by providing a comprehensive, deep-learning-based workflow. It features a novel data generation pipeline (RanDepict), a web application, and models for segmentation and recognition that rival or exceed proprietary solutions.

Computational Chemistry
Architecture diagram of the DGAT model showing dual-path decoder with CGFE and SDGLA modules

Dual-Path Global Awareness Transformer (DGAT) for OCSR

Proposes DGAT, a new architecture that addresses the loss of global context in chemical structure recognition. It introduces Cascaded Global Feature Enhancement (CGFE) and Sparse Differential Global-Local Attention (SDGLA), reaching a BLEU-4 score of 84.0% and handling complex chiral structures implicitly.
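BLEU-4 compares a predicted token sequence with the reference through clipped n-gram precisions for n = 1..4, combined as a geometric mean and scaled by a brevity penalty. The sketch below computes sentence-level BLEU-4 with uniform weights and no smoothing, and assumes character-level SMILES tokens; DGAT's tokenizer and whether it reports corpus-level BLEU are not stated here, so its 84.0% figure is not directly reproducible with this code.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate: List[str], reference: List[str]) -> float:
    """Sentence-level BLEU-4 with uniform weights and no smoothing."""
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, 5):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        total = sum(cand.values())
        # Clipped matches: a candidate n-gram counts at most as often
        # as it appears in the reference.
        matched = sum(min(count, ref[g]) for g, count in cand.items())
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU = 0
        log_precision_sum += math.log(matched / total) / 4
    # Brevity penalty discourages candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum)


# Phenol vs. aniline: one wrong heteroatom at the end of the SMILES.
print(round(bleu4(list("c1ccccc1O"), list("c1ccccc1N")), 3))  # -> 0.863
```

The example illustrates why BLEU-4 is reported alongside exact-match accuracy: a prediction that names the wrong molecule can still score well above 0.8.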

Computational Chemistry
Diagram showing the DECIMER hand-drawn OCSR pipeline from hand-drawn chemical structure image through EfficientNetV2 encoder and Transformer decoder to predicted SMILES output

Enhanced DECIMER for Hand-Drawn Structure Recognition

This paper presents an enhanced deep learning architecture for Optical Chemical Structure Recognition (OCSR) specifically optimized for hand-drawn inputs. By pairing an EfficientNetV2 encoder with a Transformer decoder and training on over 150 million synthetic images, the model achieves 73.25% exact match accuracy on a real-world hand-drawn benchmark of 5,088 images.

Computational Chemistry
Diagram of the Image2InChI architecture showing a SwinTransformer encoder connected to an attention-based feature fusion decoder for converting molecular images to InChI strings.

Image2InChI: SwinTransformer for Molecular Recognition

Proposes Image2InChI, an OCSR model with an improved SwinTransformer encoder and a novel attention-based feature fusion network, achieving 99.8% InChI accuracy on the BMS dataset.

Computational Chemistry
Architecture diagram of the MarkushGrapher dual-encoder system combining VTL and OCSR encoders for Markush structure recognition.

MarkushGrapher: Multi-modal Markush Structure Recognition

This paper introduces a multi-modal approach for extracting chemical Markush structures from patents, combining a Vision-Text-Layout encoder with a specialized chemical vision encoder. It addresses the lack of training data with a synthetic generation pipeline and introduces M2S, a new real-world benchmark.

Computational Chemistry
Diagram of the MMSSC-Net architecture showing the SwinV2 encoder and GPT-2 decoder pipeline for molecular image recognition

MMSSC-Net: Multi-Stage Sequence Cognitive Networks

MMSSC-Net introduces a multi-stage cognitive approach to OCSR, using a SwinV2 encoder and a GPT-2 decoder to recognize atom and bond sequences. Through fine-grained perception of atoms and bonds, it copes with varying image resolutions and noise, achieving 75-98% accuracy across benchmark datasets.

Computational Chemistry
MolGrapher: Graph-based Visual Recognition of Chemical Structures

MolGrapher: Graph-based Chemical Structure Recognition

MolGrapher introduces a three-stage pipeline (keypoint detection, supergraph construction, GNN classification) for recognizing chemical structures from images. It achieves 91.5% accuracy on USPTO by treating molecules as graphs, and introduces the USPTO-30K benchmark.
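The supergraph stage turns detected keypoints into an over-complete candidate graph that the GNN then prunes. The sketch below is a simplified stand-in, assuming candidate bonds are proposed between keypoints closer than a cutoff derived from the median nearest-neighbour distance; the `factor` parameter and this cutoff rule are hypothetical, and MolGrapher's actual construction rules may differ.

```python
import math
from itertools import combinations
from statistics import median
from typing import List, Tuple

Point = Tuple[float, float]


def candidate_edges(keypoints: List[Point], factor: float = 1.5) -> List[Tuple[int, int]]:
    """Propose candidate bonds between nearby keypoints (hypothetical rule).

    The cutoff is `factor` times the median nearest-neighbour distance,
    a rough proxy for the typical bond length in the drawing.
    """
    if len(keypoints) < 2:
        return []
    nearest = [
        min(math.dist(p, q) for j, q in enumerate(keypoints) if j != i)
        for i, p in enumerate(keypoints)
    ]
    cutoff = factor * median(nearest)
    return [
        (i, j)
        for i, j in combinations(range(len(keypoints)), 2)
        if math.dist(keypoints[i], keypoints[j]) <= cutoff
    ]


# A three-atom zig-zag chain: only the two drawn bonds survive the cutoff.
print(candidate_edges([(0.0, 0.0), (1.0, 0.0), (1.5, 0.87)]))  # -> [(0, 1), (1, 2)]
```

For four keypoints on a unit square, the diagonals (length about 1.41) fall under the 1.5 cutoff, so all six pairs become candidates. That over-completeness is intentional: deciding which candidates are real bonds is left to the downstream classifier rather than to a geometric threshold.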

Computational Chemistry
Overview of the MolMole pipeline showing ViDetect, ViReact, and ViMore processing document pages to extract molecules and reactions.

MolMole: Unified Vision Pipeline for Molecule Mining

MolMole unifies molecule detection, reaction parsing, and structure recognition into a single vision-based pipeline. It processes full document pages without external layout parsers and achieves top performance on a newly introduced 550-page benchmark.

Computational Chemistry
ABC-Net detects atom and bond keypoints to reconstruct molecular graphs from images

ABC-Net: Keypoint-Based Molecular Image Recognition

ABC-Net reformulates molecular image recognition as a keypoint detection problem. By predicting atom and bond centers, together with their properties, using a single fully convolutional network (FCN), it achieves over 94% accuracy with high data efficiency.
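Keypoint detectors of this kind typically output a heatmap per keypoint type, and centers are recovered by keeping local maxima above a confidence threshold. The sketch below shows that generic decoding step on a plain nested list; the 3x3 neighbourhood and the 0.5 threshold are assumptions, and ABC-Net's actual post-processing may differ.

```python
from typing import List, Tuple


def heatmap_peaks(heatmap: List[List[float]],
                  threshold: float = 0.5) -> List[Tuple[int, int, float]]:
    """Return (row, col, score) for 3x3 local maxima scoring at least `threshold`."""
    rows = len(heatmap)
    cols = len(heatmap[0]) if rows else 0
    peaks = []
    for r in range(rows):
        for c in range(cols):
            v = heatmap[r][c]
            if v < threshold:
                continue
            # Compare against the (up to) 8 neighbours, clipped at the borders.
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(v >= n for n in neighbours):
                peaks.append((r, c, v))
    return peaks


atom_heatmap = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.0],
    [0.1, 0.3, 0.2, 0.8],
    [0.0, 0.0, 0.1, 0.4],
]
print(heatmap_peaks(atom_heatmap))  # -> [(1, 1, 0.9), (2, 3, 0.8)]
```

Once the peaks are found, per-pixel property maps (such as atom type or charge) would be read at each peak location to label the detected centers. Because the comparison uses `>=`, a flat plateau of equal scores yields several adjacent peaks; a real decoder would need a tie-breaking rule.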