
ChemReco: Hand-Drawn Chemical Structure Recognition
ChemReco automates the recognition of hand-drawn chemical structures using a synthetic data pipeline and an EfficientNet+Transformer architecture, achieving 96.90% accuracy on C-H-O molecules.

ChemReco automates the recognition of hand-drawn chemical structures using a synthetic data pipeline and an EfficientNet+Transformer architecture, achieving 96.90% accuracy on C-H-O molecules.

A 2025 AAAI paper introducing ChemVLM, a domain-specific multimodal LLM (26B parameters). It achieves state-of-the-art performance on chemical OCR, reasoning benchmarks, and molecular understanding tasks by combining vision and language models trained on curated chemistry data.

DECIMER.ai addresses the lack of open tools for Optical Chemical Structure Recognition (OCSR) by providing a comprehensive, deep-learning-based workflow. It features a novel data generation pipeline (RanDepict), a web application, and models for segmentation and recognition that rival or exceed proprietary solutions.

Proposes a new architecture (DGAT) to resolve global context loss in chemical structure recognition. Introduces Cascaded Global Feature Enhancement and Sparse Differential Global-Local Attention, achieving robust results (84.0% BLEU-4) and handling complex chiral structures implicitly.

This paper presents an enhanced deep learning architecture for Optical Chemical Structure Recognition (OCSR) specifically optimized for hand-drawn inputs. By pairing an EfficientNetV2 encoder with a Transformer decoder and training on over 150 million synthetic images, the model achieves state-of-the-art accuracy on real-world hand-drawn benchmarks.

Proposes Image2InChI, an OCSR model with improved SwinTransformer encoder and novel feature fusion network with attention mechanisms that achieves 99.8% InChI accuracy on the BMS dataset.

This paper introduces a novel multi-modal approach for extracting chemical Markush structures from patents, combining a Vision-Text-Layout encoder with a specialized chemical vision encoder. It addresses the lack of training data with a robust synthetic generation pipeline and introduces M2S, a new real-world benchmark.

MMSSC-Net introduces a multi-stage cognitive approach for OCSR, utilizing a SwinV2 encoder and GPT-2 decoder to recognize atomic and bond sequences. It achieves high accuracy (94%+) on benchmark datasets by effectively handling varying image resolutions and noise.

MolGrapher introduces a novel three-stage pipeline (keypoint detection, supergraph construction, GNN classification) for recognizing chemical structures from images. It achieves state-of-the-art results by treating molecules as graphs, and introduces the USPTO-30K benchmark.

MolMole unifies molecule detection, reaction parsing, and structure recognition into a single vision-based pipeline, achieving SOTA performance on a newly introduced 550-page benchmark by processing full documents without external layout parsers.

MolScribe reformulates molecular recognition as an image-to-graph generation task, explicitly predicting atom coordinates and bonds to better handle stereochemistry and abbreviated structures compared to image-to-SMILES baselines.

MolSight introduces a three-stage training paradigm for Optical Chemical Structure Recognition (OCSR), utilizing large-scale pretraining, multi-granularity fine-tuning with auxiliary bond and coordinate prediction tasks, and reinforcement learning (GRPO) to achieve state-of-the-art performance in recognizing complex stereochemical structures like chiral centers and cis-trans isomers.