Computational Chemistry
MarkushGrapher: Multi-modal Markush Structure Recognition

This paper introduces a novel multi-modal approach for extracting chemical Markush structures from patents, combining a Vision-Text-Layout encoder with a specialized chemical vision encoder. It addresses the lack of training data with a robust synthetic generation pipeline and introduces M2S, a new real-world benchmark.

Computational Chemistry
MolSight: OCSR with RL and Multi-Granularity Learning

MolSight introduces a three-stage training paradigm for Optical Chemical Structure Recognition (OCSR): large-scale pretraining, multi-granularity fine-tuning with auxiliary bond and coordinate prediction tasks, and reinforcement learning via Group Relative Policy Optimization (GRPO). The result is state-of-the-art recognition of complex stereochemistry such as chiral centers and cis-trans isomers.
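GRPO's key trick is to sample a group of candidate outputs and normalize each candidate's reward against the group's own statistics, instead of training a separate value network. A minimal sketch of that advantage computation, with an illustrative exact-match reward and group size that are assumptions here, not MolSight's actual setup:

```python
import statistics

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: normalize each sampled candidate's
    reward by the mean and population std of its sampling group.
    No learned critic is involved -- the group itself is the baseline."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Toy group: 4 sampled SMILES candidates scored 1.0 for an exact
# match with the ground truth and 0.0 otherwise (hypothetical reward).
rewards = [1.0, 0.0, 0.0, 1.0]
advs = grpo_advantages(rewards)
```

Correct candidates receive a positive advantage and incorrect ones a negative advantage of equal magnitude, so the policy gradient pushes probability mass toward the better half of the group.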

Computational Chemistry
RFL: Simplifying Chemical Structure Recognition

Proposes Ring-Free Language (RFL) to hierarchically decouple molecular graphs into skeletons, rings, and branches, solving issues with 1D serialization of complex 2D structures. Introduces the Molecular Skeleton Decoder (MSD) to progressively predict these components, achieving state-of-the-art results on handwritten and printed chemical structures.
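The decoupling idea can be illustrated with a toy graph routine: during a spanning-tree pass over the bonds, any bond whose endpoints are already connected closes a ring, which separates ring-closure bonds from the acyclic skeleton. This union-find sketch is only an analogy for RFL's hierarchical decomposition, not the paper's algorithm:

```python
def ring_closure_bonds(n_atoms, bonds):
    """Union-find over bonds: an edge whose endpoints are already in
    the same component closes a ring; the remaining edges form the
    acyclic skeleton (a spanning forest of the molecular graph)."""
    parent = list(range(n_atoms))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    skeleton, closures = [], []
    for a, b in bonds:
        ra, rb = find(a), find(b)
        if ra == rb:
            closures.append((a, b))   # ring-closing bond
        else:
            parent[ra] = rb
            skeleton.append((a, b))   # skeleton bond
    return skeleton, closures

# Toy molecule: a six-membered ring (atoms 0-5) with one branch atom (6).
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6)]
skel, rings = ring_closure_bonds(7, bonds)
```

The single ring-closing bond that falls out of the traversal is exactly the piece a ring-free serialization must handle separately from the tree-shaped skeleton.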

Computational Chemistry
ABC-Net: Divide-and-Conquer SMILES Recognition

ABC-Net reformulates molecular image recognition as a keypoint detection problem: a single fully convolutional network predicts atom and bond center points together with their properties, from which the molecular graph is reconstructed. This divide-and-conquer design achieves >94% accuracy with high data efficiency.
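The decoding step of a keypoint-based recognizer can be sketched as thresholded local-maximum extraction over a predicted heatmap. This pure-Python toy (the grid values and threshold are made up) illustrates the idea only, not ABC-Net's actual post-processing:

```python
def decode_keypoints(heatmap, threshold=0.5):
    """Pick cells that exceed `threshold` and are strict local maxima
    in their 3x3 neighbourhood -- a minimal stand-in for decoding
    atom/bond centers from a predicted heatmap."""
    h, w = len(heatmap), len(heatmap[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            v = heatmap[y][x]
            if v < threshold:
                continue
            neighbours = [heatmap[ny][nx]
                          for ny in range(max(0, y - 1), min(h, y + 2))
                          for nx in range(max(0, x - 1), min(w, x + 2))
                          if (ny, nx) != (y, x)]
            if all(v > n for n in neighbours):
                peaks.append((y, x, v))
    return peaks

# Hypothetical 4x4 heatmap with two "atom centers".
grid = [[0.0, 0.1, 0.0, 0.0],
        [0.1, 0.9, 0.2, 0.0],
        [0.0, 0.2, 0.0, 0.8],
        [0.0, 0.0, 0.1, 0.2]]
peaks = decode_keypoints(grid)
```

Each surviving peak gives a candidate atom (or bond) location whose predicted properties can then be read off at the same grid cell.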

Computational Chemistry
ChemPix: Hand-Drawn Hydrocarbon Recognition

Proposes a CNN-LSTM architecture that treats chemical structure recognition as an image captioning task. Introduces a robust synthetic data generation pipeline with augmentation, degradation, and background addition to train models that generalize to hand-drawn inputs without seeing real data during pre-training.
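A minimal sketch of such a synthesize-then-degrade pipeline, assuming a grayscale image stored as a nested list; the noise rates, blur kernel, and background tint are illustrative stand-ins, not ChemPix's actual parameters:

```python
import random

def degrade(image, seed=0):
    """Toy synthesize-then-degrade pipeline: salt-and-pepper noise,
    a 3x3 box blur, and a paper-like background tint. Real pipelines
    operate on rendered molecule images; this is only a sketch."""
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    # 1) salt-and-pepper noise (2% pepper, 2% salt -- assumed rates)
    noisy = [[0.0 if rng.random() < 0.02 else
              1.0 if rng.random() < 0.02 else px
              for px in row] for row in image]
    # 2) 3x3 box blur to soften strokes
    def blur_at(y, x):
        vals = [noisy[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))]
        return sum(vals) / len(vals)
    blurred = [[blur_at(y, x) for x in range(w)] for y in range(h)]
    # 3) uniform brightness offset toward a "paper" background
    tint = 0.1 * rng.random()
    return [[min(1.0, px + tint) for px in row] for row in blurred]

clean = [[1.0] * 8 for _ in range(8)]   # hypothetical blank 8x8 render
dirty = degrade(clean)
```

Stacking many such randomized corruptions over clean renders is what lets a model trained purely on synthetic data transfer to messy hand-drawn inputs.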

Computational Chemistry
DECIMER 1.0: Transformers for Chemical Image Recognition

DECIMER 1.0 introduces a Transformer-based architecture coupled with EfficientNet-B3 to solve Optical Chemical Structure Recognition. By leveraging the robust SELFIES representation and scaling training to over 35 million molecules, it achieves state-of-the-art accuracy on synthetic benchmarks, offering an open-source solution for mining chemical data from legacy literature.

Computational Chemistry
End-to-End Transformer for Molecular Image Captioning

This paper introduces a convolution-free, end-to-end transformer model for molecular image translation. By replacing CNN encoders with Vision Transformers, it achieves superior performance on noisy datasets compared to ResNet-LSTM baselines.
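The convolution-free tokenization at the heart of a Vision Transformer, slicing the image into fixed-size patches and flattening each into a token, can be sketched as follows (the patch size and tiny input grid are illustrative):

```python
def patchify(image, patch):
    """Split an H x W grid into non-overlapping patch x patch tiles
    and flatten each tile into a vector -- the tokenization step that
    replaces a CNN encoder in a Vision Transformer."""
    h, w = len(image), len(image[0])
    tokens = []
    for py in range(0, h, patch):
        for px in range(0, w, patch):
            tile = [image[py + dy][px + dx]
                    for dy in range(patch) for dx in range(patch)]
            tokens.append(tile)
    return tokens

# Hypothetical 4x4 "image" with pixel values 0..15, split into 2x2 patches.
image = [[y * 4 + x for x in range(4)] for y in range(4)]
tokens = patchify(image, 2)
```

In a full model each flattened tile would then be linearly projected and given a positional embedding before entering the transformer encoder.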

Computational Chemistry
ICMDT: Automated Chemical Image Recognition with Deep TNT

This paper introduces ICMDT, a Transformer-based architecture for molecular translation (image-to-InChI). By enhancing the TNT block to fuse pixel, small patch, and large patch embeddings, the model achieves superior accuracy on the Bristol-Myers Squibb dataset compared to CNN-RNN and standard Transformer baselines.

Computational Chemistry
Image-to-Graph Transformers for Chemical Structure Recognition

This paper proposes an end-to-end deep learning architecture that translates chemical images directly into molecular graphs using a ResNet-Transformer encoder and a graph-aware decoder. It addresses the limitations of SMILES-based approaches by effectively handling non-atomic symbols (abbreviations) and varying drawing styles found in scientific literature.

Computational Chemistry
Image2SMILES: Transformer OCSR with Synthetic Data Pipeline

A Transformer-based system for optical chemical structure recognition built on a comprehensive synthetic data generation pipeline (FG-SMILES, Markush structures, visual contamination). It reaches 79% accuracy on real-world images, outperforming rule-based systems such as OSRA.

Computational Chemistry
MICER: Molecular Image Captioning with Transfer Learning

MICER treats optical chemical structure recognition as an image captioning task, pairing a fine-tuned ResNet encoder with an attention-based LSTM decoder via transfer learning. It converts molecular images into SMILES strings and significantly outperforms rule-based and previous deep learning methods.
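The attention step of such a captioning decoder, scoring every encoder feature against the current decoder state and returning a weighted context vector, can be sketched generically. Dot-product scoring is used here for brevity; MICER's actual scoring function may differ:

```python
import math

def attend(query, keys):
    """One attention step for a captioning decoder: score each encoder
    feature vector against the decoder state, softmax the scores, and
    return the attention weights plus the weighted context vector."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)                          # for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(keys[0])
    context = [sum(w * key[i] for w, key in zip(weights, keys))
               for i in range(dim)]
    return weights, context

# Hypothetical decoder state and three encoder feature vectors.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights, context = attend(query, keys)
```

The decoder consumes the context vector at each step, so the model can "look at" a different region of the molecular image for every emitted SMILES token.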

Computational Chemistry
String Representations for Chemical Image Recognition

This methodological study isolates the impact of chemical string representations on image-to-text translation models. It finds that while SMILES offers the highest overall accuracy, SELFIES provides a guarantee of structural validity, offering a trade-off for OCSR tasks.
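The trade-off is easy to demonstrate: a single dropped character typically makes a SMILES string syntactically invalid, whereas every SELFIES token sequence decodes to some valid molecule by construction. The rough checker below (balanced branch parentheses and paired ring-closure digits, a deliberate simplification; real validity needs a cheminformatics parser such as RDKit) shows the fragile side:

```python
def smiles_syntax_ok(s):
    """Very rough syntactic check for SMILES: branch parentheses must
    balance and each ring-closure digit must appear an even number of
    times. This ignores atoms, bonds, %nn closures, and valence -- it
    only illustrates how brittle the notation is to small edits."""
    depth = 0
    ring_digits = {}
    for ch in s:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:
                return False        # closing branch before opening one
        elif ch.isdigit():
            ring_digits[ch] = ring_digits.get(ch, 0) + 1
    return depth == 0 and all(c % 2 == 0 for c in ring_digits.values())

toluene = "Cc1ccccc1"
corrupted = "Cc1ccccc"   # one dropped character -> unmatched ring bond
```

A model emitting SMILES must learn these global constraints implicitly; a model emitting SELFIES gets validity for free, at the cost (per this study) of slightly lower overall accuracy.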