Molecular Representations
Vintage wooden device labeled 'The Molecular Interpreter - Model 1974' with vacuum tubes, showing SMILES to IUPAC name translation

STOUT: SMILES to IUPAC Names via Neural Machine Translation

STOUT (SMILES-TO-IUPAC-name translator) uses neural machine translation to convert chemical line notations to IUPAC names and vice versa, achieving ~90% BLEU score. It addresses the lack of open-source tools for algorithmic IUPAC naming.

Molecular Representations
Diagram showing Struct2IUPAC workflow: molecular structure (SMILES) passing through Transformer to generate IUPAC name, with round-trip verification loop

Struct2IUPAC: Translating SMILES to IUPAC via Transformers

This paper proposes a Transformer-based approach (Struct2IUPAC) to convert chemical structures to IUPAC names, challenging the dominance of rule-based systems. Trained on ~47M PubChem examples, it achieves near-perfect accuracy using a round-trip verification step with OPSIN.

Molecular Representations
Transformer encoder-decoder architecture processing InChI string character-by-character to produce IUPAC chemical name

Translating InChI to IUPAC Names with Transformers

This study presents a sequence-to-sequence Transformer model that translates InChI identifiers into IUPAC names character-by-character. Trained on 10 million PubChem pairs, it achieves 91% accuracy on organic compounds, performing comparably to commercial software.

Computational Chemistry
ChemVLM architecture showing molecular structure and text inputs flowing through vision encoder and language model into multimodal LLM for chemical reasoning

ChemVLM: A Multimodal Large Language Model for Chemistry

A 2025 AAAI paper introducing ChemVLM, a domain-specific multimodal LLM (26B parameters). It achieves state-of-the-art performance on chemical OCR, reasoning benchmarks, and molecular understanding tasks by combining vision and language models trained on curated chemistry data.

Optical Chemical Structure Recognition
Diagram of the Image2InChI architecture showing a SwinTransformer encoder connected to an attention-based feature fusion decoder for converting molecular images to InChI strings.

Image2InChI: SwinTransformer for Molecular Recognition

Proposes Image2InChI, an OCSR model with improved SwinTransformer encoder and novel feature fusion network with attention mechanisms that achieves 99.8% InChI accuracy on the BMS dataset.

Optical Chemical Structure Recognition
Architecture diagram of the MarkushGrapher dual-encoder system combining VTL and OCSR encoders for Markush structure recognition.

MarkushGrapher: Multi-modal Markush Structure Recognition

This paper introduces a multi-modal approach for extracting chemical Markush structures from patents, combining a Vision-Text-Layout encoder with a specialized chemical vision encoder. It addresses the lack of training data with a synthetic generation pipeline and introduces M2S, a new real-world benchmark.

Optical Chemical Structure Recognition
Three-stage training pipeline for MolSight showing pretraining, multi-granularity fine-tuning, and RL post-training stages

MolSight: OCSR with RL and Multi-Granularity Learning

MolSight introduces a three-stage training paradigm for Optical Chemical Structure Recognition (OCSR), utilizing large-scale pretraining, multi-granularity fine-tuning with auxiliary bond and coordinate prediction tasks, and reinforcement learning (GRPO) to achieve 85.1% stereochemical accuracy on USPTO, recognizing complex stereochemical structures like chiral centers and cis-trans isomers.

Optical Chemical Structure Recognition
ABC-Net detects atom and bond keypoints to reconstruct molecular graphs from images

ABC-Net: Keypoint-Based Molecular Image Recognition

ABC-Net reformulates molecular image recognition as a keypoint detection problem. By predicting atom/bond centers and properties via a single Fully Convolutional Network, it achieves >94% accuracy with high data efficiency.

Optical Chemical Structure Recognition
Overview of the ChemPix CNN-LSTM pipeline converting a hand-drawn hydrocarbon sketch to a SMILES string

ChemPix: Hand-Drawn Hydrocarbon Structure Recognition

Proposes a CNN-LSTM architecture that treats chemical structure recognition as an image captioning task. Introduces a synthetic data generation pipeline with augmentation, degradation, and background addition to train models that generalize to hand-drawn inputs without seeing real data during training.

Optical Chemical Structure Recognition
Architecture diagram showing the DECIMER 1.0 transformer pipeline from chemical image input to SELFIES output

DECIMER 1.0: Transformers for Chemical Image Recognition

DECIMER 1.0 introduces a Transformer-based architecture coupled with EfficientNet-B3 to solve Optical Chemical Structure Recognition. By using the SELFIES representation (which guarantees 100% valid output strings) and scaling training to over 35 million molecules, it achieves 96.47% exact match accuracy on synthetic benchmarks, offering an open-source solution for mining chemical data from legacy literature.

Optical Chemical Structure Recognition
Architecture diagram showing Vision Transformer encoder processing image patches and Transformer decoder generating InChI strings

End-to-End Transformer for Molecular Image Captioning

This paper introduces a convolution-free, end-to-end transformer model for molecular image translation. By replacing CNN encoders with Vision Transformers, it achieves a Levenshtein distance of 6.95 on noisy datasets, compared to 7.49 for ResNet50-LSTM baselines.

Optical Chemical Structure Recognition
Chemical structure diagram representing the ICMDT molecular translation system

ICMDT: Automated Chemical Structure Image Recognition

This paper introduces ICMDT, a Transformer-based architecture for molecular translation (image-to-InChI). By enhancing the TNT block to fuse pixel, small patch, and large patch embeddings, the model achieves superior accuracy on the Bristol-Myers Squibb dataset compared to CNN-RNN and standard Transformer baselines.