Molecular Representations

Diagram showing molecular structure passing through a neural network to produce IUPAC chemical nomenclature document

STOUT V2.0: Transformer-Based SMILES to IUPAC Translation

STOUT V2.0 uses Transformers trained on ~1 billion SMILES-IUPAC pairs to accurately translate chemical structures into systematic names (and vice versa), outperforming its RNN predecessor.

Molecular Representations

Vintage wooden device labeled 'The Molecular Interpreter - Model 1974' with vacuum tubes, showing SMILES to IUPAC name translation

STOUT: SMILES to IUPAC Names via Neural Machine Translation

STOUT (SMILES-TO-IUPAC-name translator) uses neural machine translation to convert chemical line notations to IUPAC names and vice versa, achieving a BLEU score of ~90%. It addresses the lack of open-source tools for algorithmic IUPAC naming.

Molecular Representations

Diagram showing Struct2IUPAC workflow: molecular structure (SMILES) passing through Transformer to generate IUPAC name, with round-trip verification loop

Struct2IUPAC: Translating SMILES to IUPAC via Transformers

This paper proposes a Transformer-based approach (Struct2IUPAC) to convert chemical structures to IUPAC names, challenging the dominance of rule-based systems. Trained on ~47M PubChem examples, it achieves near-perfect accuracy using a round-trip verification step with OPSIN.
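
The round-trip verification idea can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the model proposes several candidate names, and the `name_to_smiles` and `canonicalize` callables are hypothetical stand-ins for a name-to-structure parser such as OPSIN and a SMILES canonicalizer. The toy lookup tables exist only so the example runs without external tools.

```python
from typing import Callable, Iterable, Optional


def round_trip_verify(
    smiles: str,
    candidate_names: Iterable[str],
    name_to_smiles: Callable[[str], Optional[str]],
    canonicalize: Callable[[str], str],
) -> Optional[str]:
    """Return the first candidate name that parses back to the input structure.

    name_to_smiles: hypothetical stand-in for a name parser (e.g. OPSIN);
                    returns None when the name cannot be parsed.
    canonicalize:   hypothetical stand-in for a SMILES canonicalizer.
    """
    target = canonicalize(smiles)
    for name in candidate_names:
        parsed = name_to_smiles(name)
        if parsed is not None and canonicalize(parsed) == target:
            return name
    return None  # no candidate survived the round trip


# Toy stand-ins (assumptions for illustration only).
TOY_PARSER = {"ethanol": "OCC", "methanol": "CO"}
TOY_CANONICAL = {"OCC": "CCO", "CCO": "CCO", "OC": "CO", "CO": "CO"}


def toy_name_to_smiles(name: str) -> Optional[str]:
    return TOY_PARSER.get(name)


def toy_canonicalize(smiles: str) -> str:
    return TOY_CANONICAL.get(smiles, smiles)


verified = round_trip_verify(
    "CCO", ["methanol", "ethanol"], toy_name_to_smiles, toy_canonicalize
)
print(verified)
```

A name is accepted only if an independent parser recovers the original structure from it, so incorrect generations are rejected instead of being returned.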

Molecular Representations

Transformer encoder-decoder architecture processing InChI string character-by-character to produce IUPAC chemical name

Translating InChI to IUPAC Names with Transformers

This study presents a sequence-to-sequence Transformer model that translates InChI identifiers into IUPAC names character-by-character. Trained on 10 million PubChem pairs, it achieves 91% accuracy on organic compounds, performing comparably to commercial software.
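
Character-by-character translation implies that each InChI string is split into single-character tokens before it reaches the model. The sketch below shows one plausible way to do that. The special tokens (`<pad>`, `<sos>`, `<eos>`) and the helper names are assumptions for illustration, not details taken from the paper.

```python
from typing import Dict, Iterable, List

# Assumed special tokens; the paper's actual vocabulary may differ.
SPECIALS = ["<pad>", "<sos>", "<eos>"]


def build_vocab(strings: Iterable[str]) -> Dict[str, int]:
    """Map each special token and each distinct character to an integer id."""
    chars = sorted({ch for s in strings for ch in s})
    return {tok: i for i, tok in enumerate(SPECIALS + chars)}


def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    """Turn a string into ids, one per character, wrapped in start/end markers."""
    return [vocab["<sos>"]] + [vocab[ch] for ch in text] + [vocab["<eos>"]]


def decode(ids: Iterable[int], vocab: Dict[str, int]) -> str:
    """Invert encode(), dropping the special tokens."""
    inverse = {i: tok for tok, i in vocab.items()}
    return "".join(inverse[i] for i in ids if inverse[i] not in SPECIALS)


inchi = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"  # ethanol
vocab = build_vocab([inchi])
ids = encode(inchi, vocab)
print(len(ids), decode(ids, vocab) == inchi)
```

A character-level vocabulary stays very small and never produces out-of-vocabulary tokens, at the cost of longer sequences than subword tokenization would give.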

Computational Chemistry

ChemVLM architecture showing molecular structure and text inputs flowing through vision encoder and language model into multimodal LLM for chemical reasoning

ChemVLM: A Multimodal Large Language Model for Chemistry

This AAAI 2025 paper introduces ChemVLM, a domain-specific multimodal LLM with 26B parameters. It achieves state-of-the-art performance on chemical OCR, reasoning benchmarks, and molecular understanding tasks by combining vision and language models trained on curated chemistry data.

Optical Chemical Structure Recognition

Diagram of the Image2InChI architecture showing a SwinTransformer encoder connected to an attention-based feature fusion decoder for converting molecular images to InChI strings.

Image2InChI: SwinTransformer for Molecular Recognition

This paper proposes Image2InChI, an OCSR model with an improved SwinTransformer encoder and a novel feature fusion network with attention mechanisms, achieving 99.8% InChI accuracy on the BMS dataset.

Optical Chemical Structure Recognition

Architecture diagram of the MarkushGrapher dual-encoder system combining VTL and OCSR encoders for Markush structure recognition.

MarkushGrapher: Multi-modal Markush Structure Recognition

This paper introduces a multi-modal approach for extracting chemical Markush structures from patents, combining a Vision-Text-Layout encoder with a specialized chemical vision encoder. It addresses the lack of training data with a synthetic generation pipeline and introduces M2S, a new real-world benchmark.

Optical Chemical Structure Recognition

Three-stage training pipeline for MolSight showing pretraining, multi-granularity fine-tuning, and RL post-training stages

MolSight: OCSR with RL and Multi-Granularity Learning

MolSight introduces a three-stage training paradigm for Optical Chemical Structure Recognition (OCSR): large-scale pretraining, multi-granularity fine-tuning with auxiliary bond and coordinate prediction tasks, and reinforcement learning (GRPO) post-training. It achieves 85.1% stereochemical accuracy on USPTO, recognizing complex stereochemical features such as chiral centers and cis-trans isomers.

Optical Chemical Structure Recognition

ABC-Net detects atom and bond keypoints to reconstruct molecular graphs from images

ABC-Net: Keypoint-Based Molecular Image Recognition

ABC-Net reformulates molecular image recognition as a keypoint detection problem. By predicting atom/bond centers and properties via a single Fully Convolutional Network, it achieves >94% accuracy with high data efficiency.

Optical Chemical Structure Recognition

Overview of the ChemPix CNN-LSTM pipeline converting a hand-drawn hydrocarbon sketch to a SMILES string

ChemPix: Hand-Drawn Hydrocarbon Structure Recognition

This paper proposes a CNN-LSTM architecture that treats chemical structure recognition as an image captioning task. It introduces a synthetic data generation pipeline with augmentation, degradation, and background addition to train models that generalize to hand-drawn inputs without seeing real data during training.

Optical Chemical Structure Recognition

Architecture diagram showing the DECIMER 1.0 transformer pipeline from chemical image input to SELFIES output

DECIMER 1.0: Transformers for Chemical Image Recognition

DECIMER 1.0 introduces a Transformer-based architecture coupled with EfficientNet-B3 to solve Optical Chemical Structure Recognition. By using the SELFIES representation (which guarantees 100% valid output strings) and scaling training to over 35 million molecules, it achieves 96.47% exact match accuracy on synthetic benchmarks, offering an open-source solution for mining chemical data from legacy literature.

Optical Chemical Structure Recognition

Architecture diagram showing Vision Transformer encoder processing image patches and Transformer decoder generating InChI strings

End-to-End Transformer for Molecular Image Captioning

This paper introduces a convolution-free, end-to-end transformer model for molecular image translation. By replacing CNN encoders with Vision Transformers, it achieves a Levenshtein distance of 6.95 on noisy datasets, compared to 7.49 for ResNet50-LSTM baselines.
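
The Levenshtein distance reported above counts the minimum number of single-character insertions, deletions, and substitutions needed to turn the predicted string into the reference, so lower is better. A minimal dynamic-programming sketch follows; the function name and the example strings are illustrative and not taken from the paper's evaluation code.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits that turn a into b."""
    # Keep the shorter string as b so each row is as small as possible.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ch_a
                    current[j - 1] + 1,      # insert ch_b
                    previous[j - 1] + cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


# Illustrative strings only (methanol predicted, ethanol expected).
predicted = "InChI=1S/CH4O/c1-2/h2H,1H3"
reference = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
print(levenshtein(predicted, reference))
```

Averaging this distance over a test set gives a score of the kind quoted above. Unlike exact-match accuracy, it gives partial credit to predictions that are nearly correct.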