Chemical Name Translation

This group covers sequence-to-sequence models that translate between machine-readable chemical representations and human-readable nomenclature. The core task is SMILES-to-IUPAC translation and its reverse, with extensions to InChI-to-IUPAC conversion and cross-lingual translation of chemical names. These models treat chemical naming as a neural machine translation (NMT) problem, applying encoder-decoder architectures originally developed for natural language, from CNN and LSTM networks to Transformers.

Paper	Year	Direction	Key Idea
NMT Nomenclature	2020	Cross-lingual	CNN/LSTM models translate chemical names between English and Chinese
Transformer Name-to-SMILES	2020	Name → SMILES	Transformer predicts chemical structures from compound names
HANDSEL	2021	InChI → IUPAC	Sequence-to-sequence transformer converts InChI to IUPAC names
STOUT	2021	SMILES ↔ IUPAC	NMT for bidirectional SMILES-IUPAC translation
Struct2IUPAC	2021	SMILES → IUPAC	Transformer achieves 98.9% accuracy on chemical notation conversion
STOUT V2	2024	SMILES → IUPAC	Transformer trained on 1B molecules for structure-to-name translation

All Notes

Computational Chemistry

Encoder-decoder architecture diagram for translating chemical names between English and Chinese with performance comparison bar chart

Neural Machine Translation of Chemical Nomenclature

This paper applies character-level CNN and LSTM encoder-decoder networks to translate chemical names between English and Chinese, comparing them against an existing rule-based tool.

Diagram showing sequence-to-sequence translation from chemical names to SMILES with atom count constraints

Transformer Name-to-SMILES with Atom Count Losses

This paper applies a Transformer sequence-to-sequence model to predict SMILES strings from chemical compound names (synonyms). Two enhancements, an atom-count constraint loss and multi-task learning over SMILES and InChI, improve F-measure over rule-based and vanilla Transformer baselines.
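
To make the atom-count idea concrete, the sketch below counts the atoms written in a SMILES string per element and scores the mismatch between a predicted and a reference string. It is an illustration only: the regular expression covers a simplified subset of SMILES, and `atom_counts` and `atom_count_penalty` are hypothetical names, not the loss function used in the paper.

```python
import re
from collections import Counter

# Simplified atom pattern (an assumption, not a full SMILES grammar):
# bracket atoms such as [NH4+] or [13C], then Cl/Br, then one-letter
# organic-subset atoms, then lowercase aromatic atoms.
ATOM_RE = re.compile(
    r"\[\d*([A-Z][a-z]?|[a-z]{1,2})[^\]]*\]|(Cl|Br)|([BCNOPSFI])|([bcnops])"
)


def atom_counts(smiles: str) -> Counter:
    """Count explicitly written atoms per element symbol."""
    counts = Counter()
    for match in ATOM_RE.finditer(smiles):
        symbol = next(g for g in match.groups() if g)
        counts[symbol.capitalize()] += 1  # aromatic 'c' is counted as 'C'
    return counts


def atom_count_penalty(predicted: str, reference: str) -> int:
    """Sum of absolute per-element count differences (hypothetical metric)."""
    p, r = atom_counts(predicted), atom_counts(reference)
    return sum(abs(p[e] - r[e]) for e in set(p) | set(r))


print(atom_counts("c1ccccc1Cl"))         # six carbons and one chlorine
print(atom_count_penalty("CCO", "CCN"))  # one O too many, one N missing
```

A penalty of zero does not imply the structures match (`CCO` and `OCC` have identical counts, as do constitutional isomers), which is why such a term can only complement the usual sequence loss.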

Diagram showing molecular structure passing through a neural network to produce IUPAC chemical nomenclature document

STOUT V2.0: Transformer-Based SMILES to IUPAC Translation

STOUT V2.0 uses Transformers trained on ~1 billion SMILES-IUPAC pairs to translate chemical structures into systematic names and vice versa, outperforming its RNN-based predecessor.

Vintage wooden device labeled 'The Molecular Interpreter - Model 1974' with vacuum tubes, showing SMILES to IUPAC name translation

STOUT: SMILES to IUPAC Names via Neural Machine Translation

STOUT (SMILES-TO-IUPAC-name translator) uses neural machine translation to convert SMILES line notations to IUPAC names and vice versa, achieving a BLEU score of about 90%. It addresses the lack of open-source tools for algorithmic IUPAC naming.
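
For readers unfamiliar with BLEU, the sketch below computes an unsmoothed sentence-level score from clipped n-gram precisions (up to 4-grams, uniform weights) and a brevity penalty. Treating each character of a name as a token is an assumption made here for illustration; the tokenization and BLEU implementation used to evaluate STOUT are not specified in this summary, and `ngrams` and `bleu` are hypothetical helper names.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform n-gram weights."""
    if len(candidate) < max_n or len(reference) == 0:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum((cand & ref).values())  # clipped n-gram matches
        if overlap == 0:
            return 0.0  # without smoothing, one empty order zeroes the score
        log_precisions.append(math.log(overlap / sum(cand.values())))
    # The brevity penalty lowers the score of candidates shorter than the reference.
    if len(candidate) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(log_precisions) / max_n)


reference = list("propan-2-ol")
print(bleu(list("propan-2-ol"), reference))  # identical strings score 1.0
print(bleu(list("propan-1-ol"), reference))  # a single wrong locant scores below 1.0
```

The second call shows why BLEU is a lenient metric for nomenclature: a name with one wrong locant still shares most of its n-grams with the reference, even though it denotes a different compound.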

Diagram showing Struct2IUPAC workflow: molecular structure (SMILES) passing through Transformer to generate IUPAC name, with round-trip verification loop

Struct2IUPAC: Translating SMILES to IUPAC via Transformers

This paper proposes a Transformer-based approach (Struct2IUPAC) to convert chemical structures to IUPAC names, challenging the dominance of rule-based systems. Trained on ~47M PubChem examples, it reaches 98.9% accuracy when generated names are checked by a round-trip verification step with OPSIN.
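
The round-trip idea can be sketched as follows: generate candidate names, parse each one back into a structure, and accept the first candidate whose structure matches the input. This is a minimal sketch under stated assumptions, not the paper's implementation. `generate_names`, `name_to_smiles`, and `canonicalize` are hypothetical callables standing in for the Transformer's candidate generation, a name-to-structure parser such as OPSIN, and a SMILES canonicalizer; toy lookup tables replace all three so the example runs on its own.

```python
from typing import Callable, Iterable, Optional


def verified_name(
    smiles: str,
    generate_names: Callable[[str], Iterable[str]],
    name_to_smiles: Callable[[str], Optional[str]],
    canonicalize: Callable[[str], str],
) -> Optional[str]:
    """Return the first candidate name that parses back to the input structure."""
    target = canonicalize(smiles)
    for name in generate_names(smiles):
        parsed = name_to_smiles(name)  # None means the parser rejected the name
        if parsed is not None and canonicalize(parsed) == target:
            return name
    return None  # no candidate survived the round trip


# Toy stand-ins (all hypothetical) so the sketch runs without a model or OPSIN.
TOY_BEAM = {"CCO": ["methanol", "ethanol"], "CC(C)O": ["propan-1-ol"]}
TOY_PARSER = {"methanol": "CO", "ethanol": "OCC", "propan-1-ol": "CCCO"}
TOY_CANONICAL = {"OCC": "CCO"}  # maps alternative spellings to one canonical form


def check(smiles: str) -> Optional[str]:
    return verified_name(
        smiles,
        generate_names=lambda s: TOY_BEAM.get(s, []),
        name_to_smiles=TOY_PARSER.get,
        canonicalize=lambda s: TOY_CANONICAL.get(s, s),
    )


print(check("CCO"))     # the wrong first candidate is skipped
print(check("CC(C)O"))  # no candidate matches, so the name is withheld
```

Returning `None` instead of a best guess is the point of the design: the system trades a small amount of coverage for confidence that every name it does emit parses back to the requested structure.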

Transformer encoder-decoder architecture processing InChI string character-by-character to produce IUPAC chemical name

Translating InChI to IUPAC Names with Transformers

This study presents a sequence-to-sequence Transformer model that translates InChI identifiers into IUPAC names character by character. Trained on 10 million InChI/name pairs from PubChem, it achieves 91% accuracy on organic compounds, performing comparably to commercial software.
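
Character-level processing means that every character of the InChI and of the name becomes one token. The sketch below shows one plausible way to build such a vocabulary and encode a pair. The special tokens and the helper names `build_vocab`, `encode`, and `decode` are assumptions for illustration, not details taken from the study.

```python
SPECIALS = ["<pad>", "<sos>", "<eos>"]  # hypothetical special tokens


def build_vocab(strings):
    """Map every character seen in the corpus to an integer id."""
    chars = sorted({ch for s in strings for ch in s})
    return {token: i for i, token in enumerate(SPECIALS + chars)}


def encode(text, vocab):
    """One id per character, wrapped in start and end markers."""
    return [vocab["<sos>"]] + [vocab[ch] for ch in text] + [vocab["<eos>"]]


def decode(ids, vocab):
    """Invert encode(), dropping the special tokens."""
    inverse = {i: token for token, i in vocab.items()}
    return "".join(inverse[i] for i in ids if inverse[i] not in SPECIALS)


inchi = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"  # ethanol
name = "ethanol"
vocab = build_vocab([inchi, name])

source_ids = encode(inchi, vocab)
target_ids = encode(name, vocab)
print(len(source_ids), len(target_ids))
print(decode(target_ids, vocab))
```

A character vocabulary stays small and never produces out-of-vocabulary tokens, at the cost of longer sequences than subword or rule-based tokenization would give.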
