This group covers sequence-to-sequence models that translate between machine-readable chemical representations and human-readable nomenclature. The core task is SMILES-to-IUPAC translation and its reverse, with extensions to InChI-to-IUPAC conversion and cross-lingual translation of chemical names. These models treat chemical naming as a neural machine translation problem, applying architectures originally developed for natural language: mostly transformers, with CNN/LSTM models in the cross-lingual work.

| Paper | Year | Direction | Key Idea |
| --- | --- | --- | --- |
| NMT Nomenclature | 2020 | Cross-lingual | CNN/LSTM models translate chemical names between English and Chinese |
| Transformer Name-to-SMILES | 2020 | Name → SMILES | Transformer predicts chemical structures from compound names |
| HANDSEL | 2021 | InChI → IUPAC | Sequence-to-sequence transformer converts InChI to IUPAC names |
| STOUT | 2021 | SMILES ↔ IUPAC | NMT for bidirectional SMILES-IUPAC translation |
| Struct2IUPAC | 2021 | SMILES → IUPAC | Transformer achieves 98.9% accuracy on chemical notation conversion |
| STOUT V2 | 2024 | SMILES → IUPAC | Transformer trained on 1B molecules for structure-to-name translation |
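
To make the translation framing concrete, here is a minimal sketch of how a SMILES string and its IUPAC name might be turned into a source/target token pair for a sequence-to-sequence model. The token pattern, the character-level name tokenizer, the `<bos>`/`<eos>` markers, and the function names are all illustrative assumptions; none of them is taken from the papers listed above, each of which uses its own tokenization scheme.

```python
import re

# Illustrative SMILES token pattern (an assumption, not from any listed paper).
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]"            # bracket atoms such as [NH4+] or [C@@H]
    r"|Br|Cl"                # two-letter organic-subset atoms
    r"|%\d{2}"               # two-digit ring closures such as %10
    r"|[A-Za-z]"             # single-letter atoms (aliphatic or aromatic)
    r"|\d"                   # single-digit ring closures
    r"|[()=#\-+\\/.:~@*$]"   # branches, bonds, and other symbols
)


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens; fail if any character is not covered."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


def tokenize_name(name):
    """Character-level tokenization of a chemical name (simplest possible choice)."""
    return list(name)


def make_pair(smiles, name):
    """Build a (source, target) token pair with hypothetical boundary markers."""
    source = ["<bos>"] + tokenize_smiles(smiles) + ["<eos>"]
    target = ["<bos>"] + tokenize_name(name) + ["<eos>"]
    return source, target


if __name__ == "__main__":
    src, tgt = make_pair("CC(=O)O", "acetic acid")
    print(src)  # ['<bos>', 'C', 'C', '(', '=', 'O', ')', 'O', '<eos>']
    print(tgt)
```

Swapping the roles of `source` and `target` gives the reverse (name-to-structure) direction, which is why the same model family can cover both rows of the table that differ only in direction.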

All Notes