This group covers sequence-to-sequence models that translate between machine-readable chemical representations and human-readable nomenclature. The core task is SMILES-to-IUPAC translation and its reverse, with extensions to InChI-to-IUPAC conversion and cross-lingual translation of chemical names. These models treat chemical naming as a neural machine translation (NMT) problem, applying architectures originally developed for natural language: mostly transformers, with CNN/LSTM models in the earliest work.

| Paper | Year | Direction | Key Idea |
|---|---|---|---|
| NMT Nomenclature | 2020 | Cross-lingual | CNN/LSTM models translate chemical names between English and Chinese |
| Transformer Name-to-SMILES | 2020 | Name → SMILES | Transformer predicts chemical structures from compound names |
| HANDSEL | 2021 | InChI → IUPAC | Sequence-to-sequence transformer converts InChI to IUPAC names |
| STOUT | 2021 | SMILES ↔ IUPAC | NMT for bidirectional SMILES-IUPAC translation |
| Struct2IUPAC | 2021 | SMILES → IUPAC | Transformer achieves 98.9% accuracy on chemical notation conversion |
| STOUT V2 | 2024 | SMILES → IUPAC | Transformer trained on 1B molecules for structure-to-name translation |
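
To make the "naming as translation" framing concrete, the sketch below shows how a (SMILES, IUPAC name) pair could be turned into source and target token sequences for a sequence-to-sequence model. It is illustrative only: the atom-level regex, the character-level name tokenizer, the special tokens, and all function names are assumptions made for this example, not the preprocessing used by any of the papers in the table.

```python
import re

# Atom-level SMILES tokenizer pattern. This regex is an assumption for
# illustration; each paper above uses its own tokenization scheme.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)

SPECIALS = ["<pad>", "<bos>", "<eos>"]  # hypothetical special tokens


def tokenize_smiles(smiles):
    """Split a SMILES string into atom, bond, ring, and branch tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Reject input containing characters the pattern does not cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"cannot tokenize SMILES: {smiles!r}")
    return tokens


def tokenize_name(name):
    """Character-level tokenization of a chemical name (one simple choice)."""
    return list(name)


def build_vocab(sequences):
    """Map each distinct token to an integer id, special tokens first."""
    tokens = sorted({tok for seq in sequences for tok in seq})
    return {tok: i for i, tok in enumerate(SPECIALS + tokens)}


def encode(tokens, vocab):
    """Wrap a token sequence in <bos>/<eos> and convert it to ids."""
    return [vocab["<bos>"]] + [vocab[t] for t in tokens] + [vocab["<eos>"]]


# One source/target pair, as a translation model would see it.
src_tokens = tokenize_smiles("CCO")
tgt_tokens = tokenize_name("ethanol")
src_vocab = build_vocab([src_tokens])
tgt_vocab = build_vocab([tgt_tokens])
src_ids = encode(src_tokens, src_vocab)
tgt_ids = encode(tgt_tokens, tgt_vocab)
```

Separate source and target vocabularies are used here because the two sides are different "languages"; swapping which side is the source gives the reverse direction (name to structure) listed in the table.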





