How a molecule is encoded determines what a model can learn from it. This section covers molecular representations in four groups: string notations and identifiers, pre-trained neural encoders, translation between SMILES and IUPAC names, and multimodal models that combine molecules with other data.
- Notations – String formats (SMILES, InChI, SELFIES), tokenization, and representation surveys
- Encoders – Pre-trained molecular encoders (ChemBERTa, MoLFormer, etc.)
- Name Translation – Translating between SMILES and IUPAC names
- Multimodal – Models fusing molecules with text, protein, or knowledge graph data
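
As a small illustration of why encoding matters, the sketch below tokenizes a SMILES string in two ways. It is a minimal example, not taken from any of the works listed here: the regex `SMILES_TOKEN` and the helper `tokenize_smiles` are hypothetical names, and the pattern covers only common SMILES syntax rather than the full grammar.

```python
import re

# Illustrative pattern (an assumption, not a complete SMILES grammar).
# Alternatives are tried left to right, so multi-character tokens come first.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]"           # bracket atoms such as [NH4+] or [C@@H]
    r"|Br|Cl"               # two-letter atoms in the organic subset
    r"|[BCNOSPFI]"          # one-letter aliphatic atoms
    r"|[bcnosp]"            # aromatic atoms
    r"|%\d{2}"              # two-digit ring-closure labels
    r"|\d"                  # single-digit ring-closure labels
    r"|[()=#\-+\\/:~.@*$]"  # branches, bonds, and other symbols
)


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into atom-level tokens.

    Raises ValueError if any character is not covered by the pattern,
    so unsupported input is never silently dropped.
    """
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


if __name__ == "__main__":
    smiles = "CCl"  # chloroethane
    print(list(smiles))             # character-level: ['C', 'C', 'l']
    print(tokenize_smiles(smiles))  # atom-level:      ['C', 'Cl']
```

Character-level splitting breaks chlorine into `C` and `l`, so a model sees a carbon token where there is none. Atom-level tokenization keeps `Cl` whole, which is one concrete way the choice of representation changes what a model can learn.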