Molecular Representations

How a molecule is encoded determines what a model can learn from it. This section covers molecular representations in four groups: string notations and identifiers, pre-trained neural encoders, multimodal models, and name-translation systems.

Notations – String formats (SMILES, InChI, SELFIES), tokenization, and representation surveys
Encoders – Pre-trained molecular encoders (ChemBERTa, MoLFormer, etc.)
Name Translation – Translating between SMILES and IUPAC names
Multimodal – Models fusing molecules with text, protein, or knowledge graph data

Molecular Notations

Notes on molecular string notations (SMILES, InChI, SELFIES), tokenization schemes, and representation surveys.
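
As a concrete illustration of what a tokenization scheme does, here is a minimal sketch of an atom-level SMILES tokenizer built on a regular expression. The pattern and the function name `tokenize_smiles` are assumptions made for this example, not taken from any specific library or paper; the pattern covers only the common organic subset, bracket atoms, bonds, branches, and ring closures.

```python
import re

# Hypothetical pattern for illustration: order matters, so multi-character
# tokens (bracket atoms, Br, Cl, %NN) are tried before single characters.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]"          # bracket atoms such as [NH4+] or [C@@H]
    r"|Br|Cl"              # two-letter organic-subset atoms
    r"|%\d{2}"             # two-digit ring-closure labels such as %10
    r"|[BCNOPSFI]"         # one-letter organic-subset atoms
    r"|[bcnops]"           # aromatic atoms
    r"|[=#$:/\\\-().~*]"   # bonds, branches, dot separator, wildcard
    r"|\d"                 # single-digit ring closures
)


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into atom-level tokens.

    Raises ValueError if any character is not covered by the pattern,
    so unrecognized input is never dropped silently.
    """
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens
```

For example, `tokenize_smiles("C[C@@H](N)C(=O)O")` keeps the bracket atom `[C@@H]` as a single token instead of splitting it into characters, which is the main difference from character-level tokenization. This sketch does not check chemical validity; it only segments the string.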

Multimodal Molecular Models

Pre-trained models that fuse molecular strings or graphs with auxiliary modalities such as natural language, property vectors, protein sequences, or knowledge graphs.

Molecular Encoders

Pre-trained models that learn molecular representations from chemical string notations (SMILES, SELFIES, InChI). Covers masked language models, autoencoders, and embedding methods used for downstream property prediction and retrieval.
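
To make the masked-language-model objective concrete, the sketch below shows how a tokenized molecule might be corrupted for pre-training: some tokens are replaced by a mask symbol, and the model is trained to recover the originals. The function name `mask_tokens`, the `[MASK]` symbol, and the 15% default rate are assumptions for illustration. BERT-style training usually also keeps or randomly replaces a fraction of the selected tokens, which this simplified version omits.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol for this sketch


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Return (masked_tokens, labels) for a masked-language-model objective.

    labels[i] holds the original token where position i was masked and
    None elsewhere, so a loss would be computed only on masked positions.
    """
    rng = random.Random(seed)  # local generator keeps the sketch reproducible
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


# Usage: tokens of acetic acid, CC(=O)O, written out by hand.
tokens = ["C", "C", "(", "=", "O", ")", "O"]
masked, labels = mask_tokens(tokens, mask_prob=0.3, seed=1)
```

The encoder sees `masked` as input and is scored on predicting the non-`None` entries of `labels`; the hidden states it learns along the way are what get reused as molecular embeddings.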

Chemical Name Translation

Neural machine translation models for converting between chemical string representations (SMILES, InChI) and human-readable names (IUPAC, common names).