Molecular Representations
Bar chart showing SPMM supports bidirectional tasks: molecule to property, property to molecule, molecule optimization, and property interpolation

SPMM: A Bidirectional Molecular Foundation Model

SPMM pre-trains a dual-stream transformer on SMILES strings and vectors of 53 molecular properties using contrastive learning and cross-attention, enabling bidirectional structure-property generation, property prediction, and reaction prediction with a single model.

Radial diagram showing 12 transformer architecture families connected to 5 molecular science application domains

Survey of Transformer Architectures in Molecular Science

Jiang et al. survey 12 families of transformer architectures in molecular science, covering GPT, BERT, BART, graph transformers, Transformer-XL, T5, ViT, DETR, Conformer, CLIP, sparse transformers, and mobile/efficient variants, with detailed algorithmic descriptions and molecular applications.

Bar chart showing CLM architecture publication trends from 2020 to 2024, with transformers overtaking RNNs

Systematic Review of Deep Learning CLMs (2020-2024)

A PRISMA-based systematic review of 72 papers on chemical language models for molecular generation, comparing architectures and biased methods using MOSES metrics.

Diagram showing the t-SMILES pipeline from molecular graph fragmentation to binary tree traversal producing a string representation

t-SMILES: Tree-Based Fragment Molecular Encoding

t-SMILES represents molecules by fragmenting them into substructures, building full binary trees, and traversing them breadth-first to produce SMILES-type strings that reduce nesting depth and outperform SMILES, DeepSMILES, and SELFIES on generation benchmarks.
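
The breadth-first traversal step can be sketched as follows. This is a minimal illustration, not the t-SMILES implementation: the `Node` class, the hand-picked fragment labels, the `&` padding symbol for missing children, and the `^` separator are all assumptions made for the example, and the fragmentation step itself is skipped.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical binary-tree node holding one fragment string."""
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


PAD = "&"  # assumed marker for a missing child


def bfs_string(root: Node, sep: str = "^") -> str:
    """Visit the tree level by level and join the fragment labels.

    Missing children are emitted as PAD so that every real node
    contributes exactly two child slots to the output.
    """
    out = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node is None:
            out.append(PAD)
            continue
        out.append(node.label)
        queue.append(node.left)
        queue.append(node.right)
    return sep.join(out)


# Toy tree: fragments chosen by hand, not produced by a real fragmenter.
tree = Node("c1ccccc1", left=Node("CC", left=Node("O")), right=Node("N"))
encoded = bfs_string(tree)
print(encoded)
```

Because each fragment is written as a flat token and the tree shape is carried by position in the traversal, the ring and branch nesting stays local to each fragment rather than spanning the whole string.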

Taxonomy of transformer-based chemical language models organized by architecture type

Transformer CLMs for SMILES: Literature Review 2024

A comprehensive review of transformer-based chemical language models operating on SMILES, categorizing encoder-only (BERT variants), decoder-only (GPT variants), and encoder-decoder models with analysis of tokenization strategies, pre-training approaches, and future directions.

Diagram showing sequence-to-sequence translation from chemical names to SMILES with atom count constraints

Transformer Name-to-SMILES with Atom Count Losses

This paper applies a Transformer sequence-to-sequence model to predict SMILES strings from chemical compound names (Synonyms). Two enhancements, an atom-count constraint loss and SMILES/InChI multi-task learning, improve F-measure over rule-based and vanilla Transformer baselines.
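
To make the idea of an atom-count constraint concrete, the sketch below counts atoms per element in a predicted and a reference SMILES string and penalizes the mismatch. It is an illustration only: the regular expression, the small element set, and the squared-difference penalty are assumptions for this example, not the paper's formulation, and a real training loss would have to be computed differentiably from model outputs rather than from decoded strings.

```python
import re
from collections import Counter

# Two-letter symbols come first so "Cl" is not read as "C" followed by "l".
# Only a small organic subset is covered; bracket atoms such as [Na+] and
# explicit hydrogens are not handled correctly by this toy pattern.
ATOM_RE = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")


def atom_counts(smiles: str) -> Counter:
    """Count atoms per element, folding aromatic lowercase into uppercase."""
    return Counter(tok.capitalize() for tok in ATOM_RE.findall(smiles))


def atom_count_penalty(pred: str, ref: str) -> int:
    """Sum of squared per-element count differences (assumed penalty form)."""
    p, r = atom_counts(pred), atom_counts(ref)
    return sum((p[e] - r[e]) ** 2 for e in set(p) | set(r))


print(atom_count_penalty("CCO", "CCCO"))
```

A prediction with the right skeleton but one missing carbon receives a small nonzero penalty, while a prediction with identical element counts receives zero even if the connectivity differs, so a term like this can only complement the sequence loss.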

Horizontal bar chart showing X-MOL achieves best performance across five molecular tasks

X-MOL: Pre-training on 1.1B Molecules for SMILES

X-MOL applies large-scale Transformer pre-training on 1.1 billion molecules with a generative SMILES-to-SMILES strategy, then fine-tunes for five molecular analysis tasks including property prediction, reaction analysis, and de novo generation.

Bar chart showing retrieval accuracy of chemical language models across four SMILES augmentation types

AMORE: Testing ChemLLM Robustness to SMILES Variants

AMORE is an embedding-based retrieval framework that evaluates whether chemical language models recognize the same molecule across different SMILES representations. Results show that current models are not robust to identity-preserving augmentations.
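
An embedding-based retrieval check of this kind can be sketched in a few lines. The example below is a hedged illustration rather than the AMORE code: the two-dimensional vectors stand in for model embeddings, and cosine similarity with top-1 accuracy is an assumed choice of similarity and metric. Row `i` of `augmented` is taken to be a rewritten SMILES of the molecule in row `i` of `original`.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def top1_retrieval_accuracy(original, augmented):
    """Fraction of augmented embeddings whose nearest original is their own molecule."""
    hits = 0
    for i, query in enumerate(augmented):
        best = max(range(len(original)), key=lambda j: cosine(query, original[j]))
        hits += best == i
    return hits / len(augmented)


# Toy embeddings (assumed values, not outputs of any real model).
original = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
augmented = [[0.9, 0.1], [0.2, 0.8], [1.0, 0.1]]
accuracy = top1_retrieval_accuracy(original, augmented)
print(accuracy)
```

In this toy data the third augmented vector drifts closer to the first molecule than to its own, which is the kind of failure a robustness test over identity-preserving rewrites is meant to expose.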

Taxonomy of molecular representation learning foundation models organized by input modality

Review of Molecular Representation Learning Models

A comprehensive survey classifying molecular representation learning foundation models by input modality (sequence, graph, 3D, image, multimodal) and analyzing four pretraining paradigms for drug discovery tasks.

Log-log plots showing power-law scaling of ChemGPT validation loss versus model size and GNN force field loss versus dataset size

Neural Scaling of Deep Chemical Models

Frey et al. discover empirical power-law scaling relations for both chemical language models (ChemGPT, up to 1B parameters) and equivariant GNN interatomic potentials, finding that neither domain has saturated with respect to model size, data, or compute.
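
A power law of the form L = a * N^(-b) is a straight line on log-log axes, so its exponent can be estimated with ordinary least squares on the logarithms. The sketch below shows that fit on synthetic points; the data, the function name `fit_power_law`, and the omission of any irreducible-loss term are assumptions for illustration and do not reproduce the fitting procedure or results of Frey et al.

```python
import math


def fit_power_law(sizes, losses):
    """Fit loss = a * size**(-b) by least squares in log-log space.

    Returns the pair (a, b).
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(loss) for loss in losses]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic, noise-free points generated from a = 2.0, b = 0.1.
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [2.0 * n ** -0.1 for n in sizes]
a, b = fit_power_law(sizes, losses)
print(a, b)
```

On real measurements the points would scatter around the line, and a loss curve that flattens at large sizes would signal saturation, which is the behavior the summary above says was not observed.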

BARTSmiles ablation study summary showing impact of pre-training strategies on downstream task performance

BARTSmiles: BART Pre-Training for Molecular SMILES

BARTSmiles pre-trains a BART-large model on 1.7 billion SMILES strings from ZINC20 and achieves the best reported results on 11 classification, regression, and generation benchmarks.

MoLFormer-XL architecture diagram showing SMILES tokens flowing through a linear attention transformer to MoleculeNet benchmark results and attention-structure correlation

MoLFormer: Large-Scale Chemical Language Representations

MoLFormer is a transformer encoder with linear attention and rotary positional embeddings, pretrained via masked language modeling on 1.1 billion molecules from PubChem and ZINC. MoLFormer-XL outperforms GNN baselines on most MoleculeNet classification and regression tasks, and attention analysis reveals that the model learns interatomic spatial relationships directly from SMILES strings.