Molecular Simulation
Bar chart comparing MAT average ROC-AUC against D-MPNN, GCN, and Weave baselines

MAT: Graph-Augmented Transformer for Molecules (2020)

Molecule Attention Transformer (MAT) augments Transformer self-attention with inter-atomic distances and graph adjacency, achieving strong performance on diverse molecular property prediction tasks with minimal hyperparameter tuning after self-supervised pretraining.
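The core idea can be sketched in a few lines: each attention row becomes a weighted mix of the usual softmax scores, a distance-derived term, and the normalized adjacency row. This is a toy illustration with made-up numbers and a simple distance kernel, not MAT's exact formulation (the paper learns the projections and uses a tunable distance function g).

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Toy 3-atom molecule: raw self-attention scores, inter-atomic
# distances (in angstroms), and graph adjacency -- all hypothetical values.
attn_scores = [[0.0, 1.0, 0.5], [1.0, 0.0, 0.2], [0.5, 0.2, 0.0]]
distances   = [[0.0, 1.4, 2.4], [1.4, 0.0, 1.4], [2.4, 1.4, 0.0]]
adjacency   = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]

# Mixing weights for the three attention sources; they sum to 1.
lam_attn, lam_dist, lam_graph = 0.5, 0.25, 0.25

augmented = []
for i in range(3):
    soft = softmax(attn_scores[i])              # standard attention row
    dist = softmax([-d for d in distances[i]])  # closer atoms weigh more
    deg = sum(adjacency[i])
    adj = [a / deg for a in adjacency[i]]       # row-normalized adjacency
    augmented.append([lam_attn * s + lam_dist * d + lam_graph * a
                      for s, d, a in zip(soft, dist, adj)])

# Each augmented row is still a probability distribution over atoms.
for row in augmented:
    assert abs(sum(row) - 1.0) < 1e-9
```

Because each of the three sources is itself row-normalized, any convex combination of them remains a valid attention distribution.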

Molecular Representations
Bar chart comparing MG-BERT vs GNN baselines on six MoleculeNet classification tasks

MG-BERT: Graph BERT for Molecular Property Prediction

MG-BERT combines GNN-style local attention with BERT’s masked pretraining on molecular graphs, learning context-sensitive atomic representations that improve ADMET property prediction across 11 benchmark datasets.

Predictive Chemistry
Bar chart showing MTL-BERT combining pretraining, multitask learning, and SMILES enumeration for best improvement

MTL-BERT: Multitask BERT for Property Prediction

MTL-BERT pretrains a BERT model on 1.7M unlabeled SMILES, then fine-tunes it jointly on 60 ADMET and molecular property tasks, using SMILES enumeration as data augmentation during both pretraining and fine-tuning.
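The masked-token objective at the heart of this kind of pretraining is simple to sketch: hide a fraction of SMILES tokens and ask the model to recover them. This is a simplified illustration (the real MTL-BERT recipe uses a proper SMILES tokenizer and applies SMILES enumeration before masking); the character-level tokenization and mask rate here are assumptions.

```python
import random

def mask_smiles_tokens(tokens, mask_rate=0.15, mask_token="[MASK]", seed=1):
    """BERT-style masking: replace ~mask_rate of tokens with a mask
    symbol and record the originals as prediction targets."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            targets[i] = tok          # the model must recover this token
            masked.append(mask_token)
        else:
            masked.append(tok)
    return masked, targets

# Aspirin SMILES, tokenized character-by-character for illustration.
tokens = list("CC(=O)Oc1ccccc1C(=O)O")
masked, targets = mask_smiles_tokens(tokens)
```

During pretraining the loss is computed only at the masked positions, so `targets` is exactly the supervision signal.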

Molecular Generation
Bar chart comparing AlphaDrug docking scores against known ligands across five protein targets

AlphaDrug: MCTS-Guided Target-Specific Drug Design

AlphaDrug generates drug candidates for specific protein targets by combining an Lmser Transformer (with hierarchical encoder-decoder skip connections) and Monte Carlo tree search guided by docking scores, achieving higher binding affinities than known ligands on 86% of test proteins.

Molecular Representations
Bar chart comparing SMILES tokens vs Atom-in-SMILES across molecular generation, retrosynthesis, and reaction prediction

Atom-in-SMILES: Better Tokens for Chemical Models

Introduces Atom-in-SMILES (AIS), a tokenization scheme that encodes local chemical environments into SMILES atom tokens, improving prediction quality across molecular generation, retrosynthesis, and reaction prediction tasks.
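A toy version of the idea: split a SMILES string into tokens, then replace each atom token with a composite token that also names its neighboring atoms, so chemically distinct occurrences of the same symbol get distinct tokens. This sketch reads neighbors off the linear string; the published AIS scheme derives ring membership, aromaticity, and neighbor lists from the molecular graph itself.

```python
import re

# Basic SMILES token pattern: bracket atoms, two-letter halogens, single chars.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[B-Zb-z0-9=#()@+\-\\/%]")
ATOM = re.compile(r"^(\[[^\]]+\]|Br|Cl|[BCNOPSFIbcnops])$")

def atom_in_smiles(smiles):
    """Toy Atom-in-SMILES-style tokenizer: annotate each atom token with
    the previous and next atom in the string ('^'/'$' at the ends)."""
    tokens = SMILES_TOKEN.findall(smiles)
    atoms = [(i, t) for i, t in enumerate(tokens) if ATOM.match(t)]
    out = list(tokens)
    for pos, (i, t) in enumerate(atoms):
        prev_atom = atoms[pos - 1][1] if pos > 0 else "^"
        next_atom = atoms[pos + 1][1] if pos + 1 < len(atoms) else "$"
        out[i] = f"[{prev_atom};{t};{next_atom}]"
    return out

print(atom_in_smiles("CCO"))  # → ['[^;C;C]', '[C;C;O]', '[C;O;$]']
```

Note how the two carbons in ethanol, identical as plain SMILES tokens, become distinguishable once their environments are folded into the token.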

Molecular Generation
Bar chart showing BindGPT RL achieves best Vina binding scores compared to baselines

BindGPT: GPT for 3D Molecular Design and Docking

BindGPT formulates 3D molecular design as autoregressive text generation over combined SMILES and XYZ tokens, using large-scale pretraining and reinforcement learning to achieve competitive pocket-conditioned molecule generation.

Molecular Generation
Bar chart comparing RNN and GPT architectures with SMILES and Graph representations on desirability scores

DrugEx v3: Scaffold-Constrained Graph Transformer

DrugEx v3 extends scaffold-constrained drug design by introducing a Graph Transformer with adjacency-matrix-based positional encoding, achieving 100% molecular validity and high predicted affinity for adenosine A2A receptor ligands.

Computational Chemistry
Bar chart comparing GPT-3 ada and GNN accuracy across molecular classification tasks

Fine-Tuning GPT-3 for Molecular Property Prediction

This paper fine-tunes GPT-3’s ada model on SMILES strings for classifying electronic properties (HOMO, LUMO) of organic semiconductor molecules, finding competitive accuracy with graph neural networks and exploring robustness through ablation studies.

Computational Chemistry
Bar chart comparing small and big foundation models surveyed across property prediction, MLIPs, inverse design, and multi-domain chemistry applications

Foundation Models in Chemistry: A 2025 Perspective

This perspective from Choi et al. reviews foundation models in chemistry, categorizing them as ‘small’ (domain-specific, e.g., property prediction, MLIPs, inverse design) and ‘big’ (multi-domain, e.g., multimodal and LLM-based). It surveys pretraining strategies, key architectures (GNNs and language models), and outlines future directions for scaling, efficiency, and interpretability.

Molecular Generation
Bar chart showing Lingo3DMol achieves best Vina docking scores on DUD-E compared to five baselines

Lingo3DMol: Language Model for 3D Molecule Design

Lingo3DMol introduces FSMILES, a fragment-based SMILES representation with local and global coordinates, to generate drug-like 3D molecules in protein pockets via a transformer language model.

Predictive Chemistry
Bar chart comparing LLM-Prop band gap MAE against CGCNN, SchNet, MEGNet, and ALIGNN

LLM-Prop: Predicting Crystal Properties from Text

LLM-Prop uses the encoder half of T5, fine-tuned on Robocrystallographer text descriptions, to predict crystal properties. It outperforms GNN baselines like ALIGNN on band gap and volume prediction while using fewer parameters.

Molecular Generation
Bar chart showing language model validity rates across XYZ, CIF, and PDB 3D chemical file formats

LMs Generate 3D Molecules from XYZ, CIF, PDB Files

Demonstrates that standard transformer language models, trained with next-token prediction on sequences from XYZ, CIF, and PDB files, can generate valid 3D molecules, crystals, and protein binding sites competitive with domain-specific 3D generative models.
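Turning a 3D structure file into a token stream for next-token prediction is straightforward; a minimal sketch for the XYZ format, assuming element symbols and fixed-precision coordinates become plain text tokens (the paper's exact tokenization and precision may differ):

```python
def xyz_to_tokens(xyz_text, decimals=2):
    """Flatten an XYZ file into a token sequence for language-model
    training: skip the atom count and comment lines, then emit each
    element symbol followed by its rounded x, y, z coordinates."""
    lines = xyz_text.strip().splitlines()
    n_atoms = int(lines[0])
    tokens = ["<bos>"]
    for line in lines[2:2 + n_atoms]:
        element, x, y, z = line.split()
        tokens.append(element)
        tokens.extend(f"{float(c):.{decimals}f}" for c in (x, y, z))
    tokens.append("<eos>")
    return tokens

water = """3
water
O 0.000 0.000 0.117
H 0.000 0.757 -0.471
H 0.000 -0.757 -0.471"""
tokens = xyz_to_tokens(water)
```

Rounding coordinates to a fixed number of decimals keeps the numeric vocabulary finite, which is what lets an ordinary transformer treat geometry as just another text modality.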