Molecular Representations
MoMu architecture showing contrastive alignment between molecular graph and scientific text modalities

MoMu: Bridging Molecular Graphs and Natural Language

MoMu pre-trains dual graph and text encoders on 15K molecule graph-text pairs using contrastive learning, enabling cross-modal retrieval, molecule captioning, zero-shot text-to-graph generation, and improved molecular property prediction.
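As a rough illustration of the contrastive alignment described above, the sketch below computes a symmetric InfoNCE-style loss over a batch of paired graph and text embeddings. The summary does not state MoMu's exact loss, so the symmetric form, the `temperature` value, and the function names are assumptions made for illustration only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def info_nce(graph_embs, text_embs, temperature=0.1):
    """Symmetric InfoNCE loss; pair i of each list is the positive match.

    The temperature is an illustrative assumption, not a value from the paper.
    """
    n = len(graph_embs)
    sims = [[cosine(g, t) / temperature for t in text_embs] for g in graph_embs]

    def cross_entropy(rows):
        # Mean of -log softmax(row)[i], computed with a stable log-sum-exp.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_denominator = m + math.log(sum(math.exp(s - m) for s in row))
            total += log_denominator - row[i]
        return total / n

    text_to_graph = [list(col) for col in zip(*sims)]
    return 0.5 * (cross_entropy(sims) + cross_entropy(text_to_graph))

# Toy 2-D embeddings: matched pairs point the same way, shuffled pairs do not.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.1, 0.9], [0.9, 0.1]])
```

In this toy setup the loss is lower when each graph embedding sits closest to its own text embedding, which is the behavior that makes cross-modal retrieval possible.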

Molecular Generation
Architecture diagram showing ORGAN generator, discriminator, and objective reward with lambda interpolation formula

ORGAN: Objective-Reinforced GANs for Molecule Design

ORGAN extends SeqGAN with domain-specific reward functions via reinforcement learning, enabling tunable generation of molecules optimized for druglikeness, solubility, and synthesizability while maintaining sample diversity.
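The lambda interpolation mentioned in the figure can be sketched as a weighted sum of the discriminator score and the objective reward. The function and argument names below are hypothetical, and both scores are assumed to be scaled to [0, 1].

```python
def organ_reward(discriminator_score, objective_score, lam):
    """Interpolate between the GAN signal and the domain objective.

    lam = 1.0 recovers a pure SeqGAN-style reward; lam = 0.0 is pure
    objective-driven reinforcement learning.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * discriminator_score + (1.0 - lam) * objective_score

# A molecule the discriminator finds realistic but that scores poorly
# on the objective, evaluated at three settings of lam.
rewards = [organ_reward(0.8, 0.4, lam) for lam in (0.0, 0.5, 1.0)]
```

Sweeping `lam` is what makes the generation tunable: lower values push harder toward the objective at the risk of drifting away from the training distribution.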

Molecular Simulation
Schematic overview of three multi-modal generative model variants for all-atom molecular denoising

PharMolixFM: Multi-Modal All-Atom Molecular Models

PharMolixFM proposes a unified framework for all-atom foundation models built on three multi-modal generative approaches (diffusion, flow matching, and Bayesian flow networks) and demonstrates competitive docking accuracy with fast inference.

Molecular Generation
REINVENT pipeline showing Prior, Agent, and Scoring Function with augmented likelihood equation

REINVENT: Reinforcement Learning for Molecular Design

REINVENT is a policy-based reinforcement learning method that fine-tunes an RNN pre-trained on ChEMBL SMILES to generate molecules with specified desirable properties, using an augmented episodic likelihood that anchors the agent to its prior while optimizing a user-defined scoring function.
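The augmented likelihood can be sketched per sequence as the prior log-likelihood plus a scaled score, with the agent trained to match it through a squared error. The names and the `sigma` values below are illustrative assumptions, and a real implementation would sum token log-probabilities from the RNNs rather than take them as plain numbers.

```python
def augmented_log_likelihood(prior_logp, score, sigma):
    """Prior log-likelihood shifted upward in proportion to the score."""
    return prior_logp + sigma * score

def reinvent_loss(prior_logp, agent_logp, score, sigma):
    """Squared gap between the augmented target and the agent's log-likelihood."""
    target = augmented_log_likelihood(prior_logp, score, sigma)
    return (target - agent_logp) ** 2

# With a zero score the agent is only pulled back toward the prior.
loss_unscored = reinvent_loss(prior_logp=-30.0, agent_logp=-30.0, score=0.0, sigma=20.0)
# A high score raises the target, so an agent still at the prior incurs a loss.
loss_scored = reinvent_loss(prior_logp=-30.0, agent_logp=-30.0, score=1.0, sigma=20.0)
```

Because the target is built from the prior's log-likelihood, sequences the prior finds implausible stay expensive for the agent even when they score well, which is the anchoring effect described above.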

Molecular Generation
Bar chart comparing AlphaDrug docking scores against known ligands across five protein targets

AlphaDrug: MCTS-Guided Target-Specific Drug Design

AlphaDrug generates drug candidates for specific protein targets by combining an Lmser Transformer (with hierarchical encoder-decoder skip connections) and Monte Carlo tree search guided by docking scores, achieving higher binding affinities than known ligands on 86% of test proteins.

Molecular Generation
Bar chart showing Augmented Hill-Climb achieves up to 45x sample efficiency over REINVENT

Augmented Hill-Climb for RL-Based Molecule Design

Augmented Hill-Climb is a hybrid RL strategy for SMILES-based generative models that improves sample efficiency ~45-fold over REINVENT by excluding low-scoring molecules from the loss computation, and it relies on diversity filters to prevent mode collapse.
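A minimal sketch of the filtering step, assuming a REINVENT-style squared-error loss applied only to the top-scoring fraction of each batch. The `topk` fraction, `sigma`, and the tuple layout are hypothetical choices for illustration, and the diversity filter is omitted.

```python
def augmented_hill_climb_loss(batch, sigma, topk=0.5):
    """Average a REINVENT-style loss over the best-scoring molecules only.

    batch: list of (prior_logp, agent_logp, score) tuples, one per molecule.
    topk:  fraction of the batch kept after ranking by score (assumed value).
    """
    ranked = sorted(batch, key=lambda row: row[2], reverse=True)
    keep = max(1, int(len(ranked) * topk))
    kept = ranked[:keep]
    losses = [(prior + sigma * score - agent) ** 2 for prior, agent, score in kept]
    return sum(losses) / len(losses), kept

batch = [
    (-20.0, -20.0, 0.9),
    (-20.0, -20.0, 0.1),
    (-20.0, -20.0, 0.5),
    (-20.0, -20.0, 0.2),
]
loss, kept = augmented_hill_climb_loss(batch, sigma=10.0, topk=0.5)
kept_scores = [score for _, _, score in kept]
```

Dropping the low scorers means each update is driven only by the molecules worth imitating, which is the intuition behind the reported gain in sample efficiency.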

Molecular Generation
Two-panel plot showing score divergence with disagreeing classifiers vs convergence with agreeing classifiers

Avoiding Failure Modes in Goal-Directed Generation

This study shows that divergence between optimization and control scores during goal-directed molecular generation is explained by pre-existing disagreement among QSAR models on the training distribution, not by algorithmic exploitation of model-specific biases.

Molecular Generation
Bar chart showing BindGPT RL achieves best Vina binding scores compared to baselines

BindGPT: GPT for 3D Molecular Design and Docking

BindGPT formulates 3D molecular design as autoregressive text generation over combined SMILES and XYZ tokens, using large-scale pre-training and reinforcement learning to achieve competitive pocket-conditioned molecule generation.

Molecular Generation
Grouped bar chart showing CLM architectures (RNN, VAE, GAN, Transformer) across generation strategies

Chemical Language Models for De Novo Drug Design Review

A minireview of chemical language models for de novo molecule design, covering SMILES and SELFIES representations, RNN and Transformer architectures, distribution learning, goal-directed and conditional generation, and prospective experimental validation.

Molecular Generation
Bar chart showing CogMol CLaSS enrichment factors across three COVID-19 drug targets

CogMol: Controlled Molecule Generation for COVID-19

CogMol uses a SMILES VAE and multi-attribute controlled sampling (CLaSS) to generate novel, target-specific drug molecules for unseen SARS-CoV-2 proteins without model retraining.

Molecular Generation
Line chart showing curriculum learning converges faster than standard RL for molecular generation

Curriculum Learning for De Novo Drug Design (REINVENT)

This work introduces curriculum learning to the REINVENT de novo design platform, decomposing complex drug design objectives into simpler sequential tasks that accelerate agent convergence and improve output quality over standard reinforcement learning.

Molecular Representations
Bar chart comparing SMILES and DeepSMILES error types, showing DeepSMILES eliminates parenthesis errors

DeepSMILES: Adapting SMILES Syntax for Machine Learning

DeepSMILES replaces paired parentheses and ring closure symbols in SMILES with a postfix notation and single ring-size digits, making it easier for generative models to produce syntactically valid molecular strings.
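The branch half of this idea can be sketched as follows, under the assumption that each close parenthesis acts as one "pop" from a stack of atoms, so a branch of n atoms ends with n close parentheses and no open parenthesis is written. This is a simplified illustration with a hypothetical function name, not the reference encoder: ring closures and other unsupported SMILES features are rejected.

```python
import re

# Bracket atoms, two-letter halogens, single-letter atoms, parentheses, bonds.
TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[A-Za-z]|[()=#\-/\\]")
BONDS = {"=", "#", "-", "/", "\\"}

def branches_to_deepsmiles(smiles):
    """Rewrite paired SMILES branches in a DeepSMILES-style postfix form."""
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        # Ring-closure digits, '%', '.', and similar symbols are out of scope.
        raise ValueError("unsupported SMILES feature in this sketch")
    out, depth, marks = [], 0, []
    for tok in tokens:
        if tok == "(":
            marks.append(depth)               # remember where the branch starts
        elif tok == ")":
            start = marks.pop()
            out.append(")" * (depth - start))  # one ')' per atom in the branch
            depth = start
        elif tok in BONDS:
            out.append(tok)
        else:
            out.append(tok)                    # an atom: push onto the stack
            depth += 1
    return "".join(out)

converted = branches_to_deepsmiles("C(Br)(OC)I")
```

Because no open parenthesis is ever emitted, a generative model cannot produce an unmatched pair, which is the class of error the figure shows DeepSMILES eliminating.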