Molecular Generation
Bar chart showing PrefixMol Vina scores across different conditioning modes: target, property, combined, and scaffold

PrefixMol: Prefix Embeddings for Drug Molecule Design

PrefixMol prepends learnable condition vectors to a GPT-style Transformer for SMILES generation, enabling joint control over binding pocket targeting and chemical properties such as QED, SA, and LogP.
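The prefix mechanism can be illustrated with a minimal sketch. It only shows how condition vectors are placed ahead of the SMILES token embeddings before the sequence reaches the decoder. The embedding size, the random initialisation, and the helper names (`make_embedding`, `build_prefixed_input`) are assumptions made for illustration and are not taken from the PrefixMol code; in the real model the vectors are learned.

```python
import random


def make_embedding(dim, rng):
    """Return a random vector standing in for a learned embedding (assumption)."""
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def build_prefixed_input(condition_vectors, token_embeddings):
    """Prepend condition vectors to a sequence of token embeddings.

    All vectors must share the same dimension so the decoder can attend
    over prefix positions and token positions uniformly.
    """
    dim = len(token_embeddings[0])
    for vector in condition_vectors:
        if len(vector) != dim:
            raise ValueError("condition and token embeddings must share a dimension")
    return list(condition_vectors) + list(token_embeddings)


rng = random.Random(0)
EMBED_DIM = 8  # hypothetical embedding size

# One vector per condition: a pocket embedding plus three property embeddings.
conditions = {
    name: make_embedding(EMBED_DIM, rng)
    for name in ("pocket", "QED", "SA", "LogP")
}

# Toy SMILES "CCO" (ethanol), tokenised per character.
smiles_tokens = ["C", "C", "O"]
token_table = {tok: make_embedding(EMBED_DIM, rng) for tok in sorted(set(smiles_tokens))}
token_embeddings = [token_table[tok] for tok in smiles_tokens]

sequence = build_prefixed_input(list(conditions.values()), token_embeddings)
prefix_len = len(conditions)
```

Here the first `prefix_len` positions of `sequence` carry the conditions and the remaining positions carry the molecule, so dropping or swapping a condition only changes the prefix, not the token stream.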

Bar chart comparing docking scores of generated vs known ligands for CDK2 and EGFR targets

Protein-to-Drug Molecule Translation via Transformer

Applies the Transformer architecture to generate drug-like molecules conditioned on protein amino acid sequences, treating target-specific de novo drug design as a sequence-to-sequence translation problem.

Bar chart comparing PMO benchmark scores with and without chemical quality filters across five generative methods

Re-evaluating Sample Efficiency in Molecule Generation

A critical reassessment of the PMO benchmark for de novo molecule generation, showing that adding molecular weight, LogP, and diversity filters substantially re-ranks generative models, with Augmented Hill-Climb emerging as the top method.

Horizontal bar chart showing REINVENT 4 unified framework supporting seven generative model types

REINVENT 4: Open-Source Generative Molecule Design

Overview of REINVENT 4, an open-source generative molecular design framework from AstraZeneca that runs RNN and Transformer generators under a shared set of optimization algorithms: reinforcement learning, transfer learning, and curriculum learning.

Bar chart showing deep generative architecture types for molecular design: RNN, VAE, GAN, RL, and hybrid methods

Review: Deep Learning for Molecular Design (2019)

An early and influential review cataloging 45 papers on deep generative modeling for molecules, comparing RNN, VAE, GAN, and reinforcement learning architectures across SMILES and graph-based representations.

Bar chart comparing RNN and Transformer Wasserstein distances across drug-like, peptide-like, and polymer-like generation tasks

RNNs vs Transformers for Molecular Generation Tasks

Compares RNN-based and Transformer-based chemical language models across three molecular generation tasks of increasing complexity, finding that RNNs excel at local features while Transformers handle large molecules better.

Diagram showing the dual formulation of S4 models with convolution during training and recurrence during generation for SMILES-based molecular design

S4 Structured State Space Models for De Novo Drug Design

This paper introduces structured state space sequence (S4) models to chemical language modeling, showing they combine the strengths of LSTMs (efficient recurrent generation) and GPTs (holistic sequence learning) for de novo molecular design.
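The dual formulation can be sketched with a one-dimensional linear state space model. This is a toy illustration, not the S4 parametrisation itself: the scalar parameters `a`, `b`, `c` and the function names are assumptions chosen for clarity. The recurrent form processes one input at a time (as in generation), while the convolutional form applies a precomputed kernel to the whole sequence at once (as in training), and both give the same outputs.

```python
def ssm_recurrent(a, b, c, inputs):
    """Step-by-step form: state_k = a * state_{k-1} + b * u_k, y_k = c * state_k."""
    state = 0.0
    outputs = []
    for u in inputs:
        state = a * state + b * u
        outputs.append(c * state)
    return outputs


def ssm_kernel(a, b, c, length):
    """Convolution kernel K_j = c * a**j * b, obtained by unrolling the recurrence."""
    return [c * (a ** j) * b for j in range(length)]


def ssm_convolutional(a, b, c, inputs):
    """Whole-sequence form: y_k = sum_{j=0..k} K_j * u_{k-j} (causal convolution)."""
    kernel = ssm_kernel(a, b, c, len(inputs))
    return [
        sum(kernel[j] * inputs[k - j] for j in range(k + 1))
        for k in range(len(inputs))
    ]


# Hypothetical parameters and input signal.
a, b, c = 0.9, 0.5, 2.0
signal = [1.0, 0.5, -0.25, 2.0]

recurrent_out = ssm_recurrent(a, b, c, signal)
convolutional_out = ssm_convolutional(a, b, c, signal)
```

The recurrent form needs only the previous state to emit the next output, which is what makes token-by-token generation cheap; the convolutional form sees the full sequence in one pass, which suits parallel training.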

Bar chart comparing Char-RNN and Molecular VAE on validity and novelty metrics

VAE for Automatic Chemical Design (2018 Seminal)

This foundational paper introduces a variational autoencoder (VAE) that encodes SMILES strings into a continuous latent space, allowing gradient-based optimization of molecular properties. Joint training with a property predictor organizes the latent space by chemical properties, and Bayesian optimization over the latent surface discovers drug-like molecules with improved QED and synthetic accessibility.

Diagram showing back translation workflow with forward and reverse models mapping between source and target molecular domains, augmented by unlabeled ZINC molecules

Back Translation for Semi-Supervised Molecule Generation

Adapts back translation from NLP to molecular generation, using unlabeled molecules from ZINC to create synthetic training pairs that improve property optimization and retrosynthesis prediction across Transformer and graph-based architectures.

Stylized visualization of protein-ligand docking and benchmark performance bars across five drug targets

DOCKSTRING: Docking-Based Benchmarks for Drug Design

DOCKSTRING bundles an AutoDock Vina wrapper, a 260K-molecule docking dataset across 58 protein targets, and pharmaceutically relevant benchmarks for regression, virtual screening, and de novo design.

Diagram showing divergence between optimization score and control scores during molecular optimization

Failure Modes in Molecule Generation & Optimization

Identifies failure modes in molecular generative models, showing that trivial edits fool distribution-learning benchmarks and that ML-based scoring functions introduce exploitable model-specific and data-specific biases during goal-directed optimization.

Two Gaussian distributions in ChemNet activation space with the Frechet distance shown between them

Frechet ChemNet Distance for Molecular Generation

Introduces the Frechet ChemNet Distance (FCD), a single metric that captures chemical validity, biological relevance, and diversity of generated molecules by comparing distributions of learned ChemNet representations.
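The distance underlying the metric is the Fréchet distance between two Gaussians fitted to activation vectors: d² = ||μ₁ − μ₂||² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^{1/2}). The sketch below is a simplified version that assumes diagonal covariances, so the trace term reduces to a per-dimension sum; the full metric uses complete covariance matrices and real ChemNet activations. The toy activation lists and function names are hypothetical.

```python
import math
from statistics import fmean, pvariance


def column_stats(activations):
    """Per-dimension mean and (population) variance of a list of vectors."""
    columns = list(zip(*activations))
    means = [fmean(col) for col in columns]
    variances = [pvariance(col) for col in columns]
    return means, variances


def frechet_distance_diagonal(activations_a, activations_b):
    """Squared Fréchet distance between two Gaussians with diagonal covariance.

    Simplifying assumption: dimensions are treated as uncorrelated, so
    Tr(S1 + S2 - 2 (S1 S2)^(1/2)) becomes sum(v1 + v2 - 2 * sqrt(v1 * v2)).
    """
    mu_a, var_a = column_stats(activations_a)
    mu_b, var_b = column_stats(activations_b)
    mean_term = sum((ma - mb) ** 2 for ma, mb in zip(mu_a, mu_b))
    cov_term = sum(
        va + vb - 2.0 * math.sqrt(va * vb) for va, vb in zip(var_a, var_b)
    )
    return mean_term + cov_term


# Toy two-dimensional "activations" standing in for ChemNet outputs.
reference = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
shifted = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

same_distance = frechet_distance_diagonal(reference, reference)
shifted_distance = frechet_distance_diagonal(reference, shifted)
```

Identical sets give a distance of zero, and shifting every vector by a constant changes only the mean term, which is why the metric responds both to where the generated distribution sits and to how spread out it is.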