This section covers models that generate novel molecular structures. Notes are organized into subsections by approach:
- Autoregressive Generation covers models that produce SMILES strings one token at a time, from early RNN/LSTM models through pre-trained transformers (Chemformer, GP-MoLFormer), state-space models (S4), and semi-supervised methods.
- RL-Tuned Generation covers reinforcement learning pipelines that optimize generative policies toward multi-parameter property objectives (REINVENT, DrugEx, Link-INVENT, ORGAN).
- Target-Aware Generation covers models conditioned on protein targets, binding pockets, or 3D structural constraints for structure-based drug design.
- Latent-Space Generation covers VAEs and gradient-based optimization in continuous molecular latent spaces (seminal VAE, Grammar VAE, LIMO, LatentGAN).
- Search-Based Generation covers genetic algorithms and training-free mutation strategies (STONED) that serve as baselines and alternatives to learned generative models.
- Evaluation, Benchmarks & Surveys covers benchmark suites (GuacaMol, MOSES, PMO), scoring frameworks (MolScore, FCD), docking benchmarks, failure analysis, and surveys of the molecular generation field.
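To make the autoregressive framing concrete, here is a minimal, purely schematic sketch: a SMILES string is sampled token by token, each token conditioned on the previous one, stopping at an end-of-sequence symbol. The hand-written transition table below is a stand-in assumption; real models (RNNs, transformers like Chemformer) learn these conditional distributions from data and condition on the full prefix, not just the last token.

```python
import random

# Toy autoregressive SMILES sampler. The transition probabilities are
# invented for illustration only -- a trained model would predict them.
TRANSITIONS = {
    "<bos>": {"C": 0.7, "c": 0.3},
    "C": {"C": 0.4, "O": 0.2, "N": 0.1, "<eos>": 0.3},
    "c": {"c": 0.6, "C": 0.1, "<eos>": 0.3},
    "O": {"C": 0.5, "<eos>": 0.5},
    "N": {"C": 0.5, "<eos>": 0.5},
}

def sample_smiles(max_len=20, rng=random):
    """Sample one token sequence, stopping at <eos> or max_len."""
    tokens = []
    prev = "<bos>"
    for _ in range(max_len):
        choices, probs = zip(*TRANSITIONS[prev].items())
        nxt = rng.choices(choices, weights=probs)[0]
        if nxt == "<eos>":
            break
        tokens.append(nxt)
        prev = nxt
    return "".join(tokens)

print(sample_smiles(rng=random.Random(0)))
```

The same sampling loop underlies the RL-tuned pipelines above: REINVENT-style methods keep this generator but reweight its token probabilities with a reward signal on the completed molecule.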