Molecular Generation

ChemGE pipeline from integer chromosome through CFG grammar rules to valid SMILES output

ChemGE: Molecule Generation via Grammatical Evolution

ChemGE uses grammatical evolution over SMILES context-free grammars to generate diverse drug-like molecules in parallel, outperforming deep learning baselines in throughput and molecular diversity.
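As a rough illustration of how an integer chromosome can be decoded through grammar rules, the sketch below applies the usual grammatical-evolution mapping: each codon picks a production for the leftmost nonterminal via a modulo. The toy grammar, the `decode` function, the chromosome wrap-around, and the `max_expansions` cap are assumptions made for this example; this is not ChemGE's actual SMILES grammar or code.

```python
# Toy grammar (hypothetical): a chain of C/N/O atoms, far simpler than a real SMILES CFG.
GRAMMAR = {
    "<smiles>": [["<atom>"], ["<atom>", "<smiles>"]],
    "<atom>": [["C"], ["N"], ["O"]],
}


def decode(chromosome, start="<smiles>", max_expansions=50):
    """Map a list of integers to a string by expanding the leftmost nonterminal.

    Returns None if the derivation is still unfinished after max_expansions.
    """
    pending = [start]   # symbols still to process, leftmost first
    output = []         # terminals emitted so far
    used = 0            # number of codons consumed
    while pending:
        symbol = pending.pop(0)
        if symbol not in GRAMMAR:
            output.append(symbol)          # terminal: emit as-is
            continue
        if used >= max_expansions:
            return None                    # give up on non-terminating derivations
        rules = GRAMMAR[symbol]
        codon = chromosome[used % len(chromosome)]  # wrap around the chromosome
        used += 1
        pending = list(rules[codon % len(rules)]) + pending
    return "".join(output)


print(decode([1, 0, 1, 2, 0, 1]))
```

Because every string is produced by grammar rules, any finished derivation is syntactically well formed under that grammar; chemical validity would still need a separate check.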

Pareto front plot for multi-objective optimization alongside DrugEx v2 explorer-exploiter architecture

DrugEx v2: Pareto Multi-Objective RL for Drug Design

DrugEx v2 introduces Pareto-based multi-objective optimization and evolutionary exploration strategies into an RNN reinforcement learning framework for de novo drug design toward multiple protein targets.
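To make the Pareto idea concrete, here is a minimal sketch of non-dominated filtering over objective vectors, assuming every objective is to be maximized. The function names and the example scores are hypothetical and are not taken from DrugEx v2.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points that no other point dominates (all objectives maximized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (target A score, target B score) pairs for five molecules.
scores = [(0.9, 0.2), (0.5, 0.5), (0.2, 0.9), (0.4, 0.4), (0.9, 0.1)]
print(pareto_front(scores))
```

Molecules on the front represent different trade-offs between the objectives, so none of them can be improved on one target without losing on another.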

Grammar VAE: Generating Valid Molecules via CFGs

The Grammar VAE replaces character-level decoding with context-free grammar production rules, using a stack-based masking mechanism to guarantee that all generated SMILES strings are syntactically valid. Applied to molecular optimization and symbolic regression, it learns smoother latent spaces and finds better molecules than character-level baselines.
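The stack-based masking can be sketched roughly as follows: at each step only production rules whose left-hand side matches the nonterminal on top of the stack are eligible, so the decoder can never pick a rule that breaks the grammar. The four-rule grammar, the `masked_decode` function, and the greedy arg-max choice are assumptions for illustration; the actual model samples from masked decoder logits over a full SMILES grammar.

```python
# Hypothetical toy grammar: (left-hand side, right-hand side) pairs.
RULES = [
    ("S", ["A", "S"]),
    ("S", ["A"]),
    ("A", ["C"]),
    ("A", ["N"]),
]
NONTERMINALS = {"S", "A"}


def masked_decode(logits_per_step, start="S"):
    """Pick one rule per step, restricted to rules valid for the stack top."""
    stack = [start]
    chosen = []
    for logits in logits_per_step:
        if not stack:
            break  # derivation complete
        top = stack.pop()
        # Mask: only rules that expand the current nonterminal are allowed.
        allowed = [i for i, (lhs, _) in enumerate(RULES) if lhs == top]
        best = max(allowed, key=lambda i: logits[i])
        chosen.append(best)
        # Push right-hand-side nonterminals so the leftmost is expanded next.
        for symbol in reversed(RULES[best][1]):
            if symbol in NONTERMINALS:
                stack.append(symbol)
    return chosen, stack


steps = [
    [0.1, 0.0, 9.0, 9.0],  # rules 2 and 3 score highest but are masked out
    [5.0, 5.0, 0.2, 0.7],
    [0.0, 1.0, 9.0, 9.0],
    [9.0, 9.0, 1.0, 0.5],
]
rules, remaining = masked_decode(steps)
print(rules, remaining)
```

An empty stack at the end means the rule sequence forms a complete derivation; here the chosen rules spell out the string "NC".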

LatentGAN pipeline from SMILES encoder through latent space WGAN-GP to SMILES decoder

LatentGAN: Latent-Space GAN for Molecular Generation

LatentGAN decouples molecular generation from SMILES syntax by training a Wasserstein GAN on latent vectors from a pretrained heteroencoder, enabling de novo design of drug-like and target-biased compounds.

LSTM cells generating SMILES characters alongside validity and novelty statistics for drug-like molecule generation

LSTM Neural Network for Drug-Like Molecule Generation

Ertl et al. train a character-level LSTM on 509K bioactive ChEMBL SMILES and generate one million novel, diverse molecules whose physicochemical properties, substructure features, and predicted bioactivity closely match the training distribution.

Diagram showing how memory-assisted reinforcement learning explores multiple local maxima in chemical space compared to standard RL

Memory-Assisted RL for Diverse De Novo Molecular Design

A memory unit modifies the RL reward function to penalize previously explored chemical scaffolds, substantially increasing the diversity of generated molecules while maintaining relevance to known active ligands.
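One simple way to picture a scaffold memory is a set of fixed-size buckets keyed by scaffold: once a bucket is full, further molecules with that scaffold receive zero reward. The `ScaffoldMemory` class, the `bucket_size` value, and the choice of zeroing the score are assumptions for this sketch, and scaffolds are passed in as plain strings rather than computed with a cheminformatics toolkit.

```python
from collections import defaultdict


class ScaffoldMemory:
    """Hypothetical reward modifier that penalizes over-visited scaffolds."""

    def __init__(self, bucket_size=2):
        self.bucket_size = bucket_size
        self.counts = defaultdict(int)  # scaffold -> molecules seen so far

    def adjust(self, scaffold, score):
        """Return the score unchanged until the scaffold's bucket is full, then 0."""
        if self.counts[scaffold] >= self.bucket_size:
            return 0.0
        self.counts[scaffold] += 1
        return score


memory = ScaffoldMemory(bucket_size=2)
rewards = [memory.adjust("c1ccccc1", 0.8) for _ in range(3)]
print(rewards)
```

With the reward removed for a saturated scaffold, the agent has an incentive to move to a different region of chemical space instead of repeatedly exploiting one local maximum.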

Molecular graph being built atom-by-atom with BFS ordering and property optimization bars

MolecularRNN: Graph-Based Molecular Generation and RL

MolecularRNN is a graph recurrent model that generates molecular graphs atom by atom, reaching 100% validity via valency-based rejection sampling, and then shifts property distributions using policy-gradient reinforcement learning.

Architecture diagram showing ORGAN generator, discriminator, and objective reward with lambda interpolation formula

ORGAN: Objective-Reinforced GANs for Molecule Design

ORGAN extends SeqGAN with domain-specific reward functions via reinforcement learning, enabling tunable generation of molecules optimized for druglikeness, solubility, and synthesizability while maintaining sample diversity.
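The tunable part can be sketched as a linear interpolation between the discriminator's output and the domain objective, weighted by lambda. The function name and the example numbers below are hypothetical; both inputs are assumed to lie in [0, 1].

```python
def organ_reward(discriminator_score, objective_score, lam):
    """Blend the adversarial signal and the domain objective.

    lam = 1 uses only the discriminator; lam = 0 uses only the objective.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return lam * discriminator_score + (1.0 - lam) * objective_score


# Hypothetical values: the discriminator finds the sample fairly realistic (0.8),
# while the objective, e.g. a druglikeness score, is moderate (0.4).
for lam in (0.0, 0.5, 1.0):
    print(lam, organ_reward(0.8, 0.4, lam))
```

Sweeping lambda trades off staying close to the data distribution against optimizing the objective.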

REINVENT pipeline showing Prior, Agent, and Scoring Function with augmented likelihood equation

REINVENT: Reinforcement Learning for Molecular Design

REINVENT is a policy-based reinforcement learning method that fine-tunes an RNN pre-trained on ChEMBL SMILES to generate molecules with specified desirable properties. It uses an augmented episodic likelihood that anchors the agent to its prior while optimizing a user-defined scoring function.
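The augmented likelihood can be written as the prior's log-likelihood of a sequence plus the score scaled by a coefficient sigma, with the agent trained to match it through a squared difference. The sketch below works on plain floats; the function names and the numbers, including the sigma value, are illustrative assumptions, and a real implementation would operate on batched tensors from the two RNNs.

```python
def augmented_log_likelihood(prior_logp, score, sigma):
    """log P_aug(A) = log P_prior(A) + sigma * S(A)."""
    return prior_logp + sigma * score


def reinvent_loss(agent_logp, prior_logp, score, sigma):
    """Squared gap between the augmented and the agent log-likelihood."""
    return (augmented_log_likelihood(prior_logp, score, sigma) - agent_logp) ** 2


# Hypothetical sequence: the prior assigns log-likelihood -30, the agent -25,
# and the scoring function returns 0.5.
loss = reinvent_loss(agent_logp=-25.0, prior_logp=-30.0, score=0.5, sigma=20.0)
print(loss)  # 25.0
```

When the score is zero the target collapses to the prior likelihood, which is what keeps the agent anchored to the chemistry it learned during pre-training.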

Bar chart comparing AlphaDrug docking scores against known ligands across five protein targets

AlphaDrug: MCTS-Guided Target-Specific Drug Design

AlphaDrug generates drug candidates for specific protein targets by combining an Lmser Transformer (with hierarchical encoder-decoder skip connections) and Monte Carlo tree search guided by docking scores, achieving higher binding affinities than known ligands on 86% of test proteins.

Bar chart showing Augmented Hill-Climb achieves up to 45x sample efficiency over REINVENT

Augmented Hill-Climb for RL-Based Molecule Design

Augmented Hill-Climb is a hybrid RL strategy for SMILES-based generative models that improves sample efficiency ~45-fold over REINVENT by filtering low-scoring molecules from the loss computation, and uses diversity filters to prevent mode collapse.
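The filtering step can be sketched as ranking a sampled batch by score, keeping only the top fraction, and averaging a REINVENT-style squared-difference loss over the survivors. The `ahc_loss` function, the tuple layout, and the `topk` and `sigma` values are assumptions for illustration, not the published implementation.

```python
def ahc_loss(batch, sigma, topk=0.5):
    """Average a REINVENT-style loss over the highest-scoring fraction of a batch.

    batch: list of (agent_logp, prior_logp, score) tuples, one per molecule.
    topk:  fraction of the batch, ranked by score, that contributes to the loss.
    """
    keep = max(1, int(len(batch) * topk))
    ranked = sorted(batch, key=lambda mol: mol[2], reverse=True)[:keep]
    losses = [
        ((prior_logp + sigma * score) - agent_logp) ** 2
        for agent_logp, prior_logp, score in ranked
    ]
    return sum(losses) / len(losses)


# Hypothetical batch of four sampled molecules.
batch = [
    (-25.0, -30.0, 0.5),
    (-20.0, -22.0, 0.1),
    (-28.0, -30.0, 0.9),
    (-15.0, -18.0, 0.0),
]
print(ahc_loss(batch, sigma=20.0, topk=0.5))
```

Only the two best-scoring molecules influence the update here, so gradient signal is not spent on samples the scoring function already rates poorly.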

Two-panel plot showing score divergence with disagreeing classifiers vs convergence with agreeing classifiers

Avoiding Failure Modes in Goal-Directed Generation

This work shows that divergence between optimization and control scores during goal-directed molecular generation is explained by pre-existing disagreement among QSAR models on the training distribution, not by algorithmic exploitation of model-specific biases.