This group covers models that learn continuous latent representations of molecules and use that space for generation and optimization. The seminal SMILES VAE (Gomez-Bombarelli et al., 2018) established the paradigm: encode molecules into a smooth latent space, then search it via gradient-based or Bayesian optimization.

| Paper | Year | Architecture | Key Idea |
|---|---|---|---|
| Grammar VAE | 2017 | VAE + CFG | Context-free grammar decoder ensuring syntactically valid SMILES |
| Automatic Chemical Design | 2018 | VAE | Seminal SMILES VAE enabling Bayesian optimization in latent space |
| LatentGAN | 2019 | WGAN + heteroencoder | Wasserstein GAN trained in a heteroencoder's latent space, bypassing SMILES syntax |
| CogMol | 2020 | VAE + CLaSS | Controlled latent sampling for target-specific design without retraining |
| PASITHEA | 2021 | Gradient dreaming | Deep dreaming on SELFIES via property-network inversion |
| LIMO | 2022 | VAE + property predictor | Stacked property predictor enabling gradient-based search for high-affinity molecules |

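The shared paradigm (encode, optimize in latent space, decode) can be sketched in a few lines. This is a toy illustration, not any paper's implementation: the property surrogate below is a hypothetical quadratic stand-in for a trained property predictor, and a real system would decode the optimized latent vector back to a molecule with a trained VAE decoder.

```python
import numpy as np

# Hypothetical differentiable property surrogate standing in for a trained
# property-prediction network; it peaks at a fixed "ideal" latent point.
TARGET = np.array([1.0, -2.0, 0.5])

def predict_property(z):
    # Higher is better; maximum at z == TARGET.
    return -np.sum((z - TARGET) ** 2)

def property_gradient(z):
    # Analytic gradient of the surrogate above (autodiff in practice).
    return -2.0 * (z - TARGET)

def optimize_latent(z0, lr=0.1, steps=200):
    # Gradient ascent in latent space: move z toward higher predicted property,
    # as in gradient-based latent search (e.g., the LIMO-style setup).
    z = z0.copy()
    for _ in range(steps):
        z += lr * property_gradient(z)
    return z

z_start = np.zeros(3)            # e.g., the encoding of a seed molecule
z_opt = optimize_latent(z_start) # would then be passed to the decoder
print(predict_property(z_start), predict_property(z_opt))
```

Bayesian optimization (as in Automatic Chemical Design) replaces the gradient steps with a surrogate model and acquisition function over the same latent space, which is useful when the property oracle is expensive or non-differentiable.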