Computational Chemistry
Coscientist architecture with GPT-4 planner orchestrating web search, code execution, document search, and robot lab API modules

Coscientist: Autonomous Chemistry with LLM Agents

Introduces Coscientist, a GPT-4-driven AI system that autonomously designs and executes chemical experiments using web search, code execution, and robotic lab automation.
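The planner/module split in the diagram can be illustrated with a minimal dispatch loop. This sketch assumes the planner emits (command, argument) pairs. The command labels GOOGLE, PYTHON, DOCUMENTATION and EXPERIMENT are assumed names for the four modules, and every handler is a hypothetical stub. No LLM, search engine or robot API is called.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical module stubs: a real agent would call a search API, a sandboxed
# interpreter, a documentation index and a lab-automation API here.
def web_search(query: str) -> str:
    return f"search results for: {query}"

def run_code(code: str) -> str:
    return f"ran {len(code.splitlines())} line(s) of code"

def search_docs(query: str) -> str:
    return f"documentation excerpt for: {query}"

def run_experiment(protocol: str) -> str:
    return f"queued protocol: {protocol}"

# Assumed command vocabulary mapping planner actions to modules.
TOOLS: Dict[str, Callable[[str], str]] = {
    "GOOGLE": web_search,
    "PYTHON": run_code,
    "DOCUMENTATION": search_docs,
    "EXPERIMENT": run_experiment,
}

def execute_plan(plan: List[Tuple[str, str]]) -> List[str]:
    """Dispatch each (command, argument) pair and collect the observations
    that would be fed back into the planner's context."""
    observations = []
    for command, argument in plan:
        handler = TOOLS.get(command)
        if handler is None:
            observations.append(f"unknown command: {command}")
        else:
            observations.append(handler(argument))
    return observations
```

In a full agent, the observations would be appended to the planner's prompt before it chooses the next command.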

Diagram comparing character-level VAE with low validity to Grammar VAE using parse tree constraints for molecular generation

Grammar VAE: Generating Valid Molecules via CFGs

The Grammar VAE replaces character-level decoding with context-free grammar production rules. It uses a stack-based masking mechanism so that every decoded sequence is syntactically valid under the SMILES grammar. Semantic constraints such as matched ring-closure digits are not enforced, so not every output is a valid molecule. Applied to molecular optimization and symbolic regression, it learns smoother latent spaces and finds better molecules than character-level baselines.
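The stack-based masking step can be sketched with a toy arithmetic grammar. This grammar is an assumption for illustration and is much smaller than the paper's SMILES grammar. At each step, the nonterminal on top of the stack decides which production rules are legal. All other rule logits are masked to -inf before taking the argmax.

```python
import math
from typing import List, Sequence, Tuple

# Toy grammar (assumption): uppercase symbols are nonterminals.
RULES: List[Tuple[str, Tuple[str, ...]]] = [
    ("S", ("S", "+", "T")),   # rule 0
    ("S", ("T",)),            # rule 1
    ("T", ("x",)),            # rule 2
    ("T", ("1",)),            # rule 3
    ("T", ("(", "S", ")")),   # rule 4
]

def is_nonterminal(sym: str) -> bool:
    return sym.isupper()

def masked_decode(logits: Sequence[Sequence[float]]) -> Tuple[List[int], str]:
    """Leftmost derivation driven by per-step rule logits with grammar masking."""
    stack = ["S"]
    chosen: List[int] = []
    out: List[str] = []
    for step_logits in logits:
        # Emit terminals until a nonterminal is on top (or the stack empties).
        while stack and not is_nonterminal(stack[-1]):
            out.append(stack.pop())
        if not stack:
            break
        lhs = stack.pop()
        # Mask every rule whose left-hand side is not the current nonterminal.
        masked = [l if RULES[i][0] == lhs else -math.inf
                  for i, l in enumerate(step_logits)]
        rule = max(range(len(masked)), key=masked.__getitem__)
        chosen.append(rule)
        # Push the right-hand side reversed so its leftmost symbol is on top.
        stack.extend(reversed(RULES[rule][1]))
    # Flush trailing terminals; a leftover nonterminal means the step budget ran out.
    while stack and not is_nonterminal(stack[-1]):
        out.append(stack.pop())
    return chosen, "".join(out)
```

For example, `masked_decode([[3,0,5,0,0],[0,2,5,0,0],[9,9,5,0,0],[0,0,0,4,1]])` derives `x+1`. In the third step, rules 0 and 1 carry the largest raw logits, but they are masked out because the top of the stack is `T`.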

Bar chart showing Augmented Hill-Climb achieves up to 45x sample efficiency over REINVENT

Augmented Hill-Climb for RL-Based Molecule Design

Proposes Augmented Hill-Climb, a hybrid of the REINVENT and Hill-Climb RL strategies for SMILES-based generative models. It improves sample efficiency ~45-fold over REINVENT by computing the loss only on the top-scoring molecules in each batch, and it pairs this with diversity filters to prevent mode collapse.
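The loss filtering can be sketched in plain Python. This sketch assumes the REINVENT-style augmented likelihood, log P_aug = log P_prior + sigma * score, and a squared-error loss against the agent's log-likelihood. Only the top-k fraction of the batch, ranked by score, contributes to the loss. The `sigma` and `topk` defaults are illustrative, not values taken from the paper. A real implementation would compute this on tensors so the gradient flows into the agent.

```python
from typing import Sequence

def augmented_hill_climb_loss(prior_loglik: Sequence[float],
                              agent_loglik: Sequence[float],
                              scores: Sequence[float],
                              sigma: float = 60.0,
                              topk: float = 0.25) -> float:
    """Mean squared gap between augmented prior and agent log-likelihoods,
    restricted to the top-k fraction of the batch by score."""
    if not 0.0 < topk <= 1.0:
        raise ValueError("topk must be in (0, 1]")
    n = len(scores)
    k = max(1, int(topk * n))
    # Keep only the k highest-scoring molecules; the rest are dropped from the loss.
    keep = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    total = 0.0
    for i in keep:
        augmented = prior_loglik[i] + sigma * scores[i]
        total += (augmented - agent_loglik[i]) ** 2
    return total / k
```

With `topk=1.0`, every molecule contributes, which corresponds to the plain REINVENT update.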

Two-panel plot showing score divergence with disagreeing classifiers vs convergence with agreeing classifiers

Avoiding Failure Modes in Goal-Directed Generation

Shows that divergence between optimization and control scores during goal-directed molecular generation is explained by pre-existing disagreement among QSAR models on the training distribution, not by algorithmic exploitation of model-specific biases.
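Pre-existing disagreement can be checked before any optimization runs. One way is to compare the optimization and control models' predictions on the same molecules. This sketch assumes you already have predicted probabilities from both classifiers on the same molecules. The two metrics shown, thresholded label agreement and mean absolute probability gap, are an illustrative choice and not necessarily the paper's exact measure.

```python
from typing import Sequence, Tuple

def classifier_agreement(p_opt: Sequence[float], p_ctrl: Sequence[float],
                         threshold: float = 0.5) -> Tuple[float, float]:
    """Return (label agreement rate, mean absolute probability gap) between
    an optimization model and a control model on the same molecules."""
    if len(p_opt) != len(p_ctrl) or not p_opt:
        raise ValueError("need two non-empty prediction lists of equal length")
    agree = sum((a >= threshold) == (b >= threshold) for a, b in zip(p_opt, p_ctrl))
    gap = sum(abs(a - b) for a, b in zip(p_opt, p_ctrl))
    n = len(p_opt)
    return agree / n, gap / n
```

Low agreement on training-distribution molecules suggests that the scores will diverge during optimization even without any exploitation by the generator.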

Bar chart showing peak absorption wavelength increasing across evolutionary generations

Evolutionary Molecular Design via Deep Learning + GA

An evolutionary molecular design framework that evolves ECFP fingerprint vectors using a genetic algorithm, reconstructs valid SMILES via an RNN decoder, and evaluates fitness with a DNN property predictor.
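The evolutionary loop can be sketched over plain 0/1 lists that stand in for ECFP bit vectors. The sketch makes several assumptions. The DNN property predictor is replaced by a caller-supplied `fitness` function, and the RNN decoding/validity step is omitted. Truncation selection, one-point crossover and bit-flip mutation are illustrative choices rather than the paper's exact operators.

```python
import random
from typing import Callable, List

BitVec = List[int]

def evolve(population: List[BitVec],
           fitness: Callable[[BitVec], float],
           generations: int = 20,
           mutation_rate: float = 0.01,
           seed: int = 0) -> List[BitVec]:
    """Elitist GA over fingerprint-like bit vectors (population size >= 2)."""
    rng = random.Random(seed)
    size = len(population)
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: max(2, size // 2)]        # truncation selection
        children = [ranked[0][:]]                    # elitism: keep the best as-is
        while len(children) < size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(a))           # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            # The paper's pipeline would decode `child` to SMILES with the RNN here
            # and discard it if invalid; that step is omitted in this sketch.
            children.append(child)
        population = children
    return sorted(population, key=fitness, reverse=True)
```

Because the best vector is always carried over, the best fitness never decreases across generations. For example, passing `fitness=sum` simply maximizes the number of set bits.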

Schematic of Link-INVENT architecture showing encoder-decoder RNN with reinforcement learning scoring loop

Link-INVENT: RL-Driven Molecular Linker Generation

Link-INVENT is an RNN-based generative model for molecular linker design that uses reinforcement learning with a flexible scoring function, demonstrated on fragment linking, scaffold hopping, and PROTAC design.
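A "flexible scoring function" combines several component scores into a single reward. This sketch assumes a weighted geometric mean of components in [0, 1], in the style of REINVENT's product aggregation. The `window` desirability transform and the component names are hypothetical examples, not Link-INVENT's actual component set.

```python
import math
from typing import Dict

def weighted_geometric_mean(components: Dict[str, float],
                            weights: Dict[str, float]) -> float:
    """Aggregate component scores in [0, 1]; any zero component zeroes the total."""
    total_w = sum(weights[name] for name in components)
    if total_w <= 0:
        raise ValueError("weights must sum to a positive value")
    log_sum = 0.0
    for name, s in components.items():
        if s <= 0.0:
            return 0.0
        log_sum += weights[name] * math.log(s)
    return math.exp(log_sum / total_w)

def window(x: float, low: float, high: float) -> float:
    """Hypothetical desirability transform: 1 inside [low, high], else 0."""
    return 1.0 if low <= x <= high else 0.0
```

For example, `weighted_geometric_mean({"linker_length": window(5, 3, 7), "docking": 0.6}, {"linker_length": 1.0, "docking": 1.0})` combines a hard linker-length constraint with a soft score. The product form lets any failed hard constraint veto the whole molecule.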

Distribution plot showing original QM9 logP shifted toward +6 and -6 targets via gradient-based dreaming

PASITHEA: Gradient-Based Molecular Design via Dreaming

PASITHEA adapts deep dreaming from computer vision to molecular design, directly optimizing SELFIES-encoded molecules for target chemical properties via gradient-based inversion of a trained regression network.
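The core of "dreaming" is gradient descent on the input while the regressor stays frozen. This sketch assumes a frozen linear model over a relaxed one-hot matrix (positions by tokens) as a stand-in for PASITHEA's trained neural network. It also omits the noise PASITHEA adds to the one-hot input. The token strings are SELFIES-like labels chosen for illustration, and no SELFIES library is used.

```python
from typing import List, Sequence

def dream(x: List[List[float]], W: Sequence[Sequence[float]], b: float,
          target: float, lr: float = 0.1, steps: int = 100) -> List[List[float]]:
    """Gradient descent on the input of a frozen linear regressor
    y = sum_ij W[i][j] * x[i][j] + b, minimising (y - target)^2."""
    x = [row[:] for row in x]
    for _ in range(steps):
        y = sum(w * v for wr, xr in zip(W, x) for w, v in zip(wr, xr)) + b
        g = 2.0 * (y - target)            # d loss / d y
        for i, wr in enumerate(W):
            for j, w in enumerate(wr):
                x[i][j] -= lr * g * w     # d y / d x[i][j] = W[i][j]
    return x

def decode(x: Sequence[Sequence[float]], alphabet: Sequence[str]) -> List[str]:
    """Map each position back to its highest-scoring token (argmax)."""
    return [alphabet[max(range(len(row)), key=row.__getitem__)] for row in x]
```

Starting from `[[1,0,0],[1,0,0]]` with alphabet `["[C]","[O]","[N]"]`, `W = [[0,1,0.5]]*2` and target 3.0, the input drifts toward `[O]` at both positions. With SELFIES, any such argmax sequence decodes to a valid molecule.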

Bar chart comparing binding affinity scores across SMILES, AIS, and SMI+AIS hybrid tokenization strategies

SMI+AIS: Hybridizing SMILES with Environment Tokens

Proposes SMI+AIS, a hybrid molecular representation combining standard SMILES tokens with chemical-environment-aware Atom-In-SMILES tokens, demonstrating improved molecular generation for drug design targets.
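The hybrid tokenization can be sketched as a per-token choice between the two vocabularies. This sketch assumes that hybridization keeps the N most frequent Atom-In-SMILES tokens from a corpus and falls back to the plain SMILES token everywhere else. It also assumes the SMILES and AIS token sequences are already aligned one-to-one. The AIS strings in the example are illustrative, and no AIS tokenizer is called.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Set

def top_ais_vocab(corpus: Iterable[Sequence[str]], n: int) -> Set[str]:
    """The N most frequent AIS tokens over a corpus of AIS-tokenized molecules."""
    counts = Counter(tok for mol in corpus for tok in mol)
    return {tok for tok, _ in counts.most_common(n)}

def hybridize(smiles_tokens: Sequence[str], ais_tokens: Sequence[str],
              keep: Set[str]) -> List[str]:
    """Use the AIS token where it is in `keep`, else the SMILES token."""
    if len(smiles_tokens) != len(ais_tokens):
        raise ValueError("token sequences must be aligned")
    return [a if a in keep else s for s, a in zip(smiles_tokens, ais_tokens)]
```

Frequent environments thus get context-rich tokens, while rare ones fall back to the compact SMILES alphabet, which keeps the vocabulary size bounded by N.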

Bar chart comparing Char-RNN and Molecular VAE on validity and novelty metrics

VAE for Automatic Chemical Design (2018 Seminal)

This foundational paper introduces a variational autoencoder (VAE) that encodes SMILES strings into a continuous latent space, allowing gradient-based optimization of molecular properties. Joint training with a property predictor organizes the latent space by chemical properties, and Bayesian optimization over the latent surface discovers drug-like molecules with improved QED and synthetic accessibility.
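The VAE pieces that make the latent space continuous and property-organized can be written out directly. These are the reparameterization trick, the closed-form KL term against a standard normal prior, and an added property-regression term. The `beta` and `prop_weight` weighting of the joint loss are assumptions for illustration. A real model would compute these on tensors in an autodiff framework, with the property predictor implemented as a network on `z`.

```python
import math
import random
from typing import List, Sequence

def reparameterize(mu: Sequence[float], logvar: Sequence[float],
                   rng: random.Random) -> List[float]:
    """z = mu + sigma * eps, eps ~ N(0, I); keeps sampling differentiable in mu, logvar."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]

def kl_to_standard_normal(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))

def joint_loss(recon_nll: float, mu: Sequence[float], logvar: Sequence[float],
               prop_pred: float, prop_true: float,
               beta: float = 1.0, prop_weight: float = 1.0) -> float:
    """Reconstruction + KL + property-regression term (weights are assumptions)."""
    return (recon_nll
            + beta * kl_to_standard_normal(mu, logvar)
            + prop_weight * (prop_pred - prop_true) ** 2)
```

The property term is what pushes molecules with similar property values toward nearby latent points. That ordering is what makes optimization over the latent surface meaningful.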

Stylized visualization of protein-ligand docking and benchmark performance bars across five drug targets

DOCKSTRING: Docking-Based Benchmarks for Drug Design

DOCKSTRING bundles an AutoDock Vina wrapper, a 260K-molecule docking dataset across 58 protein targets, and pharmaceutically relevant benchmarks for regression, virtual screening, and de novo design.

Diagram showing divergence between optimization score and control scores during molecular optimization

Failure Modes in Molecule Generation & Optimization

Identifies failure modes in molecular generative models, showing that trivial edits fool distribution-learning benchmarks and that ML-based scoring functions introduce exploitable model-specific and data-specific biases during goal-directed optimization.
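A trivial edit that fools distribution-learning benchmarks can be reproduced in a few lines. This sketch inserts one carbon into a training SMILES and keeps the result if it is valid and not in the training set. The string-level edit is a simplification, and the `is_valid` callback is supplied by the caller. With RDKit, `is_valid` could be `lambda s: Chem.MolFromSmiles(s) is not None`. A faithful novelty check would also compare canonical SMILES rather than raw strings.

```python
import random
from typing import Callable, List, Sequence

def add_carbon(smiles: str, rng: random.Random) -> str:
    """Insert one 'C' at a random position in the SMILES string."""
    pos = rng.randrange(len(smiles) + 1)
    return smiles[:pos] + "C" + smiles[pos:]

def add_carbon_model(training: Sequence[str], n: int,
                     is_valid: Callable[[str], bool],
                     seed: int = 0, max_tries: int = 10_000) -> List[str]:
    """Sample edited training molecules that pass `is_valid` and are novel."""
    rng = random.Random(seed)
    train_set = set(training)
    out: List[str] = []
    for _ in range(max_tries):
        if len(out) >= n:
            break
        cand = add_carbon(rng.choice(training), rng)
        if is_valid(cand) and cand not in train_set:
            out.append(cand)
    return out
```

Every output is one atom away from a training molecule, yet it counts as "novel". This is why high validity and novelty scores alone say little about a generator.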

Comparison bar chart showing penalized logP scores for GB-GA, GB-GM-MCTS, and ML-based molecular optimization methods

Graph-Based GA and MCTS Generative Model for Molecules

A graph-based genetic algorithm (GB-GA) and a graph-based generative model with Monte Carlo tree search (GB-GM-MCTS) for molecular optimization that match or outperform ML-based generative approaches while being orders of magnitude faster.