This group covers models that use reinforcement learning to steer molecular generators toward desired property profiles. The common thread is reward-shaped optimization of a base generative policy, whether RNN, Transformer, GAN, or graph-based.
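To make the "reward-shaped optimization" concrete, here is a minimal toy sketch of the REINVENT-style augmented-likelihood objective: the agent's sequence log-likelihood is pulled toward the frozen prior's log-likelihood plus a scaled reward. The logits tables, vocabulary, and `sigma` value are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB = ["C", "N", "O", "=", "(", ")"]  # toy SMILES-like alphabet (illustrative)

def log_likelihood(logits, seq):
    """Sum of per-step token log-probabilities of `seq` under a logits table."""
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return sum(logp[t, tok] for t, tok in enumerate(seq))

# Frozen prior and trainable agent: one logits row per generation step.
seq_len = 5
prior_logits = rng.normal(size=(seq_len, len(VOCAB)))
agent_logits = prior_logits.copy()  # agent is initialized as a copy of the prior

def augmented_likelihood_loss(seq, reward, sigma=10.0):
    """REINVENT-style loss: squared gap between the agent's likelihood and
    the augmented target log P_prior(seq) + sigma * reward."""
    target = log_likelihood(prior_logits, seq) + sigma * reward
    return (target - log_likelihood(agent_logits, seq)) ** 2

# Since the agent starts identical to the prior, the loss reduces to
# (sigma * reward)^2, signalling the agent should raise this sequence's
# probability in proportion to its reward.
seq = [0, 1, 0, 2, 0]  # token indices into VOCAB
loss = augmented_likelihood_loss(seq, reward=0.8)
print(round(loss, 3))  # → 64.0
```

Minimizing this loss over sampled batches moves the agent toward high-reward sequences while the prior term keeps it anchored to chemically plausible SMILES.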

| Paper | Year | Base Model | Key Idea |
|---|---|---|---|
| REINVENT | 2017 | RNN | Augmented episodic likelihood for goal-directed SMILES generation |
| ORGAN | 2017 | SeqGAN | RL reward functions for tunable drug-likeness, solubility, and synthesizability |
| DrugEx v2 | 2019 | RNN | Pareto ranking and evolutionary exploration for multi-objective design |
| Link-INVENT | 2019 | RNN | Molecular linker design with flexible multi-parameter RL scoring |
| MolecularRNN | 2019 | Graph RNN | Atom-by-atom generation with valency rejection sampling and policy gradients |
| Curriculum Learning | 2019 | REINVENT | Sequential task decomposition accelerating convergence on complex objectives |
| Memory-Assisted RL | 2020 | REINVENT | Scaffold memory penalizes repeated solutions, increasing diversity fourfold |
| DrugEx v3 | 2021 | Graph Transformer | Scaffold-constrained generation with 100% validity via adjacency-based encoding |
| Augmented Hill-Climb | 2022 | RNN | Hybrid RL strategy improving sample efficiency ~45x over REINVENT |
| REINVENT 4 | 2024 | RNN + Transformer | Open-source framework unifying RL, transfer learning, and curriculum learning |
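The scaffold-memory idea behind memory-assisted RL can be sketched as a diversity filter that attenuates rewards once a scaffold has been generated too often. The class name, bucket size, and string-keyed scaffolds below are illustrative assumptions; real pipelines extract e.g. Murcko scaffolds with RDKit rather than passing strings directly.

```python
from collections import Counter

class ScaffoldMemory:
    """Toy diversity filter in the spirit of memory-assisted RL: once a
    scaffold's bucket is full, further occurrences receive zero reward,
    pushing the agent toward unexplored chemistry. (Hypothetical sketch.)"""

    def __init__(self, bucket_size=3):
        self.counts = Counter()      # how often each scaffold has appeared
        self.bucket_size = bucket_size

    def adjust(self, scaffold, reward):
        self.counts[scaffold] += 1
        if self.counts[scaffold] > self.bucket_size:
            return 0.0               # bucket full: zero out the reward
        return reward

mem = ScaffoldMemory(bucket_size=2)
rewards = [mem.adjust("benzene", 1.0) for _ in range(4)]
print(rewards)  # → [1.0, 1.0, 0.0, 0.0]
```

The agent still receives full reward for the first few members of a scaffold family, so exploitation is preserved while repetition is penalized.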
