REINVENT 4: Open-Source Generative Molecule Design

An Open-Source Reference Implementation for Generative Molecular Design

The REINVENT 4 paper is a Resource paper presenting a production-grade, open-source software framework for AI-driven generative molecular design. Its primary contribution is a unified codebase that integrates four distinct molecule generators (de novo, scaffold decoration, linker design, molecular optimization) with three machine learning optimization algorithms (transfer learning, reinforcement learning, curriculum learning). The software is released under the Apache 2.0 license and is the fourth major version of the REINVENT platform, which has been in continuous production use at AstraZeneca for drug discovery.

Bridging the Gap Between Research Prototypes and Production Molecular Design

The motivation for REINVENT 4 stems from several gaps in the generative molecular design landscape. While numerous AI model architectures have been developed for molecular generation (VAEs, GANs, RNNs, transformers, flow models, diffusion models), most exist as research prototypes released alongside individual publications rather than as maintained, integrated software. The authors argue that the scientific community needs reference implementations of common generative molecular design algorithms in the public domain to:

Enable nuanced debate about the application of AI in drug discovery
Serve as educational tools for practitioners entering the field
Increase transparency around AI-driven molecular design
Provide a foundation for future innovation

REINVENT 4 consolidates previously separate codebases (REINVENT v1, v2, LibInvent, LinkInvent, Mol2Mol) into a single repository with a consistent interface, addressing the fragmentation that characterized earlier releases.

Unified Framework for Sequence-Based Molecular Generation

The core design of REINVENT 4 centers on sequence-based neural network models that generate SMILES strings in an autoregressive manner. All generators model the probability of producing a token sequence, with two formulations.

For unconditional agents (de novo generation), the joint probability of a sequence $T$ with tokens $t_1, t_2, \ldots, t_\ell$ is:

$$ \mathbf{P}(T) = \prod_{i=1}^{\ell} \mathbf{P}(t_i \mid t_{i-1}, t_{i-2}, \ldots, t_1) $$

For conditional agents (scaffold decoration, linker design, molecular optimization), the joint probability given an input sequence $S$ is:

$$ \mathbf{P}(T \mid S) = \prod_{i=1}^{\ell} \mathbf{P}(t_i \mid t_{i-1}, t_{i-2}, \ldots, t_1, S) $$

The negative log-likelihood for unconditional agents is:

$$ NLL(T) = -\log \mathbf{P}(T) = -\sum_{i=1}^{\ell} \log \mathbf{P}(t_i \mid t_{i-1}, t_{i-2}, \ldots, t_1) $$
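The factorization and the NLL above can be checked numerically. The following is a minimal sketch, assuming the per-step conditional probabilities are already available as plain floats (in the actual models they would come from the network's softmax output). The function name `sequence_nll` and the toy probabilities are illustrative, not part of REINVENT 4.

```python
import math

def sequence_nll(step_probs):
    """Negative log-likelihood of a token sequence.

    step_probs[i] is assumed to be the probability the model assigned to
    the token actually emitted at step i, given all previous tokens.
    """
    if any(p <= 0.0 or p > 1.0 for p in step_probs):
        raise ValueError("each conditional probability must lie in (0, 1]")
    # Summing log-probabilities equals the log of the product,
    # but avoids numerical underflow for long sequences.
    return -sum(math.log(p) for p in step_probs)

# Toy example: three tokens with made-up conditional probabilities.
probs = [0.5, 0.25, 0.8]
nll = sequence_nll(probs)
joint = math.prod(probs)  # P(T) from the product formula
```

Here `nll` agrees with `-math.log(joint)` up to floating-point error, which is the identity the NLL equation states.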

Reinforcement Learning with DAP

The key optimization mechanism is reinforcement learning via the “Difference between Augmented and Posterior” (DAP) strategy. For each generated sequence $T$, the augmented likelihood is defined as:

$$ \log \mathbf{P}_{\text{aug}}(T) = \log \mathbf{P}_{\text{prior}}(T) + \sigma \mathbf{S}(T) $$

where $\mathbf{S}(T) \in [0, 1]$ is the scalar score and $\sigma \geq 0$ controls the balance between reward and regularization. The DAP loss is:

$$ \mathcal{L}(T) = \left(\log \mathbf{P}_{\text{aug}}(T) - \log \mathbf{P}_{\text{agent}}(T)\right)^2 $$

The presence of the prior likelihood in the augmented likelihood constrains how far the agent can deviate from chemically plausible space, functioning similarly to proximal policy gradient methods. Because $\log \mathbf{P}_{\text{agent}}(T) \leq 0$, the loss is lower-bounded by:

$$ \mathcal{L}(T) \geq \max\left(0, \log \mathbf{P}_{\text{prior}}(T) + \sigma \mathbf{S}(T)\right)^2 $$
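The DAP quantities can be traced with scalar arithmetic. This is a sketch under stated assumptions: log-likelihoods are given as plain floats, the function names are hypothetical, and the numbers (including $\sigma = 128$) are illustrative values rather than settings taken from the paper.

```python
def augmented_log_likelihood(prior_ll, score, sigma):
    """log P_aug(T) = log P_prior(T) + sigma * S(T)."""
    return prior_ll + sigma * score

def dap_loss(prior_ll, agent_ll, score, sigma):
    """Squared difference between augmented and agent log-likelihoods."""
    if prior_ll > 0.0 or agent_ll > 0.0:
        raise ValueError("log-likelihoods must be <= 0")
    aug = augmented_log_likelihood(prior_ll, score, sigma)
    return (aug - agent_ll) ** 2

def dap_lower_bound(prior_ll, score, sigma):
    """max(0, log P_aug(T))^2, which holds because log P_agent(T) <= 0."""
    return max(0.0, augmented_log_likelihood(prior_ll, score, sigma)) ** 2

# Illustrative values only.
prior_ll, agent_ll, score, sigma = -20.0, -18.0, 0.9, 128.0
loss = dap_loss(prior_ll, agent_ll, score, sigma)    # (95.2 + 18.0)^2
bound = dap_lower_bound(prior_ll, score, sigma)      # 95.2^2
```

With $\sigma = 0$ the augmented likelihood reduces to the prior likelihood, the bound becomes zero, and the loss simply pulls the agent back toward the prior.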

Four Molecule Generators

REINVENT 4 supports four generator types:

Generator	Architecture	Input	Task
Reinvent	RNN	None	De novo design from scratch
LibInvent	RNN	Scaffold SMILES	R-group replacement, library design
LinkInvent	RNN	Two warhead fragments	Linker design, scaffold hopping
Mol2Mol	Transformer	Input molecule	Molecular optimization within similarity bounds

All generators are fully integrated with all three optimization algorithms (TL, RL, CL). The Mol2Mol transformer was trained on over 200 billion molecular pairs from PubChem with Tanimoto similarity $\geq 0.50$, using ranking loss to directly link negative log-likelihood to molecular similarity.
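The similarity criterion used to select training pairs can be illustrated with a small sketch. It assumes each fingerprint is represented as a set of on-bit indices. The summary does not state which fingerprint type was used, so the toy fingerprints and the helper names `tanimoto` and `similar_pairs` are hypothetical.

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto similarity: |A and B| / |A or B| on sets of on-bits."""
    a, b = set(a), set(b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union

def similar_pairs(fingerprints, threshold=0.5):
    """Return all name pairs whose similarity meets the threshold."""
    return [
        (name_a, name_b)
        for (name_a, fp_a), (name_b, fp_b) in combinations(fingerprints.items(), 2)
        if tanimoto(fp_a, fp_b) >= threshold
    ]

# Toy fingerprints: A and B share 3 of 5 distinct bits (similarity 0.6).
fps = {"A": {1, 2, 3, 4}, "B": {2, 3, 4, 5}, "C": {7, 8}}
pairs = similar_pairs(fps)
```

An all-pairs loop like this is only workable for toy inputs. Pair construction at the scale quoted above would need a far more efficient scheme, which this sketch does not attempt to represent.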

Staged Learning (Curriculum Learning)

A key new feature is staged learning, which implements curriculum learning as multi-stage RL. Each stage can define a different scoring profile, allowing users to gradually phase in computationally expensive scoring functions. For example, cheap drug-likeness filters can run first, followed by docking in later stages. Stages terminate when a maximum score threshold is exceeded or a step limit is reached.
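The two termination conditions can be sketched as a simple control loop. This is an assumption-laden illustration, not the REINVENT 4 implementation: the `Stage` class and `run_stages` function are hypothetical, and each `scorer` is a stand-in that maps a step number to a mean batch score, whereas a real run would sample molecules, score them, and update the agent at every step.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Stage:
    name: str
    scorer: Callable[[int], float]  # stand-in: step number -> mean batch score
    max_score: float                # stage ends once this is exceeded
    max_steps: int                  # or once this many steps have run

def run_stages(stages: List[Stage]) -> List[Tuple[str, int, str]]:
    """Run stages in order; record steps used and the termination reason."""
    log = []
    for stage in stages:
        steps_used, reason = stage.max_steps, "step limit"
        for step in range(1, stage.max_steps + 1):
            if stage.scorer(step) > stage.max_score:
                steps_used, reason = step, "score threshold"
                break
        log.append((stage.name, steps_used, reason))
    return log

# Cheap filters first, an expensive (here: slowly improving) stage second.
stages = [
    Stage("filters", lambda s: min(1.0, 0.1 * s), max_score=0.75, max_steps=50),
    Stage("docking", lambda s: 0.02 * s, max_score=0.9, max_steps=20),
]
log = run_stages(stages)
```

In this toy run the first stage stops at step 8 on the score threshold, while the second exhausts its 20-step budget.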

Scoring Subsystem

The scoring subsystem implements a plugin architecture supporting over 25 scoring components, including:

Physicochemical descriptors from RDKit (QED, SLogP, TPSA, molecular weight, etc.)
Molecular docking via DockStream (AutoDock Vina, rDock, Hybrid, Glide, GOLD)
QSAR models via Qptuna and ChemProp (D-MPNN)
Shape similarity via ROCS
Synthesizability estimation via SA score
Matched molecular pairs via mmpdb
Generic REST and external process interfaces

Scores are aggregated via weighted arithmetic or geometric mean. A transform system (sigmoid, step functions, value maps) normalizes individual component scores to $[0, 1]$.
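The two aggregation modes and a transform can be sketched as follows. The sketch uses a generic logistic function as the sigmoid transform, which is an assumption: the exact parameterization used by REINVENT 4 is not given here, and all function names are illustrative.

```python
import math

def sigmoid(x, midpoint, steepness):
    """Generic logistic transform mapping a raw value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - midpoint)))

def weighted_arithmetic_mean(scores, weights):
    return sum(w * s for s, w in zip(scores, weights)) / sum(weights)

def weighted_geometric_mean(scores, weights):
    # A single zero component forces the aggregate to zero.
    if any(s == 0.0 for s in scores):
        return 0.0
    total = sum(weights)
    return math.exp(sum(w * math.log(s) for s, w in zip(scores, weights)) / total)

# Two already-normalized component scores with equal weights.
scores, weights = [0.9, 0.4], [1.0, 1.0]
arith = weighted_arithmetic_mean(scores, weights)  # 0.65
geom = weighted_geometric_mean(scores, weights)    # sqrt(0.36) = 0.6
```

The geometric mean is never larger than the arithmetic mean for the same inputs, so it penalizes a molecule that fails one component more strongly than averaging does.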

PDK1 Inhibitor Case Study

The paper demonstrates REINVENT 4 through a structure-based drug design exercise targeting Phosphoinositide-dependent kinase-1 (PDK1) inhibitors. The experimental setup uses PDB crystal structure 2XCH with DockStream and Glide for docking, defining hits as molecules with docking score $\leq -8$ kcal/mol and QED $\geq 0.7$.

Baseline RL from prior: 50 epochs of staged learning with batch size 128 produced 119 hits from 6,400 generated molecules (1.9% hit rate), spread across 103 generic Bemis-Murcko scaffolds.

Transfer learning + RL: After 10 epochs of TL on 315 congeneric pyridinone PDK1 actives from PubChem Assay AID1798002, the same 50-epoch RL run produced 222 hits (3.5% hit rate) across 176 unique generic scaffolds, nearly doubling productivity.
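The reported hit rates follow directly from the stated counts. The short check below reproduces the arithmetic and encodes the hit definition from the case study. The function names are illustrative.

```python
def is_hit(docking_score, qed, max_docking=-8.0, min_qed=0.7):
    """Hit definition from the case study: docking <= -8 kcal/mol and QED >= 0.7."""
    return docking_score <= max_docking and qed >= min_qed

def hit_rate(n_hits, epochs, batch_size):
    """Fraction of generated molecules that are hits."""
    return n_hits / (epochs * batch_size)

total = 50 * 128                    # molecules generated per run
baseline = hit_rate(119, 50, 128)   # RL from the prior
tl_rl = hit_rate(222, 50, 128)      # TL followed by RL
improvement = 222 / 119             # ratio of hit counts
```

The ratio of hit counts is about 1.87, consistent with the statement that transfer learning nearly doubled productivity.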

Both approaches generated top-scoring molecules (docking score of -10.1 kcal/mol each) with plausible binding poses reproducing key protein-ligand interactions seen in the native crystal structure, including hinge interactions with ALA 162 and contacts with LYS 111.

The paper also demonstrates the agent’s plasticity through a molecular weight switching experiment: after 500 epochs driving generation toward 1500 Da molecules, switching the reward to favor molecules $\leq 500$ Da resulted in rapid adaptation within ~50 epochs, showing that the RL agent can recover from extreme biases.

Practical Software for AI-Driven Drug Discovery

REINVENT 4 represents a mature, well-documented framework that consolidates years of incremental development into a single codebase. Key practical features include TOML/JSON configuration, TensorBoard visualization, multinomial sampling and beam search decoding, diversity filters for scaffold-level novelty, experience replay (inception), and a plugin mechanism for extending the scoring subsystem.
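To give a feel for how a scaffold-level diversity filter can work, here is a minimal sketch of a bucket-based scheme. It is an illustration under assumptions, not REINVENT 4's code: scaffolds are passed in as precomputed strings, and the class name, `bucket_size`, and `min_score` are hypothetical parameters chosen for the example.

```python
from collections import defaultdict

class ScaffoldBucketFilter:
    """Zero the score of molecules whose scaffold bucket is already full."""

    def __init__(self, bucket_size=2, min_score=0.4):
        self.bucket_size = bucket_size
        self.min_score = min_score
        self._buckets = defaultdict(set)  # scaffold -> SMILES seen so far

    def __call__(self, smiles, scaffold, score):
        if score < self.min_score:
            return score  # low scorers are neither stored nor penalized
        bucket = self._buckets[scaffold]
        if len(bucket) >= self.bucket_size:
            return 0.0    # scaffold is saturated: remove the reward
        bucket.add(smiles)
        return score

# Toy usage with made-up SMILES and scaffold strings.
flt = ScaffoldBucketFilter(bucket_size=2)
results = [
    flt("Cc1ccccc1", "c1ccccc1", 0.8),
    flt("CCc1ccccc1", "c1ccccc1", 0.8),
    flt("CCCc1ccccc1", "c1ccccc1", 0.8),  # third molecule on the same scaffold
    flt("CC1CCCCC1", "C1CCCCC1", 0.9),    # new scaffold, fresh bucket
]
```

Once a scaffold has filled its bucket, further molecules sharing it earn no reward, which pushes the agent toward unexplored scaffolds.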

The authors acknowledge that this is one approach among many and that there is no single solution that uniformly outperforms others. REINVENT has demonstrated strong sample efficiency in benchmarks and produced realistic 3D docking poses, but the paper does not claim universal superiority. The focus is on providing a well-engineered, transparent reference implementation rather than advancing a novel algorithm.

The stated limitations are that only the Mol2Mol prior supports stereochemistry, that biases in the training data constrain the explorable chemical space, and that the SMILES-based representation inherits the known fragility of string-based molecular encodings.

Reproducibility Details

Data

Purpose	Dataset	Size	Notes
Prior training (Reinvent)	ChEMBL 25	~1.7M molecules	Drug-like compounds
Prior training (LibInvent)	ChEMBL 27	~1.9M molecules	Scaffold-decoration pairs
Prior training (LinkInvent)	ChEMBL 27	~1.9M molecules	Fragment-linker pairs
Prior training (Mol2Mol)	ChEMBL 28 / PubChem	~200B pairs	Tanimoto similarity $\geq 0.50$
Case study TL	PubChem AID1798002	315 compounds	Congeneric PDK1 actives
Case study docking	PDB 2XCH	1 structure	PDK1 crystal structure

Algorithms

Optimization: DAP (recommended), plus three deprecated alternatives (REINFORCE, A2C, MAULI)
Decoding: Multinomial sampling (default, temperature $K = 1$) and beam search
Diversity filter: Murcko scaffold, topological scaffold, scaffold similarity, same-SMILES penalty
Experience replay: Inception memory with configurable size and sampling rate
Gradient descent: Adam optimizer
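Multinomial sampling with a temperature can be sketched in a few lines. The sketch assumes the usual convention that the temperature divides the logits before the softmax, so that a value of 1 leaves the model distribution unchanged. The function names and toy logits are illustrative.

```python
import math
import random

def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after scaling by 1 / temperature."""
    if temperature <= 0.0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    norm = sum(exps)
    return [e / norm for e in exps]

def sample_token(logits, temperature=1.0, rng=random):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.1]                              # made-up next-token logits
p_default = softmax_with_temperature(logits, 1.0)
p_sharp = softmax_with_temperature(logits, 0.5)       # lower temperature
token = sample_token(logits, rng=random.Random(0))    # seeded for repeatability
```

Lowering the temperature concentrates probability on the most likely token, while raising it flattens the distribution and increases diversity.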

Models

All pre-trained priors are distributed with the repository. They cover the RNN-based generators (Reinvent, LibInvent, LinkInvent) and the transformer-based generator (Mol2Mol), the latter in multiple similarity-conditioned variants.

Evaluation

Metric	Value	Condition	Notes
Hit rate (RL)	1.9%	50 epochs, batch 128	PDK1 case study
Hit rate (TL+RL)	3.5%	10 TL + 50 RL epochs	PDK1 case study
Scaffold diversity (RL)	103 scaffolds	From 119 hits	Generic Bemis-Murcko
Scaffold diversity (TL+RL)	176 scaffolds	From 222 hits	Generic Bemis-Murcko
Best docking score	-10.1 kcal/mol	Both methods	Glide SP

Hardware

The paper does not specify hardware requirements. REINVENT 4 supports both GPU and CPU execution. Python 3.10+ is required, with PyTorch 1.x (2.0 also compatible) and RDKit 2022.9+.

Artifacts

Artifact	Type	License	Notes
REINVENT4	Code	Apache-2.0	Full framework with pre-trained priors
DockStream	Code	Apache-2.0	Docking wrapper for scoring

Paper Information

Citation: Loeffler, H. H., He, J., Tibo, A., Janet, J. P., Voronov, A., Mervin, L. H., & Engkvist, O. (2024). Reinvent 4: Modern AI-driven generative molecule design. Journal of Cheminformatics, 16, 20. https://doi.org/10.1186/s13321-024-00812-5

@article{loeffler2024reinvent,
  title={Reinvent 4: Modern AI-driven generative molecule design},
  author={Loeffler, Hannes H. and He, Jiazhen and Tibo, Alessandro and Janet, Jon Paul and Voronov, Alexey and Mervin, Lewis H. and Engkvist, Ola},
  journal={Journal of Cheminformatics},
  volume={16},
  number={1},
  pages={20},
  year={2024},
  publisher={Springer},
  doi={10.1186/s13321-024-00812-5}
}