An Open-Source Reference Implementation for Generative Molecular Design
The REINVENT 4 paper is a Resource paper presenting a production-grade, open-source software framework for AI-driven generative molecular design. The primary contribution is a unified codebase that combines four molecule generators (de novo, scaffold decoration, linker design, molecular optimization) with three machine learning optimization algorithms (transfer learning, reinforcement learning, curriculum learning). The software is released under the Apache 2.0 license and is the fourth major version of the REINVENT platform, which has been in continuous production use at AstraZeneca for drug discovery.
Bridging the Gap Between Research Prototypes and Production Molecular Design
The motivation for REINVENT 4 stems from several gaps in the generative molecular design landscape. While numerous AI model architectures have been developed for molecular generation (VAEs, GANs, RNNs, transformers, flow models, diffusion models), most exist as research prototypes released alongside individual publications rather than as maintained, integrated software. The authors argue that the scientific community needs reference implementations of common generative molecular design algorithms in the public domain to:
- Enable nuanced debate about the application of AI in drug discovery
- Serve as educational tools for practitioners entering the field
- Increase transparency around AI-driven molecular design
- Provide a foundation for future innovation
REINVENT 4 consolidates previously separate codebases (REINVENT v1, v2, LibInvent, LinkInvent, Mol2Mol) into a single repository with a consistent interface, addressing the fragmentation that characterized earlier releases.
Unified Framework for Sequence-Based Molecular Generation
The core design of REINVENT 4 centers on sequence-based neural network models that generate SMILES strings in an autoregressive manner. All generators model the probability of producing a token sequence, with two formulations.
For unconditional agents (de novo generation), the joint probability of a sequence $T$ with tokens $t_1, t_2, \ldots, t_\ell$ is:
$$ \mathbf{P}(T) = \prod_{i=1}^{\ell} \mathbf{P}(t_i \mid t_{i-1}, t_{i-2}, \ldots, t_1) $$
For conditional agents (scaffold decoration, linker design, molecular optimization), the joint probability given an input sequence $S$ is:
$$ \mathbf{P}(T \mid S) = \prod_{i=1}^{\ell} \mathbf{P}(t_i \mid t_{i-1}, t_{i-2}, \ldots, t_1, S) $$
The negative log-likelihood for unconditional agents is:
$$ NLL(T) = -\log \mathbf{P}(T) = -\sum_{i=1}^{\ell} \log \mathbf{P}(t_i \mid t_{i-1}, t_{i-2}, \ldots, t_1) $$
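The relationship between the joint probability and the negative log-likelihood can be checked numerically. The following is a minimal sketch, not REINVENT code: the function name `sequence_nll` and the per-token probabilities are hypothetical, chosen only to illustrate the formulas above.

```python
import math


def sequence_nll(token_probs):
    """Negative log-likelihood of a sequence, given the conditional
    probability P(t_i | t_{i-1}, ..., t_1) of each of its tokens."""
    if any(p <= 0.0 or p > 1.0 for p in token_probs):
        raise ValueError("each conditional probability must lie in (0, 1]")
    return -sum(math.log(p) for p in token_probs)


# Hypothetical conditionals for a four-token sequence (illustrative values).
probs = [0.5, 0.25, 0.8, 0.1]

joint = math.prod(probs)    # product form of P(T)
nll = sequence_nll(probs)   # sum form of NLL(T)
```

Summing log-probabilities rather than multiplying raw probabilities avoids numerical underflow for long sequences, which is why the NLL is the quantity usually tracked in practice.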
Reinforcement Learning with DAP
The key optimization mechanism is reinforcement learning via the “Difference between Augmented and Posterior” (DAP) strategy. For each generated sequence $T$, the augmented likelihood is defined as:
$$ \log \mathbf{P}_{\text{aug}}(T) = \log \mathbf{P}_{\text{prior}}(T) + \sigma \mathbf{S}(T) $$
where $\mathbf{S}(T) \in [0, 1]$ is the scalar score and $\sigma \geq 0$ controls the balance between reward and regularization. The DAP loss is:
$$ \mathcal{L}(T) = \left(\log \mathbf{P}_{\text{aug}}(T) - \log \mathbf{P}_{\text{agent}}(T)\right)^2 $$
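The prior likelihood term in the augmented likelihood constrains how far the agent can deviate from chemically plausible space, functioning similarly to proximal policy gradient methods. Because $\log \mathbf{P}_{\text{agent}}(T) \leq 0$, the term inside the square is never smaller than $\log \mathbf{P}_{\text{aug}}(T)$, so the loss is lower-bounded by: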
$$ \mathcal{L}(T) \geq \max\left(0, \log \mathbf{P}_{\text{prior}}(T) + \sigma \mathbf{S}(T)\right)^2 $$
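The DAP equations above can be written out for a single sequence as follows. This is a sketch under stated assumptions, not the REINVENT implementation: the function names are hypothetical, and the log-likelihoods, score, and $\sigma = 128$ are illustrative values only.

```python
def augmented_log_likelihood(prior_ll, score, sigma):
    """log P_aug(T) = log P_prior(T) + sigma * S(T)."""
    return prior_ll + sigma * score


def dap_loss(prior_ll, agent_ll, score, sigma):
    """Squared difference between augmented and agent log-likelihoods."""
    aug = augmented_log_likelihood(prior_ll, score, sigma)
    return (aug - agent_ll) ** 2


def dap_lower_bound(prior_ll, score, sigma):
    """Lower bound on the loss, which holds because agent_ll <= 0."""
    return max(0.0, augmented_log_likelihood(prior_ll, score, sigma)) ** 2


# Illustrative values (assumptions): log-likelihoods are <= 0, score in [0, 1].
prior_ll, agent_ll, score, sigma = -30.0, -28.0, 0.8, 128.0

loss = dap_loss(prior_ll, agent_ll, score, sigma)    # (72.4 + 28.0)^2
bound = dap_lower_bound(prior_ll, score, sigma)      # 72.4^2
```

With a zero score the augmented likelihood reduces to the prior likelihood, so the loss pulls the agent back toward the prior; a high score shifts the target upward in proportion to $\sigma$.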
Four Molecule Generators
REINVENT 4 supports four generator types:
| Generator | Architecture | Input | Task |
|---|---|---|---|
| Reinvent | RNN | None | De novo design from scratch |
| LibInvent | RNN | Scaffold SMILES | R-group replacement, library design |
| LinkInvent | RNN | Two warhead fragments | Linker design, scaffold hopping |
| Mol2Mol | Transformer | Input molecule | Molecular optimization within similarity bounds |
All generators are fully integrated with all three optimization algorithms: transfer learning (TL), reinforcement learning (RL), and curriculum learning (CL). The Mol2Mol transformer was trained on over 200 billion molecular pairs from PubChem with Tanimoto similarity $\geq 0.50$, using a ranking loss to directly link negative log-likelihood to molecular similarity.
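The similarity threshold used to select training pairs can be illustrated with a small, dependency-free sketch. It assumes fingerprints are available as sets of "on" bit indices. The function names and the toy fingerprints are hypothetical, and the handling of two empty fingerprints is a convention chosen here, not something taken from the paper.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits:
    |A intersect B| / |A union B|."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty fingerprints count as dissimilar
    return len(a & b) / len(union)


def similar_pairs(fingerprints, threshold=0.50):
    """Return all unordered pairs of names whose similarity >= threshold."""
    names = sorted(fingerprints)
    return [
        (x, y)
        for i, x in enumerate(names)
        for y in names[i + 1:]
        if tanimoto(fingerprints[x], fingerprints[y]) >= threshold
    ]


# Toy fingerprints (hypothetical bit indices, not real molecules).
fps = {
    "mol_a": {1, 2, 3, 4},
    "mol_b": {3, 4, 5, 6},
    "mol_c": {1, 2, 3, 5},
}
pairs = similar_pairs(fps)
```

Here only `mol_a` and `mol_c` share enough bits (3 of 5) to pass the 0.50 threshold. In a real pipeline the bit sets would come from a cheminformatics toolkit such as RDKit.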
Staged Learning (Curriculum Learning)
A key new feature is staged learning, which implements curriculum learning as multi-stage RL. Each stage can define a different scoring profile, allowing users to gradually phase in computationally expensive scoring functions. For example, cheap drug-likeness filters can run first, followed by docking in later stages. Stages terminate when a maximum score threshold is exceeded or a step limit is reached.
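The control flow of staged learning can be sketched as a loop over stages, each with its own scorer and termination criteria. This is a simplified illustration, not the REINVENT implementation: the `Stage` class, the `run_stages` function, the use of the mean batch score as the termination signal, and the toy agent are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Stage:
    name: str
    scorer: Callable[[str], float]  # maps a SMILES string to a score in [0, 1]
    max_score: float                # stage ends once the mean batch score exceeds this
    max_steps: int                  # ... or once this many steps have run


def run_stages(stages, sample_batch, update_agent):
    """Run stages in order; return (stage name, steps used, reason) per stage."""
    log = []
    for stage in stages:
        reason, steps = "step limit", 0
        for steps in range(1, stage.max_steps + 1):
            batch = sample_batch()
            scores = [stage.scorer(smiles) for smiles in batch]
            update_agent(batch, scores)  # e.g. one RL update on this batch
            if sum(scores) / len(scores) > stage.max_score:
                reason = "score threshold"
                break
        log.append((stage.name, steps, reason))
    return log


# Toy stand-ins (hypothetical): the "agent" is a single number that improves
# by 0.25 per update, and both scorers simply read that number.
state = {"quality": 0.0}


def sample_batch():
    return ["C"] * 4  # placeholder SMILES strings


def update_agent(batch, scores):
    state["quality"] = min(1.0, state["quality"] + 0.25)


cheap = Stage("cheap filters", lambda s: state["quality"], 0.6, 10)
costly = Stage("docking stand-in", lambda s: 0.5 * state["quality"], 0.9, 3)
log = run_stages([cheap, costly], sample_batch, update_agent)
```

In this toy run the first stage ends early on its score threshold, while the second stage never reaches its threshold and stops at its step limit, which are the two termination routes described above.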
Scoring Subsystem
The scoring subsystem implements a plugin architecture supporting over 25 scoring components, including:
- Physicochemical descriptors from RDKit (QED, SLogP, TPSA, molecular weight, etc.)
- Molecular docking via DockStream (AutoDock Vina, rDock, Hybrid, Glide, GOLD)
- QSAR models via Qptuna and ChemProp (D-MPNN)
- Shape similarity via ROCS
- Synthesizability estimation via SA score
- Matched molecular pairs via mmpdb
- Generic REST and external process interfaces
Scores are aggregated via weighted arithmetic or geometric mean. A transform system (sigmoid, step functions, value maps) normalizes individual component scores to $[0, 1]$.
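A small sketch shows how transformed component scores might be combined. The function names are hypothetical, and the logistic curve used here is a generic stand-in for a sigmoid transform; the exact parameterization in REINVENT 4 may differ.

```python
import math


def sigmoid_transform(value, midpoint, steepness):
    """Map a raw component value to (0, 1) with a logistic curve
    (generic form; assumed, not REINVENT's exact formula)."""
    return 1.0 / (1.0 + math.exp(-steepness * (value - midpoint)))


def weighted_arithmetic_mean(scores, weights):
    return sum(w * s for s, w in zip(scores, weights)) / sum(weights)


def weighted_geometric_mean(scores, weights):
    if any(s == 0.0 for s in scores):
        return 0.0  # a single zero component zeroes the total score
    log_sum = sum(w * math.log(s) for s, w in zip(scores, weights))
    return math.exp(log_sum / sum(weights))


# Two already-normalized component scores with equal weights (toy values).
scores, weights = [0.9, 0.4], [1.0, 1.0]

arithmetic = weighted_arithmetic_mean(scores, weights)  # 0.65
geometric = weighted_geometric_mean(scores, weights)    # sqrt(0.36) = 0.6
```

The geometric mean is never larger than the arithmetic mean, so it penalizes a molecule that does poorly on any single component more strongly. That makes it the stricter choice when every objective must be satisfied at once.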
PDK1 Inhibitor Case Study
The paper demonstrates REINVENT 4 through a structure-based drug design exercise aimed at finding inhibitors of phosphoinositide-dependent kinase-1 (PDK1). The experimental setup uses PDB crystal structure 2XCH with DockStream and Glide for docking, defining hits as molecules with docking score $\leq -8$ kcal/mol and QED $\geq 0.7$.
Baseline RL from prior: 50 epochs of staged learning with batch size 128 produced 119 hits from 6,400 generated molecules (1.9% hit rate), spread across 103 generic Bemis-Murcko scaffolds.
Transfer learning + RL: After 10 epochs of TL on 315 congeneric pyridinone PDK1 actives from PubChem Assay AID1798002, the same 50-epoch RL run produced 222 hits (3.5% hit rate) across 176 unique generic scaffolds, nearly doubling productivity.
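The reported hit rates follow directly from the run size and the hit counts, as this short check shows. The `is_hit` helper is a hypothetical name that simply encodes the hit definition given above.

```python
def is_hit(docking_score, qed):
    """Hit definition from the case study: docking <= -8 kcal/mol and QED >= 0.7."""
    return docking_score <= -8.0 and qed >= 0.7


epochs, batch_size = 50, 128
generated = epochs * batch_size   # molecules sampled per run

hits_rl, hits_tl_rl = 119, 222    # hit counts reported in the paper

rate_rl = 100.0 * hits_rl / generated        # percent, baseline RL
rate_tl_rl = 100.0 * hits_tl_rl / generated  # percent, TL followed by RL
ratio = hits_tl_rl / hits_rl                 # relative improvement
```

The ratio of about 1.87 is what the text summarizes as "nearly doubling". Since 103 scaffolds came from 119 hits and 176 from 222, most hits in both runs sit on their own generic scaffold.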
Both approaches generated top-scoring molecules (docking score of -10.1 kcal/mol each) with plausible binding poses reproducing key protein-ligand interactions seen in the native crystal structure, including hinge interactions with ALA 162 and contacts with LYS 111.
The paper also demonstrates the agent’s plasticity through a molecular weight switching experiment: after 500 epochs driving generation toward 1500 Da molecules, switching the reward to favor molecules $\leq 500$ Da resulted in rapid adaptation within ~50 epochs, showing that the RL agent can recover from extreme biases.
Practical Software for AI-Driven Drug Discovery
REINVENT 4 represents a mature, well-documented framework that consolidates years of incremental development into a single codebase. Key practical features include TOML/JSON configuration, TensorBoard visualization, multinomial sampling and beam search decoding, diversity filters for scaffold-level novelty, experience replay (inception), and a plugin mechanism for extending the scoring subsystem.
The authors acknowledge that this is one approach among many and that there is no single solution that uniformly outperforms others. REINVENT has demonstrated strong sample efficiency in benchmarks and produced realistic 3D docking poses, but the paper does not claim universal superiority. The focus is on providing a well-engineered, transparent reference implementation rather than advancing a novel algorithm.
The paper notes several limitations: only the Mol2Mol prior supports stereochemistry, biases in the training data constrain the explorable chemical space, and the SMILES-based representation inherits the known fragility of string-based molecular encodings.
Reproducibility Details
Data
| Purpose | Dataset | Size | Notes |
|---|---|---|---|
| Prior training (Reinvent) | ChEMBL 25 | ~1.7M molecules | Drug-like compounds |
| Prior training (LibInvent) | ChEMBL 27 | ~1.9M molecules | Scaffold-decoration pairs |
| Prior training (LinkInvent) | ChEMBL 27 | ~1.9M molecules | Fragment-linker pairs |
| Prior training (Mol2Mol) | ChEMBL 28 / PubChem | ~200B pairs | Tanimoto similarity $\geq 0.50$ |
| Case study TL | PubChem AID1798002 | 315 compounds | Congeneric PDK1 actives |
| Case study docking | PDB 2XCH | 1 structure | PDK1 crystal structure |
Algorithms
- Optimization: DAP (recommended), plus three deprecated alternatives (REINFORCE, A2C, MAULI)
- Decoding: Multinomial sampling (default, temperature $K = 1$) and beam search
- Diversity filter: Murcko scaffold, topological scaffold, scaffold similarity, same-SMILES penalty
- Experience replay: Inception memory with configurable size and sampling rate
- Gradient descent: Adam optimizer
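Multinomial sampling with a temperature can be sketched as follows. This assumes the conventional definition, in which logits are divided by the temperature before the softmax. The function names and logits are hypothetical, and the snippet is not taken from the REINVENT codebase.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; temperature 1 leaves the distribution unchanged."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature=1.0, rng=random):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical next-token logits over a three-token vocabulary.
logits = [2.0, 1.0, 0.0]
p_default = softmax(logits, temperature=1.0)
p_cold = softmax(logits, temperature=0.5)  # sharper: favors the top token
p_hot = softmax(logits, temperature=2.0)   # flatter: more exploration

token = sample_token(logits, rng=random.Random(0))
```

Unlike beam search, which deterministically keeps the highest-likelihood candidates, multinomial sampling draws each token at random from the model's distribution, so repeated runs explore different sequences.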
Models
All pre-trained priors are distributed with the repository. They cover the RNN-based generators (Reinvent, LibInvent, LinkInvent) and the transformer-based generator (Mol2Mol), the latter in multiple similarity-conditioned variants.
Evaluation
| Metric | Value | Condition | Notes |
|---|---|---|---|
| Hit rate (RL) | 1.9% | 50 epochs, batch 128 | PDK1 case study |
| Hit rate (TL+RL) | 3.5% | 10 TL + 50 RL epochs | PDK1 case study |
| Scaffold diversity (RL) | 103 scaffolds | From 119 hits | Generic Bemis-Murcko |
| Scaffold diversity (TL+RL) | 176 scaffolds | From 222 hits | Generic Bemis-Murcko |
| Best docking score | -10.1 kcal/mol | Both methods | Glide SP |
Hardware
The paper does not specify hardware requirements. REINVENT 4 supports both GPU and CPU execution. Python 3.10+ is required, with PyTorch 1.x (2.0 also compatible) and RDKit 2022.9+.
Artifacts
| Artifact | Type | License | Notes |
|---|---|---|---|
| REINVENT4 | Code | Apache-2.0 | Full framework with pre-trained priors |
| DockStream | Code | Apache-2.0 | Docking wrapper for scoring |
Paper Information
Citation: Loeffler, H. H., He, J., Tibo, A., Janet, J. P., Voronov, A., Mervin, L. H., & Engkvist, O. (2024). Reinvent 4: Modern AI-driven generative molecule design. Journal of Cheminformatics, 16, 20. https://doi.org/10.1186/s13321-024-00812-5
@article{loeffler2024reinvent,
title={Reinvent 4: Modern AI-driven generative molecule design},
author={Loeffler, Hannes H. and He, Jiazhen and Tibo, Alessandro and Janet, Jon Paul and Voronov, Alexey and Mervin, Lewis H. and Engkvist, Ola},
journal={Journal of Cheminformatics},
volume={16},
number={1},
pages={20},
year={2024},
publisher={Springer},
doi={10.1186/s13321-024-00812-5}
}
