An Open-Source Reference Implementation for Generative Molecular Design

REINVENT 4 is a Resource paper presenting a production-grade, open-source software framework for AI-driven generative molecular design. Its primary contribution is a unified codebase that integrates four distinct molecule generators (de novo, scaffold decoration, linker design, molecular optimization) with three machine learning optimization algorithms (transfer learning, reinforcement learning, curriculum learning). The software is released under the Apache 2.0 license and is the fourth major version of the REINVENT platform, which has been in continuous production use at AstraZeneca for drug discovery.

Bridging the Gap Between Research Prototypes and Production Molecular Design

The motivation for REINVENT 4 stems from several gaps in the generative molecular design landscape. While numerous AI model architectures have been developed for molecular generation (VAEs, GANs, RNNs, transformers, flow models, diffusion models), most exist as research prototypes released alongside individual publications rather than as maintained, integrated software. The authors argue that the scientific community needs reference implementations of common generative molecular design algorithms in the public domain to:

  1. Enable nuanced debate about the application of AI in drug discovery
  2. Serve as educational tools for practitioners entering the field
  3. Increase transparency around AI-driven molecular design
  4. Provide a foundation for future innovation

REINVENT 4 consolidates previously separate codebases (REINVENT v1, v2, LibInvent, LinkInvent, Mol2Mol) into a single repository with a consistent interface, addressing the fragmentation that characterized earlier releases.

Unified Framework for Sequence-Based Molecular Generation

The core design of REINVENT 4 centers on sequence-based neural network models that generate SMILES strings in an autoregressive manner. All generators model the probability of producing a token sequence, with two formulations.

For unconditional agents (de novo generation), the joint probability of a sequence $T$ with tokens $t_1, t_2, \ldots, t_\ell$ is:

$$ \mathbf{P}(T) = \prod_{i=1}^{\ell} \mathbf{P}(t_i \mid t_{i-1}, t_{i-2}, \ldots, t_1) $$

For conditional agents (scaffold decoration, linker design, molecular optimization), the joint probability given an input sequence $S$ is:

$$ \mathbf{P}(T \mid S) = \prod_{i=1}^{\ell} \mathbf{P}(t_i \mid t_{i-1}, t_{i-2}, \ldots, t_1, S) $$

The negative log-likelihood for unconditional agents is:

$$ NLL(T) = -\log \mathbf{P}(T) = -\sum_{i=1}^{\ell} \log \mathbf{P}(t_i \mid t_{i-1}, t_{i-2}, \ldots, t_1) $$
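The relationship between the joint probability and the negative log-likelihood can be checked numerically. The following is a minimal sketch in plain Python; the per-token conditional probabilities are made-up illustrative values, not outputs of any REINVENT prior.

```python
import math

def sequence_nll(step_probs):
    """Negative log-likelihood of a sequence, given P(t_i | t_{i-1}, ..., t_1) for each token."""
    if any(p <= 0.0 or p > 1.0 for p in step_probs):
        raise ValueError("each conditional probability must lie in (0, 1]")
    # Summing log-probabilities is equivalent to taking the log of the product.
    return -sum(math.log(p) for p in step_probs)

# Hypothetical conditional probabilities for a 4-token sequence.
step_probs = [0.5, 0.25, 0.8, 0.5]

joint = math.prod(step_probs)   # P(T), the product form
nll = sequence_nll(step_probs)  # NLL(T), the sum-of-logs form

print(f"P(T) = {joint:.4f}, NLL(T) = {nll:.4f}")
```

Working with the sum of logs rather than the raw product avoids numerical underflow for long sequences, which is why the NLL is the quantity typically used in training and in the reinforcement learning loss.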

Reinforcement Learning with DAP

The key optimization mechanism is reinforcement learning via the “Difference between Augmented and Posterior” (DAP) strategy. For each generated sequence $T$, the augmented likelihood is defined as:

$$ \log \mathbf{P}_{\text{aug}}(T) = \log \mathbf{P}_{\text{prior}}(T) + \sigma \mathbf{S}(T) $$

where $\mathbf{S}(T) \in [0, 1]$ is the scalar score and $\sigma \geq 0$ controls the balance between reward and regularization. The DAP loss is:

$$ \mathcal{L}(T) = \left(\log \mathbf{P}_{\text{aug}}(T) - \log \mathbf{P}_{\text{agent}}(T)\right)^2 $$

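The presence of the prior likelihood in the augmented likelihood constrains how far the agent can deviate from chemically plausible space, functioning similarly to proximal policy gradient methods. Because $\log \mathbf{P}_{\text{agent}}(T) \leq 0$, the agent can only match the augmented likelihood exactly when the latter is non-positive; otherwise the loss is lower-bounded by: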

$$ \mathcal{L}(T) \geq \max\left(0, \log \mathbf{P}_{\text{prior}}(T) + \sigma \mathbf{S}(T)\right)^2 $$
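The DAP formulas above can be traced with a small per-sequence calculation. This is a sketch of the equations as written in this section, not the REINVENT 4 implementation; the log-likelihoods, the score, and the value $\sigma = 128$ are illustrative assumptions.

```python
def augmented_log_likelihood(prior_ll, score, sigma):
    """log P_aug(T) = log P_prior(T) + sigma * S(T)."""
    return prior_ll + sigma * score

def dap_loss(prior_ll, agent_ll, score, sigma):
    """Squared difference between augmented and agent log-likelihoods."""
    return (augmented_log_likelihood(prior_ll, score, sigma) - agent_ll) ** 2

def dap_lower_bound(prior_ll, score, sigma):
    """Smallest achievable loss, given that log P_agent(T) <= 0."""
    return max(0.0, augmented_log_likelihood(prior_ll, score, sigma)) ** 2

# Illustrative values (assumptions, not taken from the paper).
prior_ll, agent_ll = -30.0, -28.0   # log-likelihoods are always <= 0
score, sigma = 0.8, 128.0

loss = dap_loss(prior_ll, agent_ll, score, sigma)
bound = dap_lower_bound(prior_ll, score, sigma)
print(f"loss = {loss:.2f}, lower bound = {bound:.2f}")
```

With a score of zero the augmented likelihood reduces to the prior likelihood, so the bound is zero and the loss simply pulls the agent back toward the prior.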

Four Molecule Generators

REINVENT 4 supports four generator types:

| Generator | Architecture | Input | Task |
|---|---|---|---|
| Reinvent | RNN | None | De novo design from scratch |
| LibInvent | RNN | Scaffold SMILES | R-group replacement, library design |
| LinkInvent | RNN | Two warhead fragments | Linker design, scaffold hopping |
| Mol2Mol | Transformer | Input molecule | Molecular optimization within similarity bounds |

All generators are fully integrated with all three optimization algorithms (TL, RL, CL). The Mol2Mol transformer was trained on over 200 billion molecular pairs from PubChem with Tanimoto similarity $\geq 0.50$, using a ranking loss that directly links negative log-likelihood to molecular similarity.

Staged Learning (Curriculum Learning)

A key new feature is staged learning, which implements curriculum learning as multi-stage RL. Each stage can define a different scoring profile, allowing users to gradually phase in computationally expensive scoring functions. For example, cheap drug-likeness filters can run first, followed by docking in later stages. Stages terminate when a maximum score threshold is exceeded or a step limit is reached.
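The control flow of staged learning can be sketched as a loop over stages, each with its own scorer and termination criteria. Everything below is hypothetical: the `Stage` class, `run_stages`, and the toy agent are illustrative names rather than REINVENT 4 API, and using the batch mean score as the quantity compared against the threshold is an assumption.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Stage:
    name: str
    scorer: Callable[[str], float]  # maps a SMILES string to a score in [0, 1]
    max_score: float                # stage ends once the batch mean exceeds this
    max_steps: int                  # ... or once this many RL steps have run

def run_stages(stages, sample_batch, update_agent):
    """Run stages in order; return (stage name, steps used, last mean score) per stage."""
    history = []
    for stage in stages:
        steps_run, mean_score = 0, 0.0
        for steps_run in range(1, stage.max_steps + 1):
            batch = sample_batch()
            scores = [stage.scorer(smiles) for smiles in batch]
            mean_score = sum(scores) / len(scores)
            update_agent(batch, scores)       # one RL update with this stage's scores
            if mean_score > stage.max_score:  # threshold exceeded: move to next stage
                break
        history.append((stage.name, steps_run, mean_score))
    return history

# Toy stand-ins: the "agent" improves by a fixed amount on every update.
agent = {"quality": 0.0}

def sample_batch():
    return ["CCO"] * 4  # placeholder SMILES; a real agent would sample new molecules

def update_agent(batch, scores):
    agent["quality"] = min(1.0, agent["quality"] + 0.1)

def toy_score(smiles):
    return agent["quality"]

stages = [
    Stage("cheap filters", toy_score, max_score=0.35, max_steps=10),
    Stage("docking", toy_score, max_score=0.99, max_steps=3),
]
history = run_stages(stages, sample_batch, update_agent)
for name, steps, mean in history:
    print(f"{name}: stopped after {steps} steps (mean score {mean:.2f})")
```

In this toy run the first stage ends because the score threshold is exceeded, while the second ends on its step limit, which covers both termination conditions described above.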

Scoring Subsystem

The scoring subsystem implements a plugin architecture supporting over 25 scoring components, including:

  • Physicochemical descriptors from RDKit (QED, SLogP, TPSA, molecular weight, etc.)
  • Molecular docking via DockStream (AutoDock Vina, rDock, Hybrid, Glide, GOLD)
  • QSAR models via Qptuna and ChemProp (D-MPNN)
  • Shape similarity via ROCS
  • Synthesizability estimation via SA score
  • Matched molecular pairs via mmpdb
  • Generic REST and external process interfaces

Scores are aggregated via weighted arithmetic or geometric mean. A transform system (sigmoid, step functions, value maps) normalizes individual component scores to $[0, 1]$.
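A short sketch shows how transformed component scores might be combined. The logistic transform here is a generic sigmoid with assumed `midpoint` and `steepness` parameters, not necessarily the parameterization REINVENT 4 uses, and the component values and weights are invented for illustration.

```python
import math

def sigmoid(x, midpoint, steepness):
    """Generic logistic transform mapping a raw value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - midpoint)))

def weighted_arithmetic_mean(scores, weights):
    return sum(w * s for s, w in zip(scores, weights)) / sum(weights)

def weighted_geometric_mean(scores, weights):
    # A single zero component forces the total to zero.
    if any(s == 0.0 for s in scores):
        return 0.0
    total = sum(weights)
    return math.exp(sum(w * math.log(s) for s, w in zip(scores, weights)) / total)

# Hypothetical transformed component scores, e.g. QED and a docking-derived score.
scores = [0.9, 0.4]
weights = [1.0, 1.0]

arith = weighted_arithmetic_mean(scores, weights)
geo = weighted_geometric_mean(scores, weights)
print(f"arithmetic = {arith:.3f}, geometric = {geo:.3f}")
```

The geometric mean penalizes a weak component more strongly than the arithmetic mean, so it suits objectives where every component must be satisfied at once.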

PDK1 Inhibitor Case Study

The paper demonstrates REINVENT 4 through a structure-based drug design exercise targeting Phosphoinositide-dependent kinase-1 (PDK1) inhibitors. The experimental setup uses PDB crystal structure 2XCH with DockStream and Glide for docking, defining hits as molecules with docking score $\leq -8$ kcal/mol and QED $\geq 0.7$.

Baseline RL from prior: 50 epochs of staged learning with batch size 128 produced 119 hits from 6,400 generated molecules (1.9% hit rate), spread across 103 generic Bemis-Murcko scaffolds.

Transfer learning + RL: After 10 epochs of TL on 315 congeneric pyridinone PDK1 actives from PubChem Assay AID1798002, the same 50-epoch RL run produced 222 hits (3.5% hit rate) across 176 unique generic scaffolds, nearly doubling productivity.
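The hit criterion and the reported hit rates follow from simple arithmetic, sketched below. The function names are illustrative; the thresholds, hit counts, epochs, and batch size are the values quoted above.

```python
def is_hit(docking_score, qed, max_docking=-8.0, min_qed=0.7):
    """Hit definition from the case study: docking <= -8 kcal/mol and QED >= 0.7."""
    return docking_score <= max_docking and qed >= min_qed

def hit_rate(n_hits, epochs, batch_size):
    """Fraction of all generated molecules (epochs * batch_size) that are hits."""
    return n_hits / (epochs * batch_size)

total_generated = 50 * 128                      # 6,400 molecules per run
rl_rate = hit_rate(119, 50, 128)                # baseline RL from the prior
tl_rl_rate = hit_rate(222, 50, 128)             # transfer learning followed by RL
improvement = 222 / 119                         # ratio of hit counts

print(f"generated: {total_generated}")
print(f"RL: {rl_rate:.1%}, TL+RL: {tl_rl_rate:.1%}, ratio: {improvement:.2f}x")
```

The ratio of about 1.87 is what the text summarizes as nearly doubling productivity.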

Both approaches generated top-scoring molecules (docking score of -10.1 kcal/mol each) with plausible binding poses reproducing key protein-ligand interactions seen in the native crystal structure, including hinge interactions with ALA 162 and contacts with LYS 111.

The paper also demonstrates the agent’s plasticity through a molecular weight switching experiment: after 500 epochs driving generation toward 1500 Da molecules, switching the reward to favor molecules $\leq 500$ Da resulted in rapid adaptation within ~50 epochs, showing that the RL agent can recover from extreme biases.

Practical Software for AI-Driven Drug Discovery

REINVENT 4 represents a mature, well-documented framework that consolidates years of incremental development into a single codebase. Key practical features include TOML/JSON configuration, TensorBoard visualization, multinomial sampling and beam search decoding, diversity filters for scaffold-level novelty, experience replay (inception), and a plugin mechanism for extending the scoring subsystem.

The authors acknowledge that this is one approach among many and that there is no single solution that uniformly outperforms others. REINVENT has demonstrated strong sample efficiency in benchmarks and produced realistic 3D docking poses, but the paper does not claim universal superiority. The focus is on providing a well-engineered, transparent reference implementation rather than advancing a novel algorithm.

Three limitations stand out: only the Mol2Mol prior supports stereochemistry, biases in the training data constrain the explorable chemical space, and the SMILES-based representation inherits the known fragility of string-based molecular encodings.


Reproducibility Details

Data

| Purpose | Dataset | Size | Notes |
|---|---|---|---|
| Prior training (Reinvent) | ChEMBL 25 | ~1.7M molecules | Drug-like compounds |
| Prior training (LibInvent) | ChEMBL 27 | ~1.9M molecules | Scaffold-decoration pairs |
| Prior training (LinkInvent) | ChEMBL 27 | ~1.9M molecules | Fragment-linker pairs |
| Prior training (Mol2Mol) | ChEMBL 28 / PubChem | ~200B pairs | Tanimoto similarity $\geq 0.50$ |
| Case study TL | PubChem AID1798002 | 315 compounds | Congeneric PDK1 actives |
| Case study docking | PDB 2XCH | 1 structure | PDK1 crystal structure |

Algorithms

  • Optimization: DAP (recommended), plus three deprecated alternatives (REINFORCE, A2C, MAULI)
  • Decoding: Multinomial sampling (default, temperature $K = 1$) and beam search
  • Diversity filter: Murcko scaffold, topological scaffold, scaffold similarity, same-SMILES penalty
  • Experience replay: Inception memory with configurable size and sampling rate
  • Gradient descent: Adam optimizer
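The idea behind a scaffold-based diversity filter can be sketched as a bucket per scaffold that stops rewarding new members once it is full. This is an assumption-laden illustration rather than the REINVENT 4 implementation: the class name, the `bucket_size` and `min_score` parameters, and the choice to zero the score for a full bucket are all hypothetical, and scaffolds are passed in as precomputed strings so that no cheminformatics toolkit is needed.

```python
from collections import defaultdict

class ScaffoldBucketFilter:
    """Zero the score of molecules whose scaffold bucket is already full."""

    def __init__(self, bucket_size=2, min_score=0.4):
        self.bucket_size = bucket_size   # max distinct molecules rewarded per scaffold
        self.min_score = min_score       # only molecules scoring at least this are stored
        self._buckets = defaultdict(set)

    def apply(self, smiles, scaffold, score):
        if score < self.min_score:
            return score                 # low scorers pass through and are not stored
        bucket = self._buckets[scaffold]
        if len(bucket) >= self.bucket_size:
            return 0.0                   # scaffold over-explored: remove the reward
        bucket.add(smiles)
        return score

# Toy usage with precomputed (hypothetical) scaffold strings.
flt = ScaffoldBucketFilter(bucket_size=2, min_score=0.4)
results = [
    flt.apply("Cc1ccccc1", "c1ccccc1", 0.9),
    flt.apply("CCc1ccccc1", "c1ccccc1", 0.8),
    flt.apply("CCCc1ccccc1", "c1ccccc1", 0.7),   # third benzene-scaffold molecule
    flt.apply("CC1CCCCC1", "C1CCCCC1", 0.3),     # below min_score, not stored
]
print(results)
```

Removing the reward for saturated scaffolds pushes the agent toward unexplored chemotypes instead of repeatedly sampling close analogs of an early high scorer.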

Models

All pre-trained priors are distributed with the repository. They cover the RNN-based generators (Reinvent, LibInvent, LinkInvent) and the transformer-based generator (Mol2Mol), the latter in multiple similarity-conditioned variants.

Evaluation

| Metric | Value | Condition | Notes |
|---|---|---|---|
| Hit rate (RL) | 1.9% | 50 epochs, batch 128 | PDK1 case study |
| Hit rate (TL+RL) | 3.5% | 10 TL + 50 RL epochs | PDK1 case study |
| Scaffold diversity (RL) | 103 scaffolds | From 119 hits | Generic Bemis-Murcko |
| Scaffold diversity (TL+RL) | 176 scaffolds | From 222 hits | Generic Bemis-Murcko |
| Best docking score | -10.1 kcal/mol | Both methods | Glide SP |

Hardware

The paper does not specify hardware requirements. REINVENT 4 supports both GPU and CPU execution. Python 3.10+ is required, with PyTorch 1.x (2.0 also compatible) and RDKit 2022.9+.

Artifacts

| Artifact | Type | License | Notes |
|---|---|---|---|
| REINVENT4 | Code | Apache-2.0 | Full framework with pre-trained priors |
| DockStream | Code | Apache-2.0 | Docking wrapper for scoring |

Paper Information

Citation: Loeffler, H. H., He, J., Tibo, A., Janet, J. P., Voronov, A., Mervin, L. H., & Engkvist, O. (2024). Reinvent 4: Modern AI-driven generative molecule design. Journal of Cheminformatics, 16, 20. https://doi.org/10.1186/s13321-024-00812-5

@article{loeffler2024reinvent,
  title={Reinvent 4: Modern AI-driven generative molecule design},
  author={Loeffler, Hannes H. and He, Jiazhen and Tibo, Alessandro and Janet, Jon Paul and Voronov, Alexey and Mervin, Lewis H. and Engkvist, Ola},
  journal={Journal of Cheminformatics},
  volume={16},
  number={1},
  pages={20},
  year={2024},
  publisher={Springer},
  doi={10.1186/s13321-024-00812-5}
}