Paper Information

Citation: Tripp, A., & Hernández-Lobato, J. M. (2023). Genetic algorithms are strong baselines for molecule generation. arXiv preprint arXiv:2310.09267. https://arxiv.org/abs/2310.09267

Publication: arXiv preprint, 2023

A Position Paper on Molecular Generation Baselines

This is a position paper arguing that genetic algorithms (GAs) are underused and underappreciated as baselines in the molecular generation community. The primary contribution is empirical evidence that a simple GA implementation (MOL_GA) matches or outperforms many sophisticated deep learning methods on standard benchmarks. The authors propose the “GA criterion” as a minimum bar for evaluating new molecular generation algorithms.

Why Molecular Generation May Be Easier Than Assumed

Drug discovery is fundamentally a molecular generation task, and many machine learning methods have been proposed for it (Du et al., 2022). The problem has many variants, from unconditional generation of novel molecules to directed optimization of specific molecular properties.

The authors observe that generating valid molecules is, in some respects, straightforward. The rules governing molecular validity are well-defined bond constraints that can be checked using standard cheminformatics software like RDKit. This means new molecules can be generated simply by adding, removing, or substituting fragments of known molecules. When applied iteratively, this is exactly what a genetic algorithm does. Despite this, many papers in the field propose complex deep learning methods without adequately comparing to simple GA baselines.

The GA Criterion for Evaluating New Methods

The core proposal is the GA criterion: new methods in molecular generation should offer some clear advantage over genetic algorithms. This advantage can be:

  • Empirical: outperforming GAs on relevant benchmarks
  • Conceptual: identifying and overcoming a specific limitation of randomly modifying known molecules

The authors argue that the current state of molecular generation research reflects poor empirical practices, where comprehensive baseline evaluation is treated as optional rather than essential.

Genetic Algorithm Framework and Benchmark Experiments

How Genetic Algorithms Work for Molecules

GAs operate through the following iterative procedure:

  1. Start with an initial population $P$ of molecules
  2. Sample a subset $S \subseteq P$ from the population (possibly biased toward better molecules)
  3. Generate new molecules $N$ from $S$ via mutation and crossover operations
  4. Select a new population $P'$ from $P \cup N$ (e.g., keep the highest-scoring molecules)
  5. Set $P \leftarrow P'$ and repeat from step 2
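The loop above can be sketched in a few lines of Python. This is a toy illustration, not the authors' code: bit strings stand in for molecules and bit flips stand in for chemistry-aware mutation operators, which in a real implementation would act on RDKit molecule objects.

```python
import random

def genetic_algorithm(score, init_pop, n_iters=100, n_offspring=5, pop_size=50):
    """Generic GA loop: sample parents, mutate, keep the best (greedy selection)."""
    population = list(init_pop)
    for _ in range(n_iters):
        # Step 2: sample a subset S, biased toward better individuals
        # (here: uniformly from the top quarter of the ranked population).
        population.sort(key=score, reverse=True)
        parents = [random.choice(population[: max(1, len(population) // 4)])
                   for _ in range(n_offspring)]
        # Step 3: generate new individuals N via mutation (a single bit flip here).
        offspring = []
        for p in parents:
            i = random.randrange(len(p))
            offspring.append(p[:i] + ('1' if p[i] == '0' else '0') + p[i + 1:])
        # Step 4: greedily select the new population P' from P union N.
        population = sorted(population + offspring, key=score, reverse=True)[:pop_size]
    return population

# Toy usage: maximize the number of 1-bits in a 20-bit string.
random.seed(0)
init = [''.join(random.choice('01') for _ in range(20)) for _ in range(10)]
best = genetic_algorithm(lambda s: s.count('1'), init, n_iters=200)[0]
```

Because selection is greedy, the best score in the population never decreases, and repeated mutation of top individuals drives a simple hill climb.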

The MOL_GA implementation uses:

  • Quantile-based sampling (step 2): molecules are sampled from the top quantiles of the population using a log-uniform distribution over quantile thresholds:

$$ u \sim \mathcal{U}[-3, 0], \quad \epsilon = 10^{u} $$

A molecule is drawn uniformly from the top $\epsilon$ fraction of the population.

  • Mutation and crossover (step 3): graph-based operations from Jensen (2019), as implemented in the GuacaMol benchmark (Brown et al., 2019)
  • Greedy population selection (step 4): molecules with the highest scores are retained
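The quantile-based sampling in step 2 can be sketched as follows; this is a minimal pure-Python illustration of the sampling rule described above, and the actual MOL_GA code may differ in details:

```python
import math
import random

def quantile_sample(population, scores):
    """Draw one individual uniformly from the top-epsilon fraction of the
    population, where epsilon = 10**u with u ~ Uniform[-3, 0], i.e. epsilon
    is log-uniform on [0.001, 1]."""
    u = random.uniform(-3.0, 0.0)
    eps = 10.0 ** u
    # Rank the population by score, best first.
    ranked = [x for _, x in sorted(zip(scores, population), key=lambda t: -t[0])]
    k = max(1, math.ceil(eps * len(ranked)))  # size of the top-eps slice
    return random.choice(ranked[:k])
```

Small values of $\epsilon$ restrict sampling to the elite, while $\epsilon$ near 1 samples almost uniformly from the whole population, so the log-uniform prior interpolates between exploitation and exploration.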

Unconditional Generation on ZINC 250K

The first experiment evaluates unconditional molecule generation, where the task is to produce novel, valid, and unique molecules distinct from a reference set (ZINC 250K). Success is measured by validity, novelty (at 10,000 generated molecules), and uniqueness.
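These three metrics reduce to simple set operations over molecule strings. A minimal sketch, assuming molecules are represented as canonical SMILES and validity has already been checked upstream (e.g. with RDKit); MOSES's exact definitions may differ in small details:

```python
def generation_metrics(generated, valid, reference):
    """Compute the three standard unconditional-generation metrics.

    generated: all generated SMILES strings
    valid:     subset of `generated` that parsed as valid molecules
    reference: training/reference set (e.g. ZINC 250K)
    """
    validity = len(valid) / len(generated)
    unique = set(valid)
    uniqueness = len(unique) / len(valid)
    # Novelty: fraction of unique valid molecules absent from the reference set.
    novelty = len(unique - set(reference)) / len(unique)
    return validity, novelty, uniqueness
```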

| Method       | Paper                    | Validity | Novelty@10k | Uniqueness |
|--------------|--------------------------|----------|-------------|------------|
| JT-VAE       | Jin et al. (2018)        | 99.8%    | 100%        | 100%       |
| GCPN         | You et al. (2018)        | 100%     | 100%        | 99.97%     |
| MolecularRNN | Popova et al. (2019)     | 100%     | 100%        | 99.89%     |
| GraphNVP     | Madhawa et al. (2019)    | 100%     | 100%        | 94.80%     |
| GraphAF      | Shi et al. (2020)        | 100%     | 100%        | 99.10%     |
| MoFlow       | Zang and Wang (2020)     | 100%     | 100%        | 99.99%     |
| GraphCNF     | Lippe and Gavves (2020)  | 96.35%   | 99.98%      | 99.98%     |
| GraphDF      | Luo et al. (2021)        | 100%     | 100%        | 99.16%     |
| ModFlow      | Verma et al. (2022)      | 98.1%    | 100%        | 99.3%      |
| GraphEBM     | Liu et al. (2021)        | 99.96%   | 100%        | 98.79%     |
| AddCarbon    | Renz et al. (2019)       | 100%     | 99.94%      | 99.86%     |
| MOL_GA       | (this paper)             | 99.76%   | 99.94%      | 98.60%     |

All methods perform near 100% on all metrics, demonstrating that unconditional molecule generation is not a particularly discriminative benchmark. The authors note that generation speed (molecules per second) is an important missing dimension from these comparisons, where simple methods like GAs have a clear advantage.

Molecule Optimization on the PMO Benchmark

The second experiment evaluates directed molecule optimization on the Practical Molecular Optimization (PMO) benchmark (Gao et al., 2022), which measures the ability to find molecules optimizing a scalar objective function $f: \mathcal{M} \mapsto \mathbb{R}$ with a budget of 10,000 evaluations.
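PMO scores each run by the area under the curve of the running mean top-10 score as a function of oracle calls, normalized to $[0, 1]$, which rewards finding good molecules early in the budget. A rough sketch of that metric (the details of PMO's implementation may differ):

```python
def auc_top10(score_history, budget=10_000):
    """Approximate PMO-style AUC: after each oracle call, record the mean of
    the top-10 scores seen so far, then average that curve over the budget.
    `score_history` is the sequence of oracle values in evaluation order."""
    top10 = []  # running list of the 10 best scores seen so far
    curve = []
    for s in score_history[:budget]:
        top10.append(s)
        top10 = sorted(top10, reverse=True)[:10]
        curve.append(sum(top10) / len(top10))
    # Runs that stop early are held at their final value for the rest of the budget.
    if curve:
        curve += [curve[-1]] * (budget - len(curve))
    return sum(curve) / budget
```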

A key insight is that previous GA implementations in PMO used large generation sizes ($\approx 100$), which limits the number of improvement iterations. The authors set the generation size to 5, allowing approximately 2,000 iterations of improvement within the same evaluation budget.
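The trade-off is simple arithmetic: under a fixed oracle budget, the number of GA iterations is the budget divided by the generation size.

```python
budget = 10_000               # PMO oracle-call budget per task
iters_small = budget // 5     # MOL_GA's generation size of 5
iters_large = budget // 100   # generation size used by earlier GA baselines (~100)
# 2000 vs 100 rounds of selection and mutation under the same budget.
```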

| Task                     | REINVENT      | Graph GA      | MOL_GA        |
|--------------------------|---------------|---------------|---------------|
| albuterol_similarity     | 0.882 ± 0.006 | 0.838 ± 0.016 | 0.896 ± 0.035 |
| amlodipine_mpo           | 0.635 ± 0.035 | 0.661 ± 0.020 | 0.688 ± 0.039 |
| celecoxib_rediscovery    | 0.713 ± 0.067 | 0.630 ± 0.097 | 0.567 ± 0.083 |
| drd2                     | 0.945 ± 0.007 | 0.964 ± 0.012 | 0.936 ± 0.016 |
| fexofenadine_mpo         | 0.784 ± 0.006 | 0.760 ± 0.011 | 0.825 ± 0.019 |
| isomers_c9h10n2o2pf2cl   | 0.642 ± 0.054 | 0.719 ± 0.047 | 0.865 ± 0.012 |
| sitagliptin_mpo          | 0.021 ± 0.003 | 0.433 ± 0.075 | 0.582 ± 0.040 |
| zaleplon_mpo             | 0.358 ± 0.062 | 0.346 ± 0.032 | 0.519 ± 0.029 |
| **Sum (23 tasks)**       | 14.196        | 13.751        | 14.708        |
| **Rank**                 | 2             | 3             | 1             |

MOL_GA achieves the highest aggregate score across all 23 PMO tasks, outperforming both the previous best GA (Graph GA) and the previous best overall method (REINVENT). The authors attribute this partly to the tuning of the baselines in PMO rather than MOL_GA being an especially strong method, since MOL_GA is essentially the same algorithm as Graph GA with different hyperparameters.

Implications for Molecular Generation Research

The key findings and arguments are:

  1. GAs match or outperform deep learning methods on standard molecular generation benchmarks, both for unconditional generation and directed optimization.

  2. Hyperparameter choices matter significantly: MOL_GA’s strong performance on PMO comes partly from using a smaller generation size (5 vs. ~100), which allows more iterations of refinement within the same evaluation budget.

  3. The GA criterion should be enforced in peer review: new molecular generation methods should demonstrate a clear advantage over GAs, whether empirical or conceptual.

  4. Deep learning methods may implicitly do what GAs do explicitly: many generative models are trained on datasets of known molecules, so the novel molecules they produce may simply be variants of their training data. The authors consider this an important direction for future investigation.

  5. Poor empirical practices are widespread: the paper argues that many experiments in molecule generation are conducted with an explicit desired outcome (that the novel algorithm is the best), leading to inadequate baseline comparisons.

The authors are careful to note that this result should not be interpreted as GAs being exceptional algorithms. Rather, it is an indication that more complex methods have made surprisingly little progress beyond what simple heuristic search can achieve.


Reproducibility Details

Data

| Purpose                  | Dataset       | Size              | Notes                                |
|--------------------------|---------------|-------------------|--------------------------------------|
| Unconditional generation | ZINC 250K     | 250,000 molecules | Reference set for novelty evaluation |
| Directed optimization    | PMO benchmark | 23 tasks          | 10,000 evaluation budget per task    |

Algorithms

  • GA implementation: MOL_GA package, using graph-based mutation and crossover from Jensen (2019) via the GuacaMol implementation
  • Generation size: 5 molecules per iteration (allowing ~2,000 iterations with 10,000 evaluations)
  • Population selection: Greedy (highest-scoring molecules retained)
  • Sampling: Quantile-based with log-uniform distribution over quantile thresholds

Evaluation

| Metric                            | Benchmark                | Notes                                              |
|-----------------------------------|--------------------------|----------------------------------------------------|
| Validity, Novelty@10k, Uniqueness | ZINC 250K unconditional  | Calculated using MOSES package                     |
| AUC top-10 scores                 | PMO benchmark            | 23 optimization tasks with 10,000 evaluation budget |

Hardware

The paper does not specify hardware requirements. Given that GAs are computationally lightweight compared to deep learning methods, standard CPU hardware is likely sufficient.

Artifacts

| Artifact       | Type | License | Notes                                           |
|----------------|------|---------|-------------------------------------------------|
| MOL_GA         | Code | MIT     | Python package for molecular genetic algorithms |
| MOL_GA on PyPI | Code | MIT     | pip-installable package                         |

Citation

@article{tripp2023genetic,
  title={Genetic algorithms are strong baselines for molecule generation},
  author={Tripp, Austin and Hern{\'a}ndez-Lobato, Jos{\'e} Miguel},
  journal={arXiv preprint arXiv:2310.09267},
  year={2023}
}