<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Search-Based Generation on Hunter Heidenreich | ML Research Scientist</title><link>https://hunterheidenreich.com/notes/chemistry/molecular-design/generation/search-based/</link><description>Recent content in Search-Based Generation on Hunter Heidenreich | ML Research Scientist</description><image><title>Hunter Heidenreich | ML Research Scientist</title><url>https://hunterheidenreich.com/img/avatar.webp</url><link>https://hunterheidenreich.com/img/avatar.webp</link></image><generator>Hugo -- 0.147.7</generator><language>en-US</language><copyright>2026 Hunter Heidenreich</copyright><lastBuildDate>Fri, 10 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://hunterheidenreich.com/notes/chemistry/molecular-design/generation/search-based/index.xml" rel="self" type="application/rss+xml"/><item><title>ChemGE: Molecule Generation via Grammatical Evolution</title><link>https://hunterheidenreich.com/notes/chemistry/molecular-design/generation/search-based/chemge-grammatical-evolution-molecule-generation/</link><pubDate>Sat, 28 Mar 2026 00:00:00 +0000</pubDate><guid>https://hunterheidenreich.com/notes/chemistry/molecular-design/generation/search-based/chemge-grammatical-evolution-molecule-generation/</guid><description>ChemGE applies grammatical evolution to SMILES strings for population-based de novo molecule generation with inherent parallelism and diversity.</description><content:encoded><![CDATA[<h2 id="grammatical-evolution-for-de-novo-molecular-design">Grammatical Evolution for De Novo Molecular Design</h2>
<p>This is a <strong>Method</strong> paper that introduces ChemGE, a population-based molecular generation approach built on grammatical evolution. Rather than using deep neural networks, ChemGE evolves populations of <a href="/notes/chemistry/molecular-representations/notations/smiles/">SMILES</a> strings through a context-free grammar, enabling concurrent evaluation by multiple molecular simulators and producing diverse molecular libraries. The method represents an alternative paradigm for de novo drug design: evolutionary optimization over formal grammars rather than learned latent spaces or autoregressive neural models.</p>
<h2 id="limitations-of-sequential-deep-learning-generators">Limitations of Sequential Deep Learning Generators</h2>
<p>At the time of publication, the dominant approaches to de novo molecular generation included Bayesian optimization over VAE latent spaces (<a href="/notes/chemistry/molecular-design/generation/latent-space/automatic-chemical-design-vae/">CVAE</a>, <a href="/notes/chemistry/molecular-design/generation/latent-space/grammar-variational-autoencoder/">GVAE</a>), reinforcement learning with recurrent neural networks (<a href="/notes/chemistry/molecular-design/generation/rl-tuned/organ-objective-reinforced-gan/">ORGAN</a>, <a href="/notes/chemistry/molecular-design/generation/rl-tuned/reinvent-deep-rl-molecular-design/">REINVENT</a>), sequential Monte Carlo search, and Monte Carlo tree search (ChemTS). These methods share two practical limitations:</p>
<ol>
<li>
<p><strong>Simulation concurrency</strong>: Most methods generate one molecule at a time, making it difficult to run multiple molecular simulations (e.g., <a href="https://en.wikipedia.org/wiki/Molecular_docking">docking</a>) in parallel. This wastes computational resources in high-throughput virtual screening settings.</p>
</li>
<li>
<p><strong>Molecular diversity</strong>: Deep learning generators tend to exploit narrow regions of chemical space. Deep reinforcement learning methods in particular often generate very similar molecules, requiring special countermeasures to maintain diversity. Since drug discovery is a multi-stage pipeline, limited diversity reduces survival rates in downstream <a href="https://en.wikipedia.org/wiki/ADME">ADMET</a> screening.</p>
</li>
</ol>
<p>ChemGE addresses both problems by maintaining a large population of molecules that are evolved and evaluated concurrently.</p>
<h2 id="core-innovation-chromosome-to-smiles-mapping-via-grammar-rules">Core Innovation: Chromosome-to-SMILES Mapping via Grammar Rules</h2>
<p>ChemGE encodes each molecule as a chromosome: a sequence of $N$ integers that deterministically maps to a SMILES string through a context-free grammar. The mapping process works as follows:</p>
<ol>
<li>Start with the grammar&rsquo;s start symbol</li>
<li>At each step $k$, look up the $k$-th integer $c = C[k]$ from the chromosome</li>
<li>Identify the leftmost non-terminal symbol and count its $r$ applicable production rules</li>
<li>Apply the $((c \bmod r) + 1)$-th rule</li>
<li>Repeat until no non-terminal symbols remain or the chromosome is exhausted</li>
</ol>
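<p>As an illustration, the mapping can be sketched in a few lines of Python. The grammar below is a toy fragment, not the paper&rsquo;s OpenSMILES subset, and the 0-indexed <code>c % r</code> lookup selects the same production as the 1-indexed $((c \bmod r) + 1)$ convention above.</p>

```python
# Toy genotype-to-phenotype mapping in the style of grammatical evolution.
# TOY_GRAMMAR is a hypothetical fragment, not the paper's OpenSMILES subset.

TOY_GRAMMAR = {
    # non-terminal -> list of productions (each a tuple of symbols)
    "smiles": [("chain",)],
    "chain": [("atom",), ("atom", "chain")],
    "atom": [("C",), ("N",), ("O",)],
}

def chromosome_to_string(chromosome, grammar, start="smiles", max_steps=100):
    """Derive a terminal string by always expanding the leftmost non-terminal."""
    symbols = [start]
    k = 0
    for _ in range(max_steps):
        idx = next((i for i, s in enumerate(symbols) if s in grammar), None)
        if idx is None:
            return "".join(symbols)  # fully derived: no non-terminals left
        if k >= len(chromosome):
            return None              # chromosome exhausted -> invalid molecule
        rules = grammar[symbols[idx]]
        symbols[idx:idx + 1] = list(rules[chromosome[k] % len(rules)])
        k += 1
    return None

print(chromosome_to_string([0, 1, 0, 0, 2], TOY_GRAMMAR))  # -> CO
```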
<p>The context-free grammar is a subset of the OpenSMILES specification, defined formally as $G = (V, \Sigma, R, S)$ where $V$ is the set of non-terminal symbols, $\Sigma$ is the set of terminal symbols, $R$ is the set of production rules, and $S$ is the start symbol.</p>
<p>Evolution follows the $(\mu + \lambda)$ evolution strategy:</p>
<ol>
<li>Create $\lambda$ new chromosomes by drawing random chromosomes from the population and mutating one integer at a random position</li>
<li>Translate each chromosome to a SMILES string and evaluate fitness (e.g., docking score). Invalid molecules receive fitness $-\infty$</li>
<li>Select the top $\mu$ molecules from the merged pool of $\mu + \lambda$ candidates</li>
</ol>
<p>The authors did not use crossover, as it did not improve performance. Diversity is inherently maintained because a large fraction of molecules are mutated in each generation.</p>
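<p>The three steps above reduce to a compact loop. The fitness function here is a stand-in (the paper scores decoded SMILES by docking or penalized logP, with $-\infty$ for invalid molecules), and the range of 256 integer values per gene is an assumption for illustration.</p>

```python
# Minimal (mu + lambda) evolution strategy with mutation-only variation.
import random

def evolve(population, fitness, mu, lam, generations, n_vals=256, seed=0):
    rng = random.Random(seed)
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            child = list(rng.choice(population))  # copy a random parent
            child[rng.randrange(len(child))] = rng.randrange(n_vals)  # one mutation
            offspring.append(child)
        pool = population + offspring             # merged mu + lambda pool
        pool.sort(key=fitness, reverse=True)
        population = pool[:mu]                    # truncation selection
    return population

# Toy fitness: prefer chromosomes whose integers sum high (illustration only).
best = evolve([[0] * 8 for _ in range(10)], sum, mu=10, lam=20, generations=50)
```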
<h2 id="experimental-setup-and-benchmark-comparisons">Experimental Setup and Benchmark Comparisons</h2>
<h3 id="druglikeness-score-benchmark">Druglikeness Score Benchmark</h3>
<p>The first experiment optimized the penalized logP score $J^{\log P}$, an indicator of druglikeness defined as:</p>
<p>$$
J^{\log P}(m) = \log P(m) - \text{SA}(m) - \text{ring-penalty}(m)
$$</p>
<p>where $\log P(m)$ is the <a href="https://en.wikipedia.org/wiki/Octanol-water_partition_coefficient">octanol-water partition coefficient</a>, $\text{SA}(m)$ is the synthetic accessibility score, and $\text{ring-penalty}(m)$ penalizes carbon rings larger than size 6. All terms are normalized to zero mean and unit standard deviation. Initial populations were randomly sampled from the ZINC database (35 million compounds), with fitness set to $-\infty$ for molecules with molecular weight above 500 or duplicate structures.</p>
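<p>A minimal sketch of how the score is assembled from precomputed raw terms. The normalization constants below are illustrative stand-ins for ZINC-derived statistics, and in practice $\log P$ and SA would come from a cheminformatics toolkit such as RDKit.</p>

```python
# Assembling penalized logP from raw terms, each z-score normalized.
# The (mean, std) pairs in `stats` are illustrative, not the paper's values.

def z(x, mean, std):
    return (x - mean) / std

def ring_penalty(largest_ring_size):
    # penalize carbon rings larger than six atoms
    return max(largest_ring_size - 6, 0)

def penalized_logp(logp, sa, largest_ring_size, stats):
    return (z(logp, *stats["logp"])
            - z(sa, *stats["sa"])
            - z(ring_penalty(largest_ring_size), *stats["ring"]))

stats = {"logp": (2.46, 1.44), "sa": (3.05, 0.83), "ring": (0.04, 0.29)}
score = penalized_logp(logp=3.2, sa=2.5, largest_ring_size=6, stats=stats)
```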
<p>ChemGE was compared against CVAE, GVAE, and ChemTS across population sizes $(\mu, \lambda) \in \{(10, 20), (100, 200), (1000, 2000), (10000, 20000)\}$.</p>
<table>
  <thead>
      <tr>
          <th>Method</th>
          <th>2h</th>
          <th>4h</th>
          <th>6h</th>
          <th>8h</th>
          <th>Mol/Min</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>ChemGE (10, 20)</td>
          <td>4.46 +/- 0.34</td>
          <td>4.46 +/- 0.34</td>
          <td>4.46 +/- 0.34</td>
          <td>4.46 +/- 0.34</td>
          <td>14.5</td>
      </tr>
      <tr>
          <td>ChemGE (100, 200)</td>
          <td>5.17 +/- 0.26</td>
          <td>5.17 +/- 0.26</td>
          <td>5.17 +/- 0.26</td>
          <td>5.17 +/- 0.26</td>
          <td>135</td>
      </tr>
      <tr>
          <td>ChemGE (1000, 2000)</td>
          <td>4.45 +/- 0.24</td>
          <td>5.32 +/- 0.43</td>
          <td>5.73 +/- 0.33</td>
          <td>5.88 +/- 0.34</td>
          <td>527</td>
      </tr>
      <tr>
          <td>ChemGE (10000, 20000)</td>
          <td>4.20 +/- 0.33</td>
          <td>4.28 +/- 0.28</td>
          <td>4.40 +/- 0.27</td>
          <td>4.53 +/- 0.26</td>
          <td>555</td>
      </tr>
      <tr>
          <td>CVAE</td>
          <td>-30.18 +/- 26.91</td>
          <td>-1.39 +/- 2.24</td>
          <td>-0.61 +/- 1.08</td>
          <td>-0.006 +/- 0.92</td>
          <td>0.14</td>
      </tr>
      <tr>
          <td>GVAE</td>
          <td>-4.34 +/- 3.14</td>
          <td>-1.29 +/- 1.67</td>
          <td>-0.17 +/- 0.96</td>
          <td>0.25 +/- 1.31</td>
          <td>1.38</td>
      </tr>
      <tr>
          <td>ChemTS</td>
          <td>4.91 +/- 0.38</td>
          <td>5.41 +/- 0.51</td>
          <td>5.49 +/- 0.44</td>
          <td>5.58 +/- 0.50</td>
          <td>40.89</td>
      </tr>
  </tbody>
</table>
<p>At $(\mu, \lambda) = (1000, 2000)$, ChemGE achieved the highest final score of 5.88 and generated 527 unique molecules per minute, roughly 13x faster than ChemTS and 3700x faster than CVAE. The small population (10, 20) converged prematurely with insufficient diversity, while the overly large population (10000, 20000) could not run enough generations to optimize effectively.</p>
<h3 id="docking-experiment-with-thymidine-kinase">Docking Experiment with Thymidine Kinase</h3>
<p>The second experiment applied ChemGE to generate molecules with high predicted binding affinity for <a href="https://en.wikipedia.org/wiki/Thymidine_kinase">thymidine kinase</a> (KITH), a well-known antiviral drug target. The authors used rDock for docking simulation, taking the best intermolecular score $S_{\text{inter}}$ from three runs with different initial conformations. Fitness was defined as $-S_{\text{inter}}$ (lower scores indicate higher affinity). The protein structure was taken from PDB ID 2B8T.</p>
<p>With 32 parallel cores and $(\mu, \lambda) = (32, 64)$, ChemGE completed 1000 generations in approximately 26 hours, generating 9466 molecules total. Among these, 349 molecules achieved intermolecular scores better than the best known inhibitor in the DUD-E database.</p>
<h3 id="diversity-analysis">Diversity Analysis</h3>
<p>Molecular diversity was measured using internal diversity based on Morgan fingerprints:</p>
<p>$$
I(A) = \frac{1}{|A|^2} \sum_{(x,y) \in A \times A} T_d(x, y)
$$</p>
<p>where $T_d(x, y) = 1 - \frac{|x \cap y|}{|x \cup y|}$ is the <a href="https://en.wikipedia.org/wiki/Jaccard_index#Tanimoto_similarity_and_distance">Tanimoto distance</a>.</p>
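<p>The metric follows directly from the definition; below, fingerprints are sketched as Python sets of on-bit indices (real Morgan fingerprints would come from a toolkit such as RDKit):</p>

```python
# Internal diversity I(A): mean pairwise Tanimoto distance over all
# ordered pairs, including self-pairs, per the formula above.

def tanimoto_distance(x, y):
    union = x | y
    if not union:
        return 0.0
    return 1.0 - len(x & y) / len(union)

def internal_diversity(fingerprints):
    n = len(fingerprints)
    total = sum(tanimoto_distance(a, b)
                for a in fingerprints for b in fingerprints)
    return total / (n * n)

fps = [{1, 2, 3}, {2, 3, 4}, {8, 9}]
d = internal_diversity(fps)
```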
<p>The 349 &ldquo;ChemGE-active&rdquo; molecules (those scoring better than the best known inhibitor) had an internal diversity of 0.55, compared to 0.46 for known inhibitors and 0.65 for the whole ZINC database. This is a substantial improvement over known actives, achieved without any explicit diversity-promoting mechanism.</p>
<p>ISOMAP visualizations showed that ChemGE populations migrated away from known inhibitors over generations, ultimately occupying a completely different region of chemical space by generation 1000. This suggests ChemGE discovered a novel structural class of potential binders.</p>
<h2 id="high-throughput-and-diversity-without-deep-learning">High Throughput and Diversity Without Deep Learning</h2>
<p>ChemGE demonstrates several notable findings:</p>
<ol>
<li>
<p><strong>Deep learning is not required</strong> for competitive de novo molecular generation. Grammatical evolution over SMILES achieves higher throughput and comparable or better optimization scores than VAE- and RNN-based methods.</p>
</li>
<li>
<p><strong>Population size matters significantly</strong>. Too small a population leads to premature convergence. Too large a population prevents sufficient per-molecule optimization within the computational budget. The $(\mu, \lambda) = (1000, 2000)$ setting provided the best balance.</p>
</li>
<li>
<p><strong>Inherent diversity</strong> is a key advantage of evolutionary methods. Without any explicit diversity loss or penalty, ChemGE maintains diversity comparable to the ZINC database and exceeds that of known active molecules.</p>
</li>
<li>
<p><strong>Parallel evaluation</strong> is naturally supported. Each generation produces $\lambda$ independent molecules that can be evaluated by separate docking simulators simultaneously.</p>
</li>
</ol>
<p>The authors acknowledge several limitations. Synthetic routes and ADMET properties were not evaluated for the generated molecules. The docking scores, while favorable, require confirmation through biological assays. The authors also note that incorporating probabilistic or neural models into the evolutionary process might further improve performance.</p>
<hr>
<h2 id="reproducibility-details">Reproducibility Details</h2>
<h3 id="data">Data</h3>
<table>
  <thead>
      <tr>
          <th>Purpose</th>
          <th>Dataset</th>
          <th>Size</th>
          <th>Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Initial population</td>
          <td>ZINC</td>
          <td>~35M compounds</td>
          <td>Randomly sampled starting molecules</td>
      </tr>
      <tr>
          <td>Docking target</td>
          <td>PDB 2B8T</td>
          <td>1 structure</td>
          <td>Thymidine kinase-ligand complex</td>
      </tr>
      <tr>
          <td>Baseline actives</td>
          <td>DUD-E (KITH)</td>
          <td>57 inhibitors</td>
          <td>Known thymidine kinase inhibitors</td>
      </tr>
  </tbody>
</table>
<h3 id="algorithms">Algorithms</h3>
<ul>
<li>Grammatical evolution with $(\mu + \lambda)$ evolution strategy</li>
<li>Mutation only (no crossover)</li>
<li>Context-free grammar subset of OpenSMILES specification</li>
<li>Chromosome length: $N$ integers per molecule</li>
<li>Fitness set to $-\infty$ for invalid SMILES, MW &gt; 500, or duplicate molecules</li>
</ul>
<h3 id="models">Models</h3>
<p>No neural network models are used. ChemGE is purely evolutionary.</p>
<h3 id="evaluation">Evaluation</h3>
<table>
  <thead>
      <tr>
          <th>Metric</th>
          <th>Value</th>
          <th>Baseline</th>
          <th>Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Max $J^{\log P}$ (8h)</td>
          <td>5.88 +/- 0.34</td>
          <td>ChemTS: 5.58 +/- 0.50</td>
          <td>ChemGE (1000, 2000)</td>
      </tr>
      <tr>
          <td>Molecules/min</td>
          <td>527</td>
          <td>ChemTS: 40.89</td>
          <td>~13x throughput improvement</td>
      </tr>
      <tr>
          <td>Docking hits</td>
          <td>349</td>
          <td>Best DUD-E inhibitor</td>
          <td>Molecules with better $S_{\text{inter}}$</td>
      </tr>
      <tr>
          <td>Internal diversity</td>
          <td>0.55</td>
          <td>Known inhibitors: 0.46</td>
          <td>Morgan fingerprint Tanimoto distance</td>
      </tr>
  </tbody>
</table>
<h3 id="hardware">Hardware</h3>
<ul>
<li>CPU: Intel Xeon E5-2630 v3 (benchmark experiments, single core)</li>
<li>Docking: 32 cores in parallel (thymidine kinase experiment, ~26 hours for 1000 generations)</li>
</ul>
<h3 id="artifacts">Artifacts</h3>
<table>
  <thead>
      <tr>
          <th>Artifact</th>
          <th>Type</th>
          <th>License</th>
          <th>Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><a href="https://github.com/tsudalab/ChemGE">ChemGE</a></td>
          <td>Code</td>
          <td>MIT</td>
          <td>Official implementation</td>
      </tr>
  </tbody>
</table>
<hr>
<h2 id="paper-information">Paper Information</h2>
<p><strong>Citation</strong>: Yoshikawa, N., Terayama, K., Sumita, M., Homma, T., Oono, K., &amp; Tsuda, K. (2018). Population-based de novo molecule generation, using grammatical evolution. <em>Chemistry Letters</em>, 47(11), 1431-1434. <a href="https://doi.org/10.1246/cl.180665">https://doi.org/10.1246/cl.180665</a></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bibtex" data-lang="bibtex"><span style="display:flex;"><span><span style="color:#a6e22e">@article</span>{yoshikawa2018chemge,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">title</span>=<span style="color:#e6db74">{Population-based De Novo Molecule Generation, Using Grammatical Evolution}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">author</span>=<span style="color:#e6db74">{Yoshikawa, Naruki and Terayama, Kei and Sumita, Masato and Homma, Teruki and Oono, Kenta and Tsuda, Koji}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">journal</span>=<span style="color:#e6db74">{Chemistry Letters}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">volume</span>=<span style="color:#e6db74">{47}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">number</span>=<span style="color:#e6db74">{11}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">pages</span>=<span style="color:#e6db74">{1431--1434}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">year</span>=<span style="color:#e6db74">{2018}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">publisher</span>=<span style="color:#e6db74">{Oxford University Press}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">doi</span>=<span style="color:#e6db74">{10.1246/cl.180665}</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div>]]></content:encoded></item><item><title>STONED: Training-Free Molecular Design with SELFIES</title><link>https://hunterheidenreich.com/notes/chemistry/molecular-design/generation/search-based/stoned-selfies-chemical-space-exploration/</link><pubDate>Wed, 25 Mar 2026 00:00:00 +0000</pubDate><guid>https://hunterheidenreich.com/notes/chemistry/molecular-design/generation/search-based/stoned-selfies-chemical-space-exploration/</guid><description>STONED uses string mutations in the SELFIES representation for training-free molecular generation, interpolation, and chemical space exploration.</description><content:encoded><![CDATA[<h2 id="a-training-free-algorithm-for-molecular-generation">A Training-Free Algorithm for Molecular Generation</h2>
<p>This is a <strong>Method</strong> paper that introduces STONED (Superfast Traversal, Optimization, Novelty, Exploration and Discovery), a suite of algorithms for molecular generation and chemical space exploration. STONED operates entirely through string manipulations on the <a href="/notes/chemistry/molecular-representations/notations/selfies/">SELFIES</a> molecular representation, avoiding the need for deep learning models, training data, or GPU resources. The key claim is that simple character-level mutations and interpolations in SELFIES can achieve results competitive with state-of-the-art deep generative models on standard benchmarks.</p>
<h2 id="why-deep-generative-models-may-be-overkill">Why Deep Generative Models May Be Overkill</h2>
<p>Deep generative models (VAEs, GANs, RNNs, reinforcement learning) have become popular for <a href="/notes/chemistry/molecular-design/generation/evaluation/inverse-molecular-design-ml-review/">inverse molecular design</a>, but they come with practical costs: large training datasets, expensive GPU compute, and long training times. Fragile representations like <a href="/notes/chemistry/molecular-representations/notations/smiles/">SMILES</a> compound the problem, since large portions of a latent space can map to invalid molecules. Even with the introduction of SELFIES (a 100% valid string representation), prior work still embedded it within neural network architectures.</p>
<p>The authors argue that for tasks like local chemical space exploration and molecular interpolation, the guarantees of SELFIES alone may be sufficient. Because every SELFIES string maps to a valid molecule, random character mutations always produce valid structures. This observation eliminates the need for learned generation procedures entirely.</p>
<h2 id="core-innovation-selfies-string-mutations-as-molecular-operators">Core Innovation: SELFIES String Mutations as Molecular Operators</h2>
<p>STONED relies on four key techniques built on SELFIES string manipulations:</p>
<p><strong>1. Random character mutations.</strong> A point mutation in SELFIES (character replacement, deletion, or addition) always yields a valid molecule. The position of mutations serves as a hyperparameter controlling exploration vs. exploitation: terminal character mutations preserve more structural similarity to the seed, while random mutations explore more broadly.</p>
<p><strong>2. Multiple SMILES orderings.</strong> A single molecule has many valid SMILES strings, each mapping to a different SELFIES. By generating 50,000 SMILES orderings and converting to SELFIES before mutation, the diversity of generated structures increases substantially.</p>
<p><strong>3. Deterministic interpolation.</strong> Given two SELFIES strings (padded to equal length), characters at equivalent positions can be successively replaced from the start molecule to the target molecule. Every intermediate string is a valid molecule. A chemical path is extracted by keeping only those intermediates that increase fingerprint similarity to the target.</p>
<p><strong>4. Fingerprint-based filtering.</strong> Since edit distance in SELFIES does not reflect molecular similarity, STONED uses fingerprint comparisons (ECFP4, FCFP4, atom-pair) to enforce structural similarity constraints.</p>
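<p>Technique 1 can be sketched on a tokenized SELFIES string. The alphabet and seed below are illustrative; STONED draws replacement tokens from the full SELFIES alphabet, and the validity guarantee comes from decoding with the <code>selfies</code> library, which this sketch omits.</p>

```python
# Point mutation on a token list: replace, delete, or insert one token.
import random

# Illustrative token alphabet; STONED uses the full SELFIES alphabet.
ALPHABET = ["[C]", "[N]", "[O]", "[=C]", "[Ring1]", "[Branch1]"]

def mutate(tokens, rng, terminal_fraction=None):
    """With terminal_fraction set, the mutation site is restricted to the
    end of the string (more exploitation); otherwise any position may
    mutate (more exploration)."""
    tokens = list(tokens)
    lo = 0
    if terminal_fraction is not None:
        lo = min(int(len(tokens) * (1 - terminal_fraction)), len(tokens) - 1)
    pos = rng.randrange(lo, len(tokens))
    op = rng.choice(["replace", "delete", "insert"])
    if op == "replace":
        tokens[pos] = rng.choice(ALPHABET)
    elif op == "delete" and len(tokens) > 1:
        del tokens[pos]
    else:
        tokens.insert(pos, rng.choice(ALPHABET))
    return tokens

seed = ["[C]", "[C]", "[O]", "[C]", "[N]"]
mutant = mutate(seed, random.Random(7), terminal_fraction=0.1)
```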
<p>The authors also propose a revised joint molecular similarity metric for evaluating median molecules. Given $n$ reference molecules $M = \{m_1, m_2, \ldots, m_n\}$, the joint similarity of a candidate molecule $m$ is:</p>
<p>$$
F(m) = \frac{1}{n} \sum_{i=1}^{n} \text{sim}(m_i, m) - \left[\max_{i} \text{sim}(m_i, m) - \min_{i} \text{sim}(m_i, m)\right]
$$</p>
<p>This penalizes candidates that are similar to only a subset of references, unlike the geometric mean metric used in GuacaMol which can yield high scores even with lopsided similarities.</p>
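<p>The metric is straightforward to compute from pairwise similarities (which in practice come from fingerprint Tanimoto comparisons); the toy values below show how a lopsided candidate is penalized relative to a balanced one:</p>

```python
# Joint similarity F(m): mean pairwise similarity minus the max-min spread.

def joint_similarity(sims):
    avg = sum(sims) / len(sims)
    return avg - (max(sims) - min(sims))

# Same average similarity, very different joint scores.
balanced = joint_similarity([0.6, 0.6])  # 0.6 - 0.0 = 0.6
lopsided = joint_similarity([0.9, 0.3])  # 0.6 - 0.6 = 0.0
```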
<h2 id="experimental-setup-and-applications">Experimental Setup and Applications</h2>
<h3 id="local-chemical-subspace-formation">Local chemical subspace formation</h3>
<p>Starting from a single seed molecule (<a href="https://en.wikipedia.org/wiki/Aripiprazole">aripiprazole</a>, albuterol, mestranol, or <a href="https://en.wikipedia.org/wiki/Celecoxib">celecoxib</a>), the algorithm generates 50,000 SMILES orderings and performs 1-5 point mutations per ordering, producing 250,000 candidate strings. Unique valid molecules are filtered by fingerprint similarity thresholds.</p>
<table>
  <thead>
      <tr>
          <th>Starting structure</th>
          <th>Fingerprint</th>
          <th>Molecules at $\delta &gt; 0.75$</th>
          <th>Molecules at $\delta &gt; 0.60$</th>
          <th>Molecules at $\delta &gt; 0.40$</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Aripiprazole (SELFIES, random)</td>
          <td>ECFP4</td>
          <td>513 (0.25%)</td>
          <td>4,206 (2.15%)</td>
          <td>34,416 (17.66%)</td>
      </tr>
      <tr>
          <td>Albuterol (SELFIES, random)</td>
          <td>FCFP4</td>
          <td>587 (0.32%)</td>
          <td>4,156 (2.33%)</td>
          <td>16,977 (9.35%)</td>
      </tr>
      <tr>
          <td>Mestranol (SELFIES, random)</td>
          <td>AP</td>
          <td>478 (0.22%)</td>
          <td>4,079 (1.90%)</td>
          <td>45,594 (21.66%)</td>
      </tr>
      <tr>
          <td>Celecoxib (SELFIES, random)</td>
          <td>ECFP4</td>
          <td>198 (0.10%)</td>
          <td>1,925 (1.00%)</td>
          <td>18,045 (9.44%)</td>
      </tr>
      <tr>
          <td>Celecoxib (SELFIES, terminal 10%)</td>
          <td>ECFP4</td>
          <td>864 (2.02%)</td>
          <td>9,407 (21.99%)</td>
          <td>34,187 (79.91%)</td>
      </tr>
  </tbody>
</table>
<p>Key finding: restricting mutations to terminal characters yields a 20x increase in high-similarity molecules over random positions. Whereas mutating SMILES produces only 0.30% valid strings and <a href="/notes/chemistry/molecular-representations/notations/deepsmiles-adaptation-for-ml/">DeepSMILES</a> 1.44%, every SELFIES mutation is valid by construction.</p>
<p>A two-step expansion (mutating all unique first-round neighbors) produced over 17 million unique molecules, with 120,000 having similarity greater than 0.4 to celecoxib.</p>
<h3 id="chemical-path-formation-and-drug-design">Chemical path formation and drug design</h3>
<p>Deterministic SELFIES interpolation between <a href="https://en.wikipedia.org/wiki/Tadalafil">tadalafil</a> and <a href="https://en.wikipedia.org/wiki/Sildenafil">sildenafil</a> generated paths where <a href="https://en.wikipedia.org/wiki/Partition_coefficient">logP</a> and QED values varied smoothly. A more challenging application docked intermediates between <a href="https://en.wikipedia.org/wiki/Dihydroergotamine">dihydroergotamine</a> (<a href="https://en.wikipedia.org/wiki/5-HT1B_receptor">5-HT1B</a> binder) and prinomastat (<a href="https://en.wikipedia.org/wiki/CYP2D6">CYP2D6</a> binder), finding molecules with non-trivial binding affinity to both proteins without any optimization routine.</p>
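<p>The deterministic interpolation behind these paths can be sketched on token lists. The <code>[nop]</code> padding token is ignored by the SELFIES decoder, so every intermediate decodes to a valid molecule; the fingerprint-similarity filtering that extracts the final chemical path is omitted here.</p>

```python
# Left-to-right token replacement between two padded sequences.
PAD = "[nop]"  # padding token ignored by the SELFIES decoder

def interpolation_path(start, target):
    """Replace tokens left-to-right, yielding every intermediate string."""
    n = max(len(start), len(target))
    cur = start + [PAD] * (n - len(start))
    tgt = target + [PAD] * (n - len(target))
    path = ["".join(cur)]
    for i in range(n):
        if cur[i] != tgt[i]:
            cur[i] = tgt[i]
            path.append("".join(cur))
    return path

path = interpolation_path(["[C]", "[C]", "[O]"], ["[C]", "[N]"])
# path[0] is the start string, path[-1] the padded target
```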
<h3 id="median-molecules-for-photovoltaics">Median molecules for photovoltaics</h3>
<p>Using 100 triplets from the Harvard Clean Energy (HCE) dataset, each with one molecule optimized for high LUMO energy, one for high dipole moment, and one for high <a href="https://en.wikipedia.org/wiki/HOMO_and_LUMO">HOMO-LUMO gap</a>, generalized chemical paths produced median molecules. These were evaluated with GFN2-xTB semiempirical calculations. The generated medians matched or exceeded the best molecules available in the HCE database in both structural similarity and target properties.</p>
<h3 id="guacamol-benchmarks">GuacaMol benchmarks</h3>
<p>Without any training, STONED achieved an overall <a href="/notes/chemistry/molecular-design/generation/evaluation/guacamol-benchmarking-de-novo-molecular-design/">GuacaMol</a> score of 14.70, competitive with several deep generative models. The approach simply identifies the single best molecule in the benchmark&rsquo;s training set and generates its local chemical subspace. 38% of the top-100 molecules from each benchmark passed compound quality filters, comparable to <a href="/notes/chemistry/molecular-design/generation/search-based/graph-based-genetic-algorithm-chemical-space/">Graph GA</a> and SMILES GA.</p>
<h2 id="results-summary-and-limitations">Results Summary and Limitations</h2>
<p>STONED demonstrates that SELFIES string mutations can match or approach deep generative models on standard molecular design benchmarks while being orders of magnitude faster and requiring no training. The most expensive benchmark (aripiprazole subspace) completed in 500 seconds on a laptop CPU.</p>
<p>The method comparison table from the paper highlights STONED&rsquo;s unique position:</p>
<table>
  <thead>
      <tr>
          <th>Feature</th>
          <th>Expert Systems</th>
          <th>VAE</th>
          <th>GAN</th>
          <th>RL</th>
          <th>STONED</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Expert rule-free</td>
          <td>No</td>
          <td>Yes</td>
          <td>Yes</td>
          <td>Yes</td>
          <td>Yes</td>
      </tr>
      <tr>
          <td>Structure coverage</td>
          <td>Partial</td>
          <td>Partial</td>
          <td>Partial</td>
          <td>Partial</td>
          <td>Yes</td>
      </tr>
      <tr>
          <td>Interpolatability</td>
          <td>No</td>
          <td>Yes</td>
          <td>Yes</td>
          <td>No</td>
          <td>Yes</td>
      </tr>
      <tr>
          <td>Property-based navigation</td>
          <td>Partial</td>
          <td>Yes</td>
          <td>Yes</td>
          <td>Yes</td>
          <td>Partial</td>
      </tr>
      <tr>
          <td>Training-free</td>
          <td>Yes</td>
          <td>No</td>
          <td>No</td>
          <td>No</td>
          <td>Yes</td>
      </tr>
      <tr>
          <td>Data independence</td>
          <td>Yes</td>
          <td>No</td>
          <td>No</td>
          <td>No</td>
          <td>Yes</td>
      </tr>
  </tbody>
</table>
<p><strong>Limitations acknowledged by the authors:</strong></p>
<ul>
<li>STONED lacks property-based navigation (gradient-guided optimization toward specific property targets). It can only do stochastic property optimization when wrapped in a genetic algorithm.</li>
<li>The success rate of mutations leading to structurally similar molecules is relatively low (0.1-2% at high similarity thresholds), though speed compensates.</li>
<li>Chemical paths can contain molecules with unstable functional groups or <a href="https://en.wikipedia.org/wiki/Tautomer">tautomerization</a> issues, requiring post-hoc filtering with domain-specific rules.</li>
<li>Fingerprint similarity does not capture all aspects of chemical similarity (3D geometry, reactivity, synthesizability).</li>
<li>The penalized logP and QED benchmarks used by GuacaMol do not represent the full complexity of practical molecular design.</li>
</ul>
<hr>
<h2 id="reproducibility-details">Reproducibility Details</h2>
<h3 id="data">Data</h3>
<table>
  <thead>
      <tr>
          <th>Purpose</th>
          <th>Dataset</th>
          <th>Size</th>
          <th>Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Photovoltaics</td>
          <td>Harvard Clean Energy (HCE) database</td>
          <td>~2.3M molecules</td>
          <td>Used for median molecule triplet experiments</td>
      </tr>
      <tr>
          <td>Benchmarking</td>
          <td>GuacaMol benchmark suite</td>
          <td>Varies per task</td>
          <td>Standard benchmarks for generative molecular design</td>
      </tr>
      <tr>
          <td>Comparison</td>
          <td>ChEMBL (SCScore &lt;= 2.5 subset)</td>
          <td>Fragment database</td>
          <td>Used for CReM comparison experiments</td>
      </tr>
  </tbody>
</table>
<h3 id="algorithms">Algorithms</h3>
<ul>
<li><strong>Local subspace formation</strong>: 50,000 SMILES orderings per seed molecule, 1-5 SELFIES point mutations each, totaling 250,000 candidates per experiment.</li>
<li><strong>Chemical paths</strong>: Deterministic character-by-character interpolation between padded SELFIES strings, with monotonic fingerprint similarity filtering.</li>
<li><strong>Median molecules</strong>: Generalized paths between 3+ reference molecules using 10,000 paths per triplet with randomized SMILES orderings.</li>
<li><strong>Docking</strong>: <a href="/notes/chemistry/molecular-design/generation/evaluation/smina-docking-benchmark/">SMINA</a> with crystal structures from PDB (4IAQ for 5-HT1B, 3QM4 for CYP2D6). Top-5 binding poses averaged.</li>
<li><strong>Quantum chemistry</strong>: GFN2-xTB for dipole moments, LUMO energies, and HOMO-LUMO gaps.</li>
</ul>
<h3 id="evaluation">Evaluation</h3>
<table>
  <thead>
      <tr>
          <th>Metric</th>
          <th>Value</th>
          <th>Baseline</th>
          <th>Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>GuacaMol overall score</td>
          <td>14.70</td>
          <td>Varies by model</td>
          <td>Competitive with deep generative models</td>
      </tr>
      <tr>
          <td>Quality filter pass rate</td>
          <td>38%</td>
          <td>Graph GA/SMILES GA comparable</td>
          <td>Top-100 molecules per benchmark</td>
      </tr>
      <tr>
          <td>Celecoxib neighbors ($\delta &gt; 0.75$)</td>
          <td>198-864</td>
          <td>CReM: 239</td>
          <td>Depends on mutation position strategy</td>
      </tr>
  </tbody>
</table>
<h3 id="hardware">Hardware</h3>
<p>All experiments were run on a laptop with an Intel i7-8750H CPU at 2.20 GHz; no GPU was required. The most expensive single experiment (the aripiprazole subspace) completed in 500 seconds.</p>
<h3 id="artifacts">Artifacts</h3>
<table>
  <thead>
      <tr>
          <th>Artifact</th>
          <th>Type</th>
          <th>License</th>
          <th>Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><a href="https://github.com/aspuru-guzik-group/stoned-selfies">stoned-selfies</a></td>
          <td>Code</td>
          <td>Not specified</td>
          <td>Official implementation of STONED algorithms</td>
      </tr>
  </tbody>
</table>
<hr>
<h2 id="paper-information">Paper Information</h2>
<p><strong>Citation</strong>: Nigam, A. K., Pollice, R., Krenn, M., dos Passos Gomes, G., &amp; Aspuru-Guzik, A. (2021). Beyond generative models: superfast traversal, optimization, novelty, exploration and discovery (STONED) algorithm for molecules using SELFIES. <em>Chemical Science</em>, 12(20), 7079-7090. <a href="https://doi.org/10.1039/d1sc00231g">https://doi.org/10.1039/d1sc00231g</a></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bibtex" data-lang="bibtex"><span style="display:flex;"><span><span style="color:#a6e22e">@article</span>{nigam2021stoned,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">title</span>=<span style="color:#e6db74">{Beyond generative models: superfast traversal, optimization, novelty, exploration and discovery ({STONED}) algorithm for molecules using {SELFIES}}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">author</span>=<span style="color:#e6db74">{Nigam, AkshatKumar and Pollice, Robert and Krenn, Mario and dos Passos Gomes, Gabriel and Aspuru-Guzik, Al{\&#39;a}n}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">journal</span>=<span style="color:#e6db74">{Chemical Science}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">volume</span>=<span style="color:#e6db74">{12}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">number</span>=<span style="color:#e6db74">{20}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">pages</span>=<span style="color:#e6db74">{7079--7090}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">year</span>=<span style="color:#e6db74">{2021}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">publisher</span>=<span style="color:#e6db74">{Royal Society of Chemistry}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">doi</span>=<span style="color:#e6db74">{10.1039/d1sc00231g}</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div>]]></content:encoded></item><item><title>Graph-Based GA and MCTS Generative Model for Molecules</title><link>https://hunterheidenreich.com/notes/chemistry/molecular-design/generation/search-based/graph-based-genetic-algorithm-chemical-space/</link><pubDate>Wed, 25 Mar 2026 00:00:00 +0000</pubDate><guid>https://hunterheidenreich.com/notes/chemistry/molecular-design/generation/search-based/graph-based-genetic-algorithm-chemical-space/</guid><description>Jensen introduces a graph-based genetic algorithm and generative model with MCTS that outperforms ML methods for penalized logP optimization.</description><content:encoded><![CDATA[<h2 id="a-graph-based-approach-to-molecular-optimization">A Graph-Based Approach to Molecular Optimization</h2>
<p>This is a <strong>Method</strong> paper that introduces two graph-based approaches for exploring chemical space: a genetic algorithm (GB-GA) and a generative model combined with <a href="https://en.wikipedia.org/wiki/Monte_Carlo_tree_search">Monte Carlo tree search</a> (GB-GM-MCTS). The primary contribution is demonstrating that these non-ML, graph-based methods can match or exceed the performance of contemporary ML-based generative models for molecular property optimization, while being several orders of magnitude faster. The paper provides open-source implementations built on the RDKit cheminformatics package. The two approaches explore <a href="https://en.wikipedia.org/wiki/Chemical_space">chemical space</a> using direct graph manipulations rather than string-based representations like <a href="/notes/chemistry/molecular-representations/notations/smiles/">SMILES</a>.</p>
<h2 id="why-compare-simple-baselines-to-ml-generative-models">Why Compare Simple Baselines to ML Generative Models?</h2>
<p>By 2018, several ML-based generative models for molecules had been published, including VAEs, RNNs, and graph convolutional policy networks. However, these models were rarely compared against traditional optimization approaches such as genetic algorithms. Jensen identifies this gap explicitly: while ML generative model performance had been impressive, the lack of comparison to simpler baselines made it difficult to assess whether the complexity of ML approaches was justified.</p>
<p>A practical barrier to such comparisons was the absence of free, open-source GA implementations for molecular optimization (the existing ACSESS algorithm required proprietary OpenEye toolkits). This paper fills that gap by providing RDKit-based implementations of both the GB-GA and GB-GM-MCTS.</p>
<h2 id="graph-based-crossovers-mutations-and-monte-carlo-tree-search">Graph-Based Crossovers, Mutations, and Monte Carlo Tree Search</h2>
<h3 id="gb-ga-crossovers-and-mutations-on-molecular-graphs">GB-GA: Crossovers and Mutations on Molecular Graphs</h3>
<p>The GB-GA operates directly on molecular graph representations (not string representations like SMILES). It combines ideas from Brown et al. (2004) and the ACSESS algorithm of Virshup et al. (2013).</p>
<p><strong>Crossovers</strong> can occur at two types of positions with equal probability:</p>
<ul>
<li>Non-ring bonds: a molecule is cut at a non-ring bond, and fragments from two parent molecules are recombined</li>
<li>Ring bonds: adjacent bonds or bonds separated by one bond are cut, and fragments are mated using single or double bonds</li>
</ul>
<p><strong>Mutations</strong> include seven operation types, each with specified probabilities:</p>
<ul>
<li>Append atom (15%): adds an atom with a single, double, or triple bond</li>
<li>Insert atom (15%): inserts an atom into an existing bond</li>
<li>Delete atom (14%): removes an atom, reconnecting neighbors</li>
<li>Change atom type (14%): swaps element identity (C, N, O, F, S, Cl, Br)</li>
<li>Change bond order (14%): toggles between single, double, and triple bonds</li>
<li>Delete ring bond (14%): opens a ring</li>
<li>Add ring bond (14%): closes a new ring</li>
</ul>
<p>Molecules with macrocycles (rings of seven or more atoms), allene centers in rings, fewer than five heavy atoms, incorrect valences, or more non-H atoms than the target size are discarded. The target size is sampled from a normal distribution with mean 39.15 and standard deviation 3.50 non-H atoms, calibrated to match the molecules found by Yang et al. (2017).</p>
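<p>The mutation-type probabilities and target-size sampling above can be sketched as follows. Only the selection logic is shown; the actual graph edits (appending atoms, opening rings, etc.) are performed with RDKit in the released code.</p>

```python
import random

# Mutation types and probabilities as reported for GB-GA (sum to 100%).
MUTATIONS = {
    "append_atom": 0.15, "insert_atom": 0.15, "delete_atom": 0.14,
    "change_atom": 0.14, "change_bond_order": 0.14,
    "delete_ring_bond": 0.14, "add_ring_bond": 0.14,
}

def pick_mutation(rng):
    """Draw one of the seven mutation types with the stated probabilities."""
    ops, weights = zip(*MUTATIONS.items())
    return rng.choices(ops, weights=weights, k=1)[0]

def sample_target_size(rng, mean=39.15, sd=3.50):
    """Sample a target molecule size in non-H atoms from a normal distribution."""
    return rng.gauss(mean, sd)

rng = random.Random(42)
print(pick_mutation(rng), round(sample_target_size(rng), 1))
```
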
<h3 id="gb-gm-mcts-a-probabilistic-growth-model-with-tree-search">GB-GM-MCTS: A Probabilistic Growth Model with Tree Search</h3>
<p>The GB-GM grows molecules one atom at a time, with the choice of bond order and atom type determined probabilistically from a bonding analysis of a reference dataset (the first 1000 molecules from ZINC). Since 63% of atoms in the reference set are ring atoms, ring-creation or ring-insertion growth steps are chosen 63% of the time.</p>
<p>The generative model is combined with a <a href="https://en.wikipedia.org/wiki/Monte_Carlo_tree_search">Monte Carlo tree search</a> where:</p>
<ul>
<li>Each node corresponds to an atom addition step</li>
<li>Leaf parallelization uses a maximum of 25 leaf nodes</li>
<li>The exploration factor is $1 / \sqrt{2}$</li>
<li>Rollout terminates if the molecule exceeds the target size</li>
<li>The reward function returns 1 if the predicted $J(\mathbf{m})$ value exceeds the largest value found so far, and 0 otherwise</li>
</ul>
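<p>The selection and reward rules above can be written compactly. This is a generic UCT sketch with the paper's constants (exploration factor $1/\sqrt{2}$, binary reward), not the released implementation, which builds on the haroldsultan/MCTS code.</p>

```python
import math

def uct_score(child_value, child_visits, parent_visits, c=1 / math.sqrt(2)):
    """UCT selection: mean reward plus exploration bonus with c = 1/sqrt(2)."""
    if child_visits == 0:
        return float("inf")  # always expand unvisited children first
    return child_value / child_visits + c * math.sqrt(
        math.log(parent_visits) / child_visits)

def binary_reward(j_value, best_so_far):
    """Paper's reward: 1 if the rollout beats the best J(m) found so far."""
    return 1.0 if j_value > best_so_far else 0.0
```

<p>The binary reward makes the search greedy toward record-breaking molecules rather than toward a high average score.</p>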
<h3 id="the-penalized-logp-objective">The Penalized logP Objective</h3>
<p>Both methods optimize the penalized logP score $J(\mathbf{m})$:</p>
<p>$$
J(\mathbf{m}) = \log P(\mathbf{m}) - \text{SA}(\mathbf{m}) - \text{RingPenalty}(\mathbf{m})
$$</p>
<p>where $\log P(\mathbf{m})$ is the <a href="https://en.wikipedia.org/wiki/Partition_coefficient">octanol-water partition coefficient</a> predicted by RDKit, $\text{SA}(\mathbf{m})$ is a synthetic accessibility score, and $\text{RingPenalty}(\mathbf{m})$ penalizes unrealistically large rings by reducing the score by $\text{RingSize} - 6$ for each oversized ring. Each property is normalized to zero mean and unit standard deviation across the ZINC dataset.</p>
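<p>A minimal sketch of the scoring function, assuming logP and the SA score have already been computed (RDKit supplies both in practice). The normalization statistics below are placeholders for illustration, not the actual ZINC values.</p>

```python
def penalized_logp(logp, sa_score, ring_sizes, stats):
    """J(m) = normalized logP - normalized SA - normalized ring penalty.

    `stats` maps each component to its (mean, std) over ZINC.
    """
    # Each ring larger than six atoms contributes (size - 6) to the penalty.
    ring_penalty = sum(size - 6 for size in ring_sizes if size > 6)

    def z(x, key):
        mean, std = stats[key]
        return (x - mean) / std

    return z(logp, "logp") - z(sa_score, "sa") - z(ring_penalty, "ring")

# Placeholder normalization statistics, for illustration only.
STATS = {"logp": (2.46, 1.45), "sa": (3.05, 0.83), "ring": (0.04, 0.29)}
print(round(penalized_logp(3.0, 2.5, [6, 5], STATS), 3))  # 1.173
```
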
<h2 id="experimental-setup-and-comparisons-to-ml-methods">Experimental Setup and Comparisons to ML Methods</h2>
<h3 id="gb-ga-experiments">GB-GA Experiments</h3>
<p>Ten GA simulations were performed with a population size of 20 over 50 generations (1000 $J(\mathbf{m})$ evaluations per run). The initial mating pool was 20 random molecules from the first 1000 molecules in ZINC. Two mutation rates were tested: 50% and 1%.</p>
<h3 id="gb-gm-mcts-experiments">GB-GM-MCTS Experiments</h3>
<p>Ten simulations used ethane as a seed molecule with 1000 tree traversals per run. Additional experiments used 5000 traversals and an adjusted probability of generating $\text{C}=\text{C}-\text{C}$ ring patterns (increased from 63% to 80%).</p>
<h3 id="baselines">Baselines</h3>
<p>Results were compared to those compiled by Yang et al. (2017):</p>
<ul>
<li>ChemTS (RNN + MCTS)</li>
<li>RNN with and without Bayesian optimization</li>
<li><a href="/notes/chemistry/molecular-design/generation/latent-space/automatic-chemical-design-vae/">Continuous VAE (CVAE)</a></li>
<li><a href="/notes/chemistry/molecular-design/generation/latent-space/grammar-variational-autoencoder/">Grammar VAE (GVAE)</a></li>
<li>Graph convolutional policy network (GCPN, from You et al. 2018)</li>
</ul>
<h3 id="key-results">Key Results</h3>
<table>
  <thead>
      <tr>
          <th>Method</th>
          <th>Average $J(\mathbf{m})$</th>
          <th>Molecules Evaluated</th>
          <th>CPU Time</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>GB-GA (50% mutation)</td>
          <td>6.8 +/- 0.7</td>
          <td>1000</td>
          <td>30 seconds</td>
      </tr>
      <tr>
          <td>GB-GA (1% mutation)</td>
          <td>7.4 +/- 0.9</td>
          <td>1000</td>
          <td>30 seconds</td>
      </tr>
      <tr>
          <td>GB-GM-MCTS (63%)</td>
          <td>2.6 +/- 0.6</td>
          <td>1000</td>
          <td>90 seconds</td>
      </tr>
      <tr>
          <td>GB-GM-MCTS (80%)</td>
          <td>3.4 +/- 0.6</td>
          <td>1000</td>
          <td>90 seconds</td>
      </tr>
      <tr>
          <td>GB-GM-MCTS (80%)</td>
          <td>4.3 +/- 0.6</td>
          <td>5000</td>
          <td>9 minutes</td>
      </tr>
      <tr>
          <td>ChemTS</td>
          <td>4.9 +/- 0.5</td>
          <td>~5000</td>
          <td>2 hours</td>
      </tr>
      <tr>
          <td>ChemTS</td>
          <td>5.6 +/- 0.5</td>
          <td>~20000</td>
          <td>8 hours</td>
      </tr>
      <tr>
          <td>RNN + BO</td>
          <td>4.5 +/- 0.2</td>
          <td>~4000</td>
          <td>8 hours</td>
      </tr>
      <tr>
          <td>Only RNN</td>
          <td>4.8 +/- 0.2</td>
          <td>~20000</td>
          <td>8 hours</td>
      </tr>
      <tr>
          <td>CVAE + BO</td>
          <td>0.0 +/- 0.9</td>
          <td>~100</td>
          <td>8 hours</td>
      </tr>
      <tr>
          <td>GVAE + BO</td>
          <td>0.2 +/- 1.3</td>
          <td>~1000</td>
          <td>8 hours</td>
      </tr>
  </tbody>
</table>
<p>The GB-GA with 1% mutation rate achieved an average maximum $J(\mathbf{m})$ of 7.4, which is 1.8 units higher than the best ML result (ChemTS at 5.6) while using 20x fewer evaluations and completing in 30 seconds versus 8 hours. The two highest-scoring individual molecules found by GB-GA had $J(\mathbf{m})$ scores of 8.8 and 8.5, exceeding the 7.8-8.0 range found by the GCPN approach. These molecules bore little resemblance to the initial mating pool (<a href="https://en.wikipedia.org/wiki/Jaccard_index">Tanimoto similarities</a> of 0.27 and 0.12 to the most similar ZINC molecules), indicating that the GA traversed a large distance in chemical space in just 50 generations.</p>
<p>The GB-GM-MCTS performed below ChemTS at equal evaluations (4.3 vs. 4.9 at 5000 evaluations) but was substantially faster (9 minutes vs. 2 hours). The MCTS approach tended to extract the dominant hydrophobic structural motif (benzene rings) from the training set, making it more dependent on training set composition than the GA.</p>
<h2 id="simple-methods-set-a-high-bar-for-molecular-optimization">Simple Methods Set a High Bar for Molecular Optimization</h2>
<p>The central finding is that a simple graph-based genetic algorithm outperforms all tested ML-based generative models on penalized logP optimization, both in terms of solution quality and computational efficiency. The GB-GA achieves higher $J(\mathbf{m})$ scores with 1000 evaluations in 30 seconds than ML methods achieve with 20,000 evaluations over 8 hours.</p>
<p>Several additional observations emerge:</p>
<ol>
<li><strong>Chemical space traversal</strong>: The GB-GA can reach high-scoring molecules that are structurally distant from the starting population, with Tanimoto similarity as low as 0.12 to the nearest ZINC molecule.</li>
<li><strong>Mutation rate matters</strong>: A 1% mutation rate outperformed a 50% rate (7.4 vs. 6.8), suggesting that preserving more parental structure during crossover is beneficial for this objective.</li>
<li><strong>Training set dependence</strong>: The GB-GM-MCTS is more sensitive to training set composition than the GA. Its preference for benzene-ring-containing molecules (the dominant ZINC motif) limits its ability to discover alternative structural solutions like the long aliphatic chains favored by the GA.</li>
<li><strong>Generalizability caveat</strong>: Jensen explicitly notes that these comparisons cover only one property (penalized logP) and that similar comparisons for other properties are needed before drawing general conclusions.</li>
</ol>
<p>The paper&rsquo;s influence has been substantial: it helped establish the expectation that new molecular generative models should be benchmarked against genetic algorithm baselines, a position subsequently reinforced by Brown et al. (2019) in <a href="/notes/chemistry/molecular-design/generation/evaluation/guacamol-benchmarking-de-novo-molecular-design/">GuacaMol</a> and by <a href="/notes/chemistry/molecular-design/generation/search-based/genetic-algorithms-molecule-generation-baselines/">Tripp and Hernandez-Lobato (2023)</a>.</p>
<hr>
<h2 id="reproducibility-details">Reproducibility Details</h2>
<h3 id="data">Data</h3>
<table>
  <thead>
      <tr>
          <th>Purpose</th>
          <th>Dataset</th>
          <th>Size</th>
          <th>Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Initial mating pool / reference set</td>
          <td><a href="/notes/chemistry/datasets/zinc-22/">ZINC</a> (subset)</td>
          <td>First 1000 molecules</td>
          <td>Same subset used in previous studies (Gomez-Bombarelli et al., Yang et al.)</td>
      </tr>
      <tr>
          <td>Target molecule size</td>
          <td>Derived from Yang et al. results</td>
          <td>20 molecules</td>
          <td>Mean 39.15, SD 3.50 non-H atoms</td>
      </tr>
  </tbody>
</table>
<h3 id="algorithms">Algorithms</h3>
<ul>
<li><strong>GB-GA</strong>: Population size 20, 50 generations, mutation rates of 1% and 50% tested. Crossovers at ring and non-ring bonds with equal probability. Seven mutation types with specified probabilities. Molecules selected from mating pool based on normalized logP scores.</li>
<li><strong>GB-GM</strong>: Atom-by-atom growth using probabilistic rules derived from ZINC bonding analysis. Ring creation probability 63% (matching ZINC), with 80% variant also tested. Seed molecule: ethane.</li>
<li><strong>MCTS</strong>: Modified from haroldsultan/MCTS Python implementation. Leaf parallelization with max 25 leaf nodes. Exploration factor $1/\sqrt{2}$. Binary reward function (1 if new best, 0 otherwise).</li>
<li><strong>Property calculation</strong>: logP, SA score, and ring penalty all computed via RDKit. Each property normalized to zero mean and unit standard deviation across ZINC.</li>
</ul>
<h3 id="models">Models</h3>
<p>No neural network models are used. The GB-GA and GB-GM are purely algorithmic approaches parameterized by bonding statistics from the ZINC dataset.</p>
<h3 id="evaluation">Evaluation</h3>
<table>
  <thead>
      <tr>
          <th>Metric</th>
          <th>GB-GA (1%)</th>
          <th>Best ML (ChemTS)</th>
          <th>Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Average max $J(\mathbf{m})$</td>
          <td>7.4 +/- 0.9</td>
          <td>5.6 +/- 0.5</td>
          <td>Over 10 runs</td>
      </tr>
      <tr>
          <td>Single best $J(\mathbf{m})$</td>
          <td>8.8</td>
          <td>~8.0 (GCPN)</td>
          <td>GB-GA vs. You et al.</td>
      </tr>
      <tr>
          <td>Evaluations per run</td>
          <td>1000</td>
          <td>~20,000</td>
          <td>20x fewer for GB-GA</td>
      </tr>
      <tr>
          <td>CPU time per run</td>
          <td>30 seconds</td>
          <td>8 hours</td>
          <td>~960x faster</td>
      </tr>
  </tbody>
</table>
<h3 id="hardware">Hardware</h3>
<p>All GB-GA and GB-GM experiments were run on a laptop. No GPU required. The GB-GA completes in 30 seconds per run and the GB-GM-MCTS in 90 seconds (1000 traversals) to 9 minutes (5000 traversals).</p>
<h3 id="artifacts">Artifacts</h3>
<table>
  <thead>
      <tr>
          <th>Artifact</th>
          <th>Type</th>
          <th>License</th>
          <th>Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><a href="https://github.com/jensengroup/GB-GA/tree/v0.0">GB-GA (v0.0)</a></td>
          <td>Code</td>
          <td>Not specified</td>
          <td>Graph-based genetic algorithm, RDKit dependency only</td>
      </tr>
      <tr>
          <td><a href="https://github.com/jensengroup/GB-GM/tree/v0.0">GB-GM (v0.0)</a></td>
          <td>Code</td>
          <td>Not specified</td>
          <td>Graph-based generative model + MCTS, RDKit dependency only</td>
      </tr>
  </tbody>
</table>
<hr>
<h2 id="paper-information">Paper Information</h2>
<p><strong>Citation</strong>: Jensen, J. H. (2019). A graph-based genetic algorithm and generative model/Monte Carlo tree search for the exploration of chemical space. <em>Chemical Science</em>, 10(12), 3567-3572. <a href="https://doi.org/10.1039/c8sc05372c">https://doi.org/10.1039/c8sc05372c</a></p>
<p><strong>Publication</strong>: Chemical Science (Royal Society of Chemistry), 2019</p>
<p><strong>Additional Resources</strong>:</p>
<ul>
<li><a href="https://github.com/jensengroup/GB-GA">GB-GA Code (GitHub)</a></li>
<li><a href="https://github.com/jensengroup/GB-GM">GB-GM Code (GitHub)</a></li>
</ul>
<h2 id="citation">Citation</h2>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bibtex" data-lang="bibtex"><span style="display:flex;"><span><span style="color:#a6e22e">@article</span>{jensen2019graph,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">title</span>=<span style="color:#e6db74">{A graph-based genetic algorithm and generative model/Monte Carlo tree search for the exploration of chemical space}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">author</span>=<span style="color:#e6db74">{Jensen, Jan H.}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">journal</span>=<span style="color:#e6db74">{Chemical Science}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">volume</span>=<span style="color:#e6db74">{10}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">number</span>=<span style="color:#e6db74">{12}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">pages</span>=<span style="color:#e6db74">{3567--3572}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">year</span>=<span style="color:#e6db74">{2019}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">publisher</span>=<span style="color:#e6db74">{Royal Society of Chemistry}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">doi</span>=<span style="color:#e6db74">{10.1039/c8sc05372c}</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div>]]></content:encoded></item><item><title>Genetic Algorithms as Baselines for Molecule Generation</title><link>https://hunterheidenreich.com/notes/chemistry/molecular-design/generation/search-based/genetic-algorithms-molecule-generation-baselines/</link><pubDate>Mon, 23 Mar 2026 00:00:00 +0000</pubDate><guid>https://hunterheidenreich.com/notes/chemistry/molecular-design/generation/search-based/genetic-algorithms-molecule-generation-baselines/</guid><description>Genetic algorithms outperform many deep learning methods for molecule generation. Tripp and Hernández-Lobato propose the GA criterion.</description><content:encoded><![CDATA[<h2 id="a-position-paper-on-molecular-generation-baselines">A Position Paper on Molecular Generation Baselines</h2>
<p>This is a <strong>Position</strong> paper that argues genetic algorithms (GAs) are underused and underappreciated as baselines in the molecular generation community. The primary contribution is empirical evidence that a simple GA implementation (MOL_GA) matches or outperforms many sophisticated deep learning methods on standard benchmarks. The authors propose the &ldquo;GA criterion&rdquo; as a minimum bar for evaluating new molecular generation algorithms.</p>
<h2 id="why-molecular-generation-may-be-easier-than-assumed">Why Molecular Generation May Be Easier Than Assumed</h2>
<p>Drug discovery is fundamentally a molecular generation task, and many machine learning methods have been proposed for it (Du et al., 2022). The problem has many variants, from unconditional generation of novel molecules to directed optimization of specific molecular properties.</p>
<p>The authors observe that generating valid molecules is, in some respects, straightforward. The rules governing molecular validity are well-defined bond constraints that can be checked using standard cheminformatics software like <a href="https://en.wikipedia.org/wiki/RDKit">RDKit</a>. This means new molecules can be generated simply by adding, removing, or substituting fragments of known molecules. When applied iteratively, this is exactly what a genetic algorithm does. Despite this, many papers in the field propose complex deep learning methods without adequately comparing to simple GA baselines.</p>
<h2 id="the-ga-criterion-for-evaluating-new-methods">The GA Criterion for Evaluating New Methods</h2>
<p>The core proposal is the <strong>GA criterion</strong>: new methods in molecular generation should offer some clear advantage over genetic algorithms. This advantage can be:</p>
<ul>
<li><strong>Empirical</strong>: outperforming GAs on relevant benchmarks</li>
<li><strong>Conceptual</strong>: identifying and overcoming a specific limitation of randomly modifying known molecules</li>
</ul>
<p>The authors argue that the current state of molecular generation research reflects poor empirical practices, where comprehensive baseline evaluation is treated as optional rather than essential.</p>
<h2 id="genetic-algorithm-framework-and-benchmark-experiments">Genetic Algorithm Framework and Benchmark Experiments</h2>
<h3 id="how-genetic-algorithms-work-for-molecules">How Genetic Algorithms Work for Molecules</h3>
<p>GAs operate through the following iterative procedure:</p>
<ol>
<li>Start with an initial population $P$ of molecules</li>
<li>Sample a subset $S \subseteq P$ from the population (possibly biased toward better molecules)</li>
<li>Generate new molecules $N$ from $S$ via mutation and crossover operations</li>
<li>Select a new population $P'$ from $P \cup N$ (e.g., keep the highest-scoring molecules)</li>
<li>Set $P \leftarrow P'$ and repeat from step 2</li>
</ol>
<p>The MOL_GA implementation uses:</p>
<ul>
<li><strong>Quantile-based sampling</strong> (step 2): molecules are sampled from the top quantiles of the population using a log-uniform distribution over quantile thresholds:</li>
</ul>
<p>$$
u \sim \mathcal{U}[-3, 0], \quad \epsilon = 10^{u}
$$</p>
<p>A molecule is drawn uniformly from the top $\epsilon$ fraction of the population.</p>
<ul>
<li><strong>Mutation and crossover</strong> (step 3): graph-based operations from <a href="/notes/chemistry/molecular-design/generation/search-based/graph-based-genetic-algorithm-chemical-space/">Jensen (2019)</a>, as implemented in the <a href="/notes/chemistry/molecular-design/generation/evaluation/guacamol-benchmarking-de-novo-molecular-design/">GuacaMol benchmark (Brown et al., 2019)</a></li>
<li><strong>Greedy population selection</strong> (step 4): molecules with the highest scores are retained</li>
</ul>
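<p>The quantile-sampling rule and the overall loop can be sketched as follows. Here <code>offspring_fn</code> stands in for the graph-based mutation and crossover operators of Jensen (2019), and molecules are treated as abstract objects; the real MOL_GA operates on RDKit molecular graphs.</p>

```python
import math
import random

def quantile_sample(population, scores, rng):
    """MOL_GA sampling: u ~ U[-3, 0], eps = 10**u, then draw uniformly
    from the top-eps fraction of the population by score."""
    eps = 10 ** rng.uniform(-3.0, 0.0)
    k = max(1, math.ceil(eps * len(population)))
    ranked = sorted(zip(scores, population), key=lambda p: p[0], reverse=True)
    return rng.choice(ranked[:k])[1]

def run_ga(init_pop, score_fn, offspring_fn, n_iters, pop_size, rng):
    """Generic GA loop: sample parents, generate offspring, keep the best."""
    pop = list(init_pop)
    for _ in range(n_iters):
        scores = [score_fn(m) for m in pop]
        parents = [quantile_sample(pop, scores, rng) for _ in range(2)]
        pop.extend(offspring_fn(parents, rng))
        # Greedy population selection: retain only the highest scorers.
        pop = sorted(pop, key=score_fn, reverse=True)[:pop_size]
    return pop
```

<p>Note how a small offspring count per iteration (as in MOL_GA's generation size of 5) buys more selection rounds out of a fixed evaluation budget.</p>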
<h3 id="unconditional-generation-on-zinc-250k">Unconditional Generation on ZINC 250K</h3>
<p>The first experiment evaluates unconditional molecule generation, where the task is to produce novel, valid, and unique molecules distinct from a reference set (ZINC 250K). Success is measured by validity, novelty (at 10,000 generated molecules), and uniqueness.</p>
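<p>These three metrics can be computed with a short helper. Exact definitions vary slightly between papers; this sketch uses the common conventions (uniqueness among valid molecules, novelty among unique ones) and takes validity as a caller-supplied predicate, which in practice would be RDKit SMILES parsing.</p>

```python
def generation_metrics(generated, reference, is_valid):
    """Validity, uniqueness, and novelty of a list of generated strings.

    `is_valid` is a caller-supplied predicate (RDKit parsing in practice).
    """
    valid = [m for m in generated if is_valid(m)]
    unique = set(valid)                 # distinct valid molecules
    novel = unique - set(reference)     # not present in the reference set
    n = len(generated)
    return {
        "validity": len(valid) / n,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }

m = generation_metrics(["CC", "CC", "CCO", "??"], ["CC"], lambda s: "?" not in s)
print(m)  # validity 0.75, uniqueness ~0.67, novelty 0.5
```
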
<table>
  <thead>
      <tr>
          <th>Method</th>
          <th>Paper</th>
          <th>Validity</th>
          <th>Novelty@10k</th>
          <th>Uniqueness</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>JT-VAE</td>
          <td>Jin et al. (2018)</td>
          <td>99.8%</td>
          <td>100%</td>
          <td>100%</td>
      </tr>
      <tr>
          <td>GCPN</td>
          <td>You et al. (2018)</td>
          <td>100%</td>
          <td>100%</td>
          <td>99.97%</td>
      </tr>
      <tr>
          <td><a href="/notes/chemistry/molecular-design/generation/rl-tuned/molecularrnn-graph-generation-optimized-properties/">MolecularRNN</a></td>
          <td>Popova et al. (2019)</td>
          <td>100%</td>
          <td>100%</td>
          <td>99.89%</td>
      </tr>
      <tr>
          <td>Graph NVP</td>
          <td>Madhawa et al. (2019)</td>
          <td>100%</td>
          <td>100%</td>
          <td>94.80%</td>
      </tr>
      <tr>
          <td>Graph AF</td>
          <td>Shi et al. (2020)</td>
          <td>100%</td>
          <td>100%</td>
          <td>99.10%</td>
      </tr>
      <tr>
          <td>MoFlow</td>
          <td>Zang and Wang (2020)</td>
          <td>100%</td>
          <td>100%</td>
          <td>99.99%</td>
      </tr>
      <tr>
          <td>GraphCNF</td>
          <td>Lippe and Gavves (2020)</td>
          <td>96.35%</td>
          <td>99.98%</td>
          <td>99.98%</td>
      </tr>
      <tr>
          <td>Graph DF</td>
          <td>Luo et al. (2021)</td>
          <td>100%</td>
          <td>100%</td>
          <td>99.16%</td>
      </tr>
      <tr>
          <td>ModFlow</td>
          <td>Verma et al. (2022)</td>
          <td>98.1%</td>
          <td>100%</td>
          <td>99.3%</td>
      </tr>
      <tr>
          <td>GraphEBM</td>
          <td>Liu et al. (2021)</td>
          <td>99.96%</td>
          <td>100%</td>
          <td>98.79%</td>
      </tr>
      <tr>
          <td>AddCarbon</td>
          <td>Renz et al. (2019)</td>
          <td>100%</td>
          <td>99.94%</td>
          <td>99.86%</td>
      </tr>
      <tr>
          <td>MOL_GA</td>
          <td>(this paper)</td>
          <td>99.76%</td>
          <td>99.94%</td>
          <td>98.60%</td>
      </tr>
  </tbody>
</table>
<p>All methods perform near 100% on all metrics, demonstrating that unconditional molecule generation is not a particularly discriminative benchmark. The authors note that generation speed (molecules per second) is an important missing dimension from these comparisons, where simple methods like GAs have a clear advantage.</p>
<h3 id="molecule-optimization-on-the-pmo-benchmark">Molecule Optimization on the PMO Benchmark</h3>
<p>The second experiment evaluates directed molecule optimization on the <a href="/notes/chemistry/molecular-design/generation/evaluation/pmo-sample-efficient-molecular-optimization/">Practical Molecular Optimization (PMO) benchmark (Gao et al., 2022)</a>, which measures the ability to find molecules optimizing a scalar objective function $f: \mathcal{M} \mapsto \mathbb{R}$ with a budget of 10,000 evaluations.</p>
<p>A key insight is that previous GA implementations in PMO used large generation sizes ($\approx 100$), which limits the number of improvement iterations. The authors set the generation size to 5, allowing approximately 2,000 iterations of improvement within the same evaluation budget.</p>
<table>
  <thead>
      <tr>
          <th>Task</th>
          <th><a href="/notes/chemistry/molecular-design/generation/rl-tuned/reinvent-deep-rl-molecular-design/">REINVENT</a></th>
          <th>Graph GA</th>
          <th>MOL_GA</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>albuterol_similarity</td>
          <td>0.882 +/- 0.006</td>
          <td>0.838 +/- 0.016</td>
          <td><strong>0.896 +/- 0.035</strong></td>
      </tr>
      <tr>
          <td>amlodipine_mpo</td>
          <td>0.635 +/- 0.035</td>
          <td>0.661 +/- 0.020</td>
          <td><strong>0.688 +/- 0.039</strong></td>
      </tr>
      <tr>
          <td>celecoxib_rediscovery</td>
          <td><strong>0.713 +/- 0.067</strong></td>
          <td>0.630 +/- 0.097</td>
          <td>0.567 +/- 0.083</td>
      </tr>
      <tr>
          <td>drd2</td>
          <td>0.945 +/- 0.007</td>
          <td><strong>0.964 +/- 0.012</strong></td>
          <td>0.936 +/- 0.016</td>
      </tr>
      <tr>
          <td>fexofenadine_mpo</td>
          <td>0.784 +/- 0.006</td>
          <td>0.760 +/- 0.011</td>
          <td><strong>0.825 +/- 0.019</strong></td>
      </tr>
      <tr>
          <td>isomers_c9h10n2o2pf2cl</td>
          <td>0.642 +/- 0.054</td>
          <td>0.719 +/- 0.047</td>
          <td><strong>0.865 +/- 0.012</strong></td>
      </tr>
      <tr>
          <td>sitagliptin_mpo</td>
          <td>0.021 +/- 0.003</td>
          <td>0.433 +/- 0.075</td>
          <td><strong>0.582 +/- 0.040</strong></td>
      </tr>
      <tr>
          <td>zaleplon_mpo</td>
          <td>0.358 +/- 0.062</td>
          <td>0.346 +/- 0.032</td>
          <td><strong>0.519 +/- 0.029</strong></td>
      </tr>
      <tr>
          <td><strong>Sum (23 tasks)</strong></td>
          <td>14.196</td>
          <td>13.751</td>
          <td><strong>14.708</strong></td>
      </tr>
      <tr>
          <td><strong>Rank</strong></td>
          <td>2</td>
          <td>3</td>
          <td><strong>1</strong></td>
      </tr>
  </tbody>
</table>
<p>MOL_GA achieves the highest aggregate score across all 23 PMO tasks, outperforming both the previous best GA (Graph GA) and the previous best overall method (REINVENT). The authors attribute this partly to insufficient hyperparameter tuning of the baselines in PMO rather than to MOL_GA being an especially strong method, since MOL_GA is essentially the same algorithm as Graph GA with different hyperparameters.</p>
<h2 id="implications-for-molecular-generation-research">Implications for Molecular Generation Research</h2>
<p>The key findings and arguments are:</p>
<ol>
<li>
<p><strong>GAs match or outperform deep learning methods</strong> on standard molecular generation benchmarks, both for unconditional generation and directed optimization.</p>
</li>
<li>
<p><strong>Hyperparameter choices matter significantly</strong>: MOL_GA&rsquo;s strong performance on PMO comes partly from using a smaller generation size (5 vs. ~100), which allows more iterations of refinement within the same evaluation budget.</p>
</li>
<li>
<p><strong>The GA criterion should be enforced in peer review</strong>: new molecular generation methods should demonstrate a clear advantage over GAs, whether empirical or conceptual.</p>
</li>
<li>
<p><strong>Deep learning methods may implicitly do what GAs do explicitly</strong>: many generative models are trained on datasets of known molecules, so the novel molecules they produce may simply be variants of their training data. The authors consider this an important direction for future investigation.</p>
</li>
<li>
<p><strong>Poor empirical practices are widespread</strong>: the paper argues that many experiments in molecule generation are conducted with an explicit desired outcome (that the novel algorithm is the best), leading to inadequate baseline comparisons.</p>
</li>
</ol>
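<p>The generation-size tradeoff in point 2 can be made concrete with one line of arithmetic (a toy sketch, not from the paper's code; the function name is mine):</p>

```python
# With a fixed oracle budget, a smaller generation size buys more
# rounds of refinement within the same number of evaluations.
def num_iterations(generation_size: int, budget: int = 10_000) -> int:
    """Number of GA iterations possible under the evaluation budget."""
    return budget // generation_size

print(num_iterations(5))    # MOL_GA-style setting: 2000 iterations
print(num_iterations(100))  # typical larger generation size: 100 iterations
```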
<p>The authors are careful to note that this result should not be read as evidence that GAs are exceptional algorithms. Rather, it indicates that more complex methods have made surprisingly little progress beyond what simple heuristic search can achieve.</p>
<hr>
<h2 id="reproducibility-details">Reproducibility Details</h2>
<h3 id="data">Data</h3>
<table>
  <thead>
      <tr>
          <th>Purpose</th>
          <th>Dataset</th>
          <th>Size</th>
          <th>Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Unconditional generation</td>
          <td>ZINC 250K</td>
          <td>250,000 molecules</td>
          <td>Reference set for novelty evaluation</td>
      </tr>
      <tr>
          <td>Directed optimization</td>
          <td>PMO benchmark</td>
          <td>23 tasks</td>
          <td>10,000 evaluation budget per task</td>
      </tr>
  </tbody>
</table>
<h3 id="algorithms">Algorithms</h3>
<ul>
<li><strong>GA implementation</strong>: MOL_GA package, using graph-based mutation and crossover from Jensen (2019) via the GuacaMol implementation</li>
<li><strong>Generation size</strong>: 5 molecules per iteration (allowing ~2,000 iterations with 10,000 evaluations)</li>
<li><strong>Population selection</strong>: Greedy (highest-scoring molecules retained)</li>
<li><strong>Sampling</strong>: Quantile-based with log-uniform distribution over quantile thresholds</li>
</ul>
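<p>The loop described by the bullets above can be sketched in a few lines of Python. This is a hedged toy version, not the MOL_GA implementation: <code>score</code> and <code>mutate</code> are hypothetical string-based stand-ins for the PMO oracles and the Jensen (2019) graph operators (which require RDKit), and the exact range of the log-uniform quantile distribution is my reading of the bullet, not verified against the package.</p>

```python
import math
import random

random.seed(0)

def score(s: str) -> float:
    """Toy stand-in oracle: fraction of 'C' characters (real runs use PMO objectives)."""
    return s.count("C") / max(len(s), 1)

def mutate(s: str) -> str:
    """Toy mutation; MOL_GA uses graph-based mutation/crossover instead."""
    i = random.randrange(len(s))
    return s[:i] + random.choice("CNO") + s[i + 1:]

def sample_parent(population: list[tuple[float, str]]) -> str:
    """Quantile-based sampling: draw a quantile threshold log-uniformly,
    then pick uniformly among the molecules above that quantile."""
    n = len(population)
    q = math.exp(random.uniform(math.log(1.0 / n), 0.0))  # log-uniform in [1/n, 1]
    k = max(1, int(q * n))  # size of the top-k slice to sample from
    top = sorted(population, reverse=True)[:k]
    return random.choice(top)[1]

def run_ga(start_pop, budget=10_000, generation_size=5, population_size=100):
    population = [(score(s), s) for s in start_pop]
    evals = len(population)
    while evals + generation_size <= budget:
        children = [mutate(sample_parent(population)) for _ in range(generation_size)]
        population += [(score(c), c) for c in children]
        evals += generation_size
        # Greedy population selection: retain only the highest-scoring molecules.
        population = sorted(population, reverse=True)[:population_size]
    return population

best_score, best = max(run_ga(["CCO", "CCN", "COC"], budget=500))
```

<p>Greedy selection keeps only the top-scoring molecules, while the log-uniform quantile sampling biases parent choice toward, but not exclusively to, the best individuals.</p>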
<h3 id="evaluation">Evaluation</h3>
<table>
  <thead>
      <tr>
          <th>Metric</th>
          <th>Benchmark</th>
          <th>Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Validity, Novelty@10k, Uniqueness</td>
          <td>ZINC 250K unconditional</td>
          <td>Calculated using <a href="/notes/chemistry/molecular-design/generation/evaluation/molecular-sets-moses/">MOSES package</a></td>
      </tr>
      <tr>
          <td>AUC top-10 scores</td>
          <td>PMO benchmark</td>
          <td>23 optimization tasks with 10,000 evaluation budget</td>
      </tr>
  </tbody>
</table>
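<p>The AUC top-10 metric can be sketched as follows (my paraphrase of the PMO definition, not the benchmark's reference implementation; normalization details for under-spent budgets are omitted): at each oracle call, record the mean of the 10 best scores seen so far, then average that curve over the calls.</p>

```python
import heapq

def auc_top10(scores: list[float]) -> float:
    """Average, over oracle calls, of the running mean of the
    10 best scores seen so far (assumes oracles scaled to [0, 1])."""
    top10: list[float] = []  # min-heap holding the 10 best scores so far
    means = []
    for s in scores:
        if len(top10) < 10:
            heapq.heappush(top10, s)
        elif s > top10[0]:
            heapq.heapreplace(top10, s)
        means.append(sum(top10) / len(top10))
    return sum(means) / len(means)

# A method that scores 1.0 from the very first call achieves AUC 1.0.
print(auc_top10([1.0] * 100))
```

<p>Because the metric rewards finding good molecules <em>early</em> in the budget, a GA that completes ~2,000 small generations can accumulate area faster than one that completes ~100 large ones.</p>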
<h3 id="hardware">Hardware</h3>
<p>The paper does not specify hardware requirements. Given that GAs are computationally lightweight compared to deep learning methods, standard CPU hardware is likely sufficient.</p>
<h3 id="artifacts">Artifacts</h3>
<table>
  <thead>
      <tr>
          <th>Artifact</th>
          <th>Type</th>
          <th>License</th>
          <th>Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><a href="https://github.com/AustinT/mol_ga">MOL_GA</a></td>
          <td>Code</td>
          <td>MIT</td>
          <td>Python package for molecular genetic algorithms</td>
      </tr>
      <tr>
          <td><a href="https://pypi.org/project/mol-ga/">MOL_GA on PyPI</a></td>
          <td>Code</td>
          <td>MIT</td>
          <td>pip-installable package</td>
      </tr>
  </tbody>
</table>
<hr>
<h2 id="paper-information">Paper Information</h2>
<p><strong>Citation</strong>: Tripp, A., &amp; Hernández-Lobato, J. M. (2023). Genetic algorithms are strong baselines for molecule generation. <em>arXiv preprint arXiv:2310.09267</em>. <a href="https://arxiv.org/abs/2310.09267">https://arxiv.org/abs/2310.09267</a></p>
<p><strong>Publication</strong>: arXiv preprint, 2023</p>
<p><strong>Additional Resources</strong>:</p>
<ul>
<li><a href="https://github.com/AustinT/mol_ga">MOL_GA Python Package (GitHub)</a></li>
<li><a href="https://pypi.org/project/mol-ga/">MOL_GA on PyPI</a></li>
</ul>
<h2 id="citation">Citation</h2>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bibtex" data-lang="bibtex"><span style="display:flex;"><span><span style="color:#a6e22e">@article</span>{tripp2023genetic,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">title</span>=<span style="color:#e6db74">{Genetic algorithms are strong baselines for molecule generation}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">author</span>=<span style="color:#e6db74">{Tripp, Austin and Hern{\&#39;a}ndez-Lobato, Jos{\&#39;e} Miguel}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">journal</span>=<span style="color:#e6db74">{arXiv preprint arXiv:2310.09267}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">year</span>=<span style="color:#e6db74">{2023}</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div>]]></content:encoded></item></channel></rss>