STONED: Training-Free Molecular Design with SELFIES

A Training-Free Algorithm for Molecular Generation

This is a Method paper that introduces STONED (Superfast Traversal, Optimization, Novelty, Exploration and Discovery), a suite of algorithms for molecular generation and chemical space exploration. STONED operates entirely through string manipulations on the SELFIES molecular representation, avoiding the need for deep learning models, training data, or GPU resources. The key claim is that simple character-level mutations and interpolations in SELFIES can achieve results competitive with state-of-the-art deep generative models on standard benchmarks.

Why Deep Generative Models May Be Overkill

Deep generative models (VAEs, GANs, RNNs, reinforcement learning) have become popular for inverse molecular design, but they come with practical costs: large training datasets, expensive GPU compute, and long training times. Fragile representations like SMILES compound the problem, since large portions of a latent space can map to invalid molecules. Even with the introduction of SELFIES (a 100% valid string representation), prior work still embedded it within neural network architectures.

The authors argue that for tasks like local chemical space exploration and molecular interpolation, the guarantees of SELFIES alone may be sufficient. Because every SELFIES string maps to a valid molecule, random character mutations always produce valid structures. This observation eliminates the need for learned generation procedures entirely.

Core Innovation: SELFIES String Mutations as Molecular Operators

STONED relies on four key techniques built on SELFIES string manipulations:

1. Random character mutations. A point mutation in SELFIES (character replacement, deletion, or addition) always yields a valid molecule. The position of mutations serves as a hyperparameter controlling exploration vs. exploitation: terminal character mutations preserve more structural similarity to the seed, while random mutations explore more broadly.

2. Multiple SMILES orderings. A single molecule has many valid SMILES strings, each mapping to a different SELFIES. By generating 50,000 SMILES orderings and converting to SELFIES before mutation, the diversity of generated structures increases substantially.

3. Deterministic interpolation. Given two SELFIES strings (padded to equal length), characters at equivalent positions can be successively replaced from the start molecule to the target molecule. Every intermediate string is a valid molecule. A chemical path is extracted by keeping only those intermediates that increase fingerprint similarity to the target.

4. Fingerprint-based filtering. Since edit distance in SELFIES does not reflect molecular similarity, STONED uses fingerprint comparisons (ECFP4, FCFP4, atom-pair) to enforce structural similarity constraints.

The authors also propose a revised joint molecular similarity metric for evaluating median molecules. Given $n$ reference molecules $M = {m_1, m_2, \ldots, m_n}$, the joint similarity of a candidate molecule $m$ is:

$$ F(m) = \frac{1}{n} \sum_{i=1}^{n} \text{sim}(m_i, m) - \left[\max_{i} \text{sim}(m_i, m) - \min_{i} \text{sim}(m_i, m)\right] $$

This penalizes candidates that are similar to only a subset of references, unlike the geometric mean metric used in GuacaMol which can yield high scores even with lopsided similarities.

Experimental Setup and Applications

Local chemical subspace formation

Starting from a single seed molecule (aripiprazole, albuterol, mestranol, or celecoxib), the algorithm generates 50,000 SMILES orderings and performs 1-5 point mutations per ordering, producing 250,000 candidate strings. Unique valid molecules are filtered by fingerprint similarity thresholds.

Starting structure	Fingerprint	Molecules at $\delta > 0.75$	Molecules at $\delta > 0.60$	Molecules at $\delta > 0.40$
Aripiprazole (SELFIES, random)	ECFP4	513 (0.25%)	4,206 (2.15%)	34,416 (17.66%)
Albuterol (SELFIES, random)	FCFP4	587 (0.32%)	4,156 (2.33%)	16,977 (9.35%)
Mestranol (SELFIES, random)	AP	478 (0.22%)	4,079 (1.90%)	45,594 (21.66%)
Celecoxib (SELFIES, random)	ECFP4	198 (0.10%)	1,925 (1.00%)	18,045 (9.44%)
Celecoxib (SELFIES, terminal 10%)	ECFP4	864 (2.02%)	9,407 (21.99%)	34,187 (79.91%)

Key finding: restricting mutations to terminal characters yields a 20x increase in high-similarity molecules compared to random positions. Compared to SMILES mutations (0.30% valid) and DeepSMILES (1.44% valid), SELFIES mutations are all valid by construction.

A two-step expansion (mutating all unique first-round neighbors) produced over 17 million unique molecules, with 120,000 having similarity greater than 0.4 to celecoxib.

Chemical path formation and drug design

Deterministic SELFIES interpolation between tadalafil and sildenafil generated paths where logP and QED values varied smoothly. A more challenging application docked intermediates between dihydroergotamine (5-HT1B binder) and prinomastat (CYP2D6 binder), finding molecules with non-trivial binding affinity to both proteins without any optimization routine.

Median molecules for photovoltaics

Using 100 triplets from the Harvard Clean Energy (HCE) dataset, each with one molecule optimized for high LUMO energy, one for high dipole moment, and one for high HOMO-LUMO gap, generalized chemical paths produced median molecules. These were evaluated with GFN2-xTB semiempirical calculations. The generated medians matched or exceeded the best molecules available in the HCE database in both structural similarity and target properties.

GuacaMol benchmarks

Without any training, STONED achieved an overall GuacaMol score of 14.70, competitive with several deep generative models. The approach simply identifies the single best molecule in the benchmark’s training set and generates its local chemical subspace. 38% of the top-100 molecules from each benchmark passed compound quality filters, comparable to Graph GA and SMILES GA.

Results Summary and Limitations

STONED demonstrates that SELFIES string mutations can match or approach deep generative models on standard molecular design benchmarks while being orders of magnitude faster and requiring no training. The most expensive benchmark (aripiprazole subspace) completed in 500 seconds on a laptop CPU.

The method comparison table from the paper highlights STONED’s unique position:

Feature	Expert Systems	VAE	GAN	RL	STONED
Expert rule-free	No	Yes	Yes	Yes	Yes
Structure coverage	Partial	Partial	Partial	Partial	Yes
Interpolatability	No	Yes	Yes	No	Yes
Property-based navigation	Partial	Yes	Yes	Yes	Partial
Training-free	Yes	No	No	No	Yes
Data independence	Yes	No	No	No	Yes

Limitations acknowledged by the authors:

STONED lacks property-based navigation (gradient-guided optimization toward specific property targets). It can only do stochastic property optimization when wrapped in a genetic algorithm.
The success rate of mutations leading to structurally similar molecules is relatively low (0.1-2% at high similarity thresholds), though speed compensates.
Chemical paths can contain molecules with unstable functional groups or tautomerization issues, requiring post-hoc filtering with domain-specific rules.
Fingerprint similarity does not capture all aspects of chemical similarity (3D geometry, reactivity, synthesizability).
The penalized logP and QED benchmarks used by GuacaMol do not represent the full complexity of practical molecular design.

Reproducibility Details

Data

Purpose	Dataset	Size	Notes
Photovoltaics	Harvard Clean Energy (HCE) database	~2.3M molecules	Used for median molecule triplet experiments
Benchmarking	GuacaMol benchmark suite	Varies per task	Standard benchmarks for generative molecular design
Comparison	ChEMBL (SCScore <= 2.5 subset)	Fragment database	Used for CReM comparison experiments

Algorithms

Local subspace formation: 50,000 SMILES orderings per seed molecule, 1-5 SELFIES point mutations each, totaling 250,000 candidates per experiment.
Chemical paths: Deterministic character-by-character interpolation between padded SELFIES strings, with monotonic fingerprint similarity filtering.
Median molecules: Generalized paths between 3+ reference molecules using 10,000 paths per triplet with randomized SMILES orderings.
Docking: SMINA with crystal structures from PDB (4IAQ for 5-HT1B, 3QM4 for CYP2D6). Top-5 binding poses averaged.
Quantum chemistry: GFN2-xTB for dipole moments, LUMO energies, and HOMO-LUMO gaps.

Evaluation

Metric	Value	Baseline	Notes
GuacaMol overall score	14.70	Varies by model	Competitive with deep generative models
Quality filter pass rate	38%	Graph GA/SMILES GA comparable	Top-100 molecules per benchmark
Celecoxib neighbors ($\delta > 0.75$)	198-864	CReM: 239	Depends on mutation position strategy

Hardware

All experiments run on a laptop with Intel i7-8750H CPU at 2.20 GHz. No GPU required. Most expensive single experiment (aripiprazole subspace) completed in 500 seconds.

Artifacts

Artifact	Type	License	Notes
stoned-selfies	Code	Not specified	Official implementation of STONED algorithms

Paper Information

Citation: Nigam, A. K., Pollice, R., Krenn, M., dos Passos Gomes, G., & Aspuru-Guzik, A. (2021). Beyond generative models: superfast traversal, optimization, novelty, exploration and discovery (STONED) algorithm for molecules using SELFIES. Chemical Science, 12(20), 7079-7090. https://doi.org/10.1039/d1sc00231g

@article{nigam2021stoned,
  title={Beyond generative models: superfast traversal, optimization, novelty, exploration and discovery ({STONED}) algorithm for molecules using {SELFIES}},
  author={Nigam, AkshatKumar and Pollice, Robert and Krenn, Mario and dos Passos Gomes, Gabriel and Aspuru-Guzik, Al{\'a}n},
  journal={Chemical Science},
  volume={12},
  number={20},
  pages={7079--7090},
  year={2021},
  publisher={Royal Society of Chemistry},
  doi={10.1039/d1sc00231g}
}

A Training-Free Algorithm for Molecular Generation#

Why Deep Generative Models May Be Overkill#

Core Innovation: SELFIES String Mutations as Molecular Operators#

Experimental Setup and Applications#

Local chemical subspace formation#

Chemical path formation and drug design#

Median molecules for photovoltaics#

GuacaMol benchmarks#

Results Summary and Limitations#

Reproducibility Details#

Data#

Algorithms#

Evaluation#

Hardware#

Artifacts#

Paper Information#