Evaluation & Tools

This group covers papers that critically evaluate molecular generation methods or provide tools to improve their outputs.

Paper	Year	Focus	Key Finding
Failure Modes	2019	Benchmarking	Trivial models fool distribution-learning metrics; ML scoring functions have exploitable biases
Sample Efficiency	2022	Benchmarking	Property filters and diversity metrics substantially re-rank model performance
Avoiding Failure Modes	2022	Diagnostics	Apparent failures stem from QSAR model disagreement, not algorithmic exploitation
UnCorrupt SMILES	2023	Post-hoc tool	Transformer-based corrector recovers 60-95% of invalid generator outputs

All Notes

Computational Chemistry

Figure: Two-panel plot showing score divergence with disagreeing classifiers vs. convergence with agreeing classifiers

Avoiding Failure Modes in Goal-Directed Generation

Shows that divergence between optimization and control scores during goal-directed molecular generation is explained by pre-existing disagreement among QSAR models on the training distribution, not by algorithmic exploitation of model-specific biases.
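The comparison described above can be sketched in a few lines. This is a minimal illustration, not the paper's code: it assumes the optimization and control QSAR models have already scored both a reference set drawn from the training distribution and a set of generated molecules. The function names and the `tolerance` value are hypothetical.

```python
from statistics import mean


def mean_gap(opt_scores, ctrl_scores):
    """Average amount by which the optimization score exceeds the control score."""
    if len(opt_scores) != len(ctrl_scores):
        raise ValueError("score lists must have the same length")
    return mean(o - c for o, c in zip(opt_scores, ctrl_scores))


def divergence_explained(ref_opt, ref_ctrl, gen_opt, gen_ctrl, tolerance=0.05):
    """Compare the gap on generated molecules with the pre-existing gap.

    Returns (explained, baseline_gap, observed_gap). `explained` is True when
    the gap on generated molecules is no larger than the gap already present
    on the reference set, plus a hypothetical tolerance.
    """
    baseline = mean_gap(ref_opt, ref_ctrl)
    observed = mean_gap(gen_opt, gen_ctrl)
    return observed <= baseline + tolerance, baseline, observed
```

If the gap on generated molecules stays within the baseline, the divergence is consistent with model disagreement that was already there before any optimization took place.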

Computational Chemistry

Figure: Bar chart comparing PMO benchmark scores with and without chemical quality filters across five generative methods

Re-evaluating Sample Efficiency in Molecule Generation

A critical reassessment of the PMO benchmark for de novo molecule generation, showing that adding molecular weight, LogP, and diversity filters substantially re-ranks generative models, with Augmented Hill-Climb emerging as the top method.
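A rough sketch of how property filters can change a ranking follows. It assumes each generated molecule is a dict with a precomputed `score`, molecular weight (`mw`), and `logp`. The thresholds and the top-k mean metric are illustrative placeholders, not the values or metric used in the paper, and the diversity filter is omitted for brevity.

```python
def passes_filters(mol, mw_range=(150.0, 650.0), logp_max=4.5):
    """Check hypothetical molecular weight and LogP bounds."""
    low, high = mw_range
    return low <= mol["mw"] <= high and mol["logp"] <= logp_max


def top_k_mean(mols, k, use_filters):
    """Mean score of the k best molecules, optionally after filtering."""
    pool = [m for m in mols if passes_filters(m)] if use_filters else list(mols)
    best = sorted((m["score"] for m in pool), reverse=True)[:k]
    return sum(best) / len(best) if best else 0.0


def rank_methods(results, k=2, use_filters=True):
    """Return method names ordered best-first by top-k mean score.

    `results` maps a method name to its list of molecule dicts.
    """
    return sorted(
        results,
        key=lambda name: top_k_mean(results[name], k, use_filters),
        reverse=True,
    )
```

Calling `rank_methods` once with `use_filters=False` and once with `use_filters=True` on the same results shows whether a method's lead depends on molecules that fall outside the property bounds.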

Computational Chemistry

Figure: Diagram showing divergence between optimization score and control scores during molecular optimization

Failure Modes in Molecule Generation & Optimization

Identifies failure modes in molecular generative models, showing that trivial edits fool distribution-learning benchmarks and that ML-based scoring functions introduce exploitable model-specific and data-specific biases during goal-directed optimization.
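To make the idea of a trivial edit concrete, here is a minimal sketch of one such edit: inserting a single carbon atom character into a training molecule's SMILES string. Whether this matches the paper's exact procedure is an assumption. The `is_valid` argument is a hypothetical caller-supplied check (for example, a wrapper around a cheminformatics toolkit), and novelty is tested on raw strings rather than canonical SMILES.

```python
import random


def add_carbon(smiles, is_valid, training_set, rng, max_tries=100):
    """Insert 'C' at a random position until the result is valid and novel.

    Returns the edited SMILES string, or None if no attempt succeeds.
    """
    for _ in range(max_tries):
        pos = rng.randrange(len(smiles) + 1)
        candidate = smiles[:pos] + "C" + smiles[pos:]
        # String comparison only: a real check would canonicalize first.
        if candidate not in training_set and is_valid(candidate):
            return candidate
    return None
```

A "generator" built this way learns nothing about chemistry, yet its outputs are novel by construction and stay very close to the training distribution, which is the kind of behavior distribution-learning metrics tend to reward.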

Computational Chemistry

Figure: Diagram of the UnCorrupt SMILES pipeline, in which invalid SMILES are corrected by a transformer seq2seq model into valid SMILES, with correction rates of 62-95% across generator types

UnCorrupt SMILES: Post Hoc Correction for De Novo Design

This paper trains a transformer model to correct invalid SMILES produced by de novo molecular generators (RNN, VAE, GAN). The corrector fixes 60-95% of invalid outputs, and the fixed molecules are comparable in novelty and similarity to valid generator outputs. The approach also enables local chemical space exploration by introducing and correcting errors in existing molecules.
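Training such a corrector needs pairs of invalid and valid strings, and one way to obtain them is to corrupt valid SMILES deliberately. The sketch below is an illustration under stated assumptions, not the paper's error-generation scheme: it removes a single parenthesis or ring-closure digit, and it assumes inputs without bracket atoms or `%nn` ring closures, so that every removal leaves an unbalanced string.

```python
import random

# Characters whose removal unbalances a simple SMILES string (assumption:
# no bracket atoms such as [13C] and no two-digit %nn ring closures).
BREAKABLE = "()0123456789"


def corrupt(smiles, rng):
    """Delete one parenthesis or ring-closure digit; None if there is none."""
    positions = [i for i, ch in enumerate(smiles) if ch in BREAKABLE]
    if not positions:
        return None
    i = rng.choice(positions)
    return smiles[:i] + smiles[i + 1:]


def make_pairs(valid_smiles, rng):
    """Build (corrupted, original) pairs, skipping strings that cannot be corrupted."""
    pairs = []
    for smi in valid_smiles:
        broken = corrupt(smi, rng)
        if broken is not None:
            pairs.append((broken, smi))
    return pairs
```

The same corruption step, followed by a trained corrector, is the shape of the local exploration idea: perturb a known molecule, then let the corrector map the broken string to a nearby valid one.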
