Benchmark Problems

Progress in computational chemistry depends on shared, reproducible evaluation targets. This section collects notes on benchmark problems and datasets used to assess new methods, from classic analytical potential energy surfaces like the Müller-Brown surface to standardized generative modeling platforms like MOSES. These resources matter because they define what “better” means in practice, and understanding their design choices is essential for interpreting results reported in the literature.

Computational Chemistry

Bar chart comparing SMINA docking scores of CVAE, GVAE, and REINVENT against a random ZINC 10% baseline across eight protein targets

SMINA Docking Benchmark for De Novo Drug Design Models

This paper proposes a benchmark for de novo drug design based on SMINA docking scores across eight drug targets, and finds that popular generative models fail to outperform random subsets of ZINC.

2D structure of a phenyl-quaterthiophene, a conjugated organic molecule representative of the photovoltaic donor materials benchmarked in the Tartarus platform

Tartarus: Realistic Inverse Molecular Design Benchmarks

Tartarus introduces a modular suite of realistic molecular design benchmarks grounded in computational chemistry simulations. Benchmarking eight generative models reveals that no single algorithm dominates all tasks, and simple genetic algorithms often outperform deep generative models.

Activity cliffs benchmark showing method rankings by RMSE on cliff compounds, with SVM plus ECFP outperforming deep learning approaches

Exposing Limitations of Molecular ML with Activity Cliffs

This paper benchmarks 24 machine and deep learning methods on activity cliff compounds (structurally similar molecules with large potency differences) across 30 macromolecular targets. Traditional ML with molecular fingerprints consistently outperforms graph neural networks and SMILES-based transformers on these challenging cases, especially in low-data regimes.

Density plot showing training vs generated physicochemical property distribution

Molecular Sets (MOSES): A Generative Modeling Benchmark

MOSES introduces a comprehensive benchmarking platform for molecular generative models, offering standardized datasets, evaluation metrics, and baselines. By providing a unified measuring stick, it aims to resolve reproducibility challenges in chemical distribution learning.

Müller-Brown Potential Energy Surface showing the three minima and two saddle points

The Müller-Brown Potential: A 2D Benchmark Surface

The Müller-Brown potential is a two-dimensional analytical potential energy surface introduced in 1979 for testing optimization algorithms. It features three minima, two saddle points, and curved transition pathways that test an algorithm’s ability to navigate non-trivial topologies.
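
As a rough illustration of why this surface is convenient for testing, the sketch below evaluates the potential and its analytic gradient in plain Python. The functional form (a sum of four two-dimensional Gaussian-like terms) and the parameter values are the ones commonly quoted in the literature; they are not stated in the summary above, so treat them as an assumption and check them against the original paper before relying on them. The function names `muller_brown` and `muller_brown_gradient` are illustrative.

```python
import math

# Commonly quoted Müller-Brown parameters (assumed here, not taken from the
# summary above): V(x, y) = sum_k A_k * exp(a_k*dx^2 + b_k*dx*dy + c_k*dy^2),
# with dx = x - X0_k and dy = y - Y0_k.
A = (-200.0, -100.0, -170.0, 15.0)
a = (-1.0, -1.0, -6.5, 0.7)
b = (0.0, 0.0, 11.0, 0.6)
c = (-10.0, -10.0, -6.5, 0.7)
X0 = (1.0, 0.0, -0.5, -1.0)
Y0 = (0.0, 0.5, 1.5, 1.0)


def muller_brown(x, y):
    """Return the potential energy at the point (x, y)."""
    total = 0.0
    for k in range(4):
        dx, dy = x - X0[k], y - Y0[k]
        total += A[k] * math.exp(a[k] * dx * dx + b[k] * dx * dy + c[k] * dy * dy)
    return total


def muller_brown_gradient(x, y):
    """Return the analytic gradient (dV/dx, dV/dy) at the point (x, y)."""
    gx = gy = 0.0
    for k in range(4):
        dx, dy = x - X0[k], y - Y0[k]
        term = A[k] * math.exp(a[k] * dx * dx + b[k] * dx * dy + c[k] * dy * dy)
        gx += term * (2.0 * a[k] * dx + b[k] * dy)
        gy += term * (b[k] * dx + 2.0 * c[k] * dy)
    return gx, gy


if __name__ == "__main__":
    # Approximate locations of two of the three minima, as usually reported.
    for point in [(-0.558, 1.442), (0.623, 0.028)]:
        print(point, round(muller_brown(*point), 2))
```

With these parameters, the deepest minimum sits near (-0.558, 1.442) with an energy of about -146.7, and a second minimum near (0.623, 0.028) has an energy of about -108.2. Because the function and its gradient are cheap and closed-form, an optimizer or path-finding method can be run many times from different starting points to see which basin it reaches and whether it follows the curved pathway between basins.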