Benchmark suites, scoring frameworks, evaluation studies, and surveys of the molecular generation field.

Benchmark Suites & Scoring

| Paper | Year | Key Idea |
|---|---|---|
| GuacaMol | 2019 | Distribution-learning and goal-directed generation benchmarks |
| MOSES | 2020 | Distribution-learning benchmark with curated ZINC subset and distributional metrics |
| FCD | 2018 | Adapts FID from image generation to molecules using learned chemical embeddings |
| PMO | 2022 | Sample-efficient molecular optimization comparing 25 methods under fixed oracle budget |
| MolScore | 2024 | Unified scoring framework wrapping objectives from GuacaMol, MOSES, and others |
| Tartarus | 2023 | Realistic inverse design benchmarks using physics-based oracles (DFT, xTB) |
| SPECTRA | 2025 | Out-of-domain generalizability evaluation via spectral analysis |
| MolGenBench | 2025 | Evaluation across distribution learning, property optimization, and constrained optimization |
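
The PMO entry above turns on a fixed oracle budget: each method may score only a limited number of molecules, so sample efficiency becomes part of the comparison. A minimal sketch of that idea follows. It is not the PMO implementation; `BudgetedOracle`, `BudgetExceeded`, and the toy scoring function are hypothetical names introduced here, and letting repeated molecules be served from a cache without consuming budget is an assumption of this sketch.

```python
from typing import Callable, Dict


class BudgetExceeded(RuntimeError):
    """Raised when a new molecule is scored after the budget is spent."""


class BudgetedOracle:
    """Wrap a scoring function and cap the number of unique evaluations.

    Hypothetical helper for illustration only, not part of any benchmark's API.
    """

    def __init__(self, score_fn: Callable[[str], float], budget: int) -> None:
        self.score_fn = score_fn
        self.budget = budget
        self.cache: Dict[str, float] = {}  # molecule string -> score

    @property
    def calls_used(self) -> int:
        # Each cached entry corresponds to exactly one real oracle call.
        return len(self.cache)

    def __call__(self, smiles: str) -> float:
        if smiles in self.cache:
            # Assumption: repeated queries are free and served from the cache.
            return self.cache[smiles]
        if self.calls_used >= self.budget:
            raise BudgetExceeded(f"budget of {self.budget} oracle calls spent")
        score = self.score_fn(smiles)
        self.cache[smiles] = score
        return score

    def top_k_mean(self, k: int) -> float:
        """Mean score of the k best molecules evaluated so far."""
        best = sorted(self.cache.values(), reverse=True)[:k]
        return sum(best) / len(best) if best else 0.0


if __name__ == "__main__":
    # Toy objective: count 'C' characters. A placeholder, not a chemical property.
    oracle = BudgetedOracle(lambda s: float(s.count("C")), budget=3)
    for candidate in ["CCO", "CCO", "CCCC", "O"]:
        oracle(candidate)
    print(oracle.calls_used, oracle.top_k_mean(2))
```

Any generator can be handed the wrapped oracle in place of the raw scoring function, so every method under comparison stops at the same number of evaluations and can be ranked by the quality of its best molecules at that point.
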

Docking Benchmarks

| Paper | Year | Key Idea |
|---|---|---|
| DOCKSTRING | 2022 | Docking-based benchmarks for ligand design with precomputed scores |
| SMINA Benchmark | 2023 | SMINA docking evaluation on realistic binding tasks |

Failure Analysis & Tools

| Paper | Year | Key Idea |
|---|---|---|
| Failure Modes | 2019 | Trivial models fool distribution-learning metrics; ML scoring functions have exploitable biases |
| Sample Efficiency | 2022 | Property filters and diversity metrics substantially re-rank model performance |
| Avoiding Failure Modes | 2022 | Apparent failures stem from QSAR model disagreement, not algorithmic exploitation |
| UnCorrupt SMILES | 2023 | Transformer-based corrector recovers 60-95% of invalid generator outputs |

Surveys & Reviews

| Paper | Year | Key Idea |
|---|---|---|
| Deep Learning for Molecular Design | 2019 | Survey of RNNs, VAEs, GANs, and RL approaches with SMILES and graph representations |
| CLMs for De Novo Drug Design | 2023 | Review of chemical language models covering architectures and training strategies |
| Inverse Molecular Design | 2022 | Review of VAE, GAN, and RL approaches for navigating chemical space |
| RNNs vs Transformers | 2023 | Empirical comparison of RNN and Transformer architectures for molecular generation |
| MolGenSurvey | 2022 | Survey across 1D string, 2D graph, and 3D geometry representations |
| Generative AI Drug Design | 2024 | Comprehensive survey covering VAEs, GANs, diffusion, and flow models |