Computational Chemistry
Bar chart comparing docking scores of generated vs known ligands for CDK2 and EGFR targets

Protein-to-Drug Molecule Translation via Transformer

Applies the Transformer architecture to generate drug-like molecules conditioned on protein amino acid sequences, treating target-specific de novo drug design as a sequence-to-sequence translation problem.
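
Treating design as translation means each training example is a (protein sequence, SMILES) pair that gets mapped to two token sequences. The sketch below shows one plausible way to build them. It assumes character-level vocabularies and hypothetical `<bos>`/`<eos>`/`<pad>` markers. These are illustrative choices, not the paper's published preprocessing.

```python
# Hypothetical preprocessing sketch for protein -> SMILES "translation" pairs.
BOS, EOS, PAD = "<bos>", "<eos>", "<pad>"

def build_vocab(sequences):
    """Map every character seen in `sequences` to an integer id (special tokens first)."""
    chars = sorted({ch for seq in sequences for ch in seq})
    tokens = [PAD, BOS, EOS] + chars
    return {tok: i for i, tok in enumerate(tokens)}

def encode(seq, vocab, add_markers=False):
    """Character-level encoding; decoder targets are wrapped with BOS/EOS markers."""
    ids = [vocab[ch] for ch in seq]
    if add_markers:
        ids = [vocab[BOS]] + ids + [vocab[EOS]]
    return ids

# Toy pair: a short protein fragment (source) and a ligand SMILES (target), illustrative only.
pairs = [("MENFQKVEK", "c1ccccc1O")]
src_vocab = build_vocab(p for p, _ in pairs)
tgt_vocab = build_vocab(s for _, s in pairs)
src_ids = encode(pairs[0][0], src_vocab)                    # encoder input
tgt_ids = encode(pairs[0][1], tgt_vocab, add_markers=True)  # decoder target
```

In a real model the source vocabulary would cover all 20 standard amino acids, and the target side would often use an atom-level SMILES tokenizer rather than single characters.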

Computational Chemistry
Bar chart comparing PMO benchmark scores with and without chemical quality filters across five generative methods

Re-evaluating Sample Efficiency in Molecule Generation

A critical reassessment of the PMO benchmark for de novo molecule generation, showing that adding molecular weight, LogP, and diversity filters substantially re-ranks generative models, with Augmented Hill-Climb emerging as the top method.
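
Filters like these can change a ranking because an unfiltered top-k mean may be dominated by molecules outside drug-like property ranges, or by near-copies of a single hit. The minimal sketch below assumes molecular weight, LogP, and a bit-set fingerprint are already computed (for example with RDKit). Its thresholds are hypothetical placeholders, not the values used in the paper.

```python
from dataclasses import dataclass

@dataclass
class Mol:
    smiles: str
    mw: float        # molecular weight, assumed precomputed
    logp: float      # LogP, assumed precomputed
    fp: frozenset    # fingerprint as a set of on-bit indices, assumed precomputed
    score: float     # oracle score for the benchmark objective

def tanimoto(a, b):
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def quality_filter(mols, mw_range=(200.0, 600.0), logp_range=(-1.0, 5.0), max_sim=0.7):
    """Keep property-compliant molecules, then greedily drop near-duplicates.

    The ranges and similarity cutoff are hypothetical, not the paper's thresholds.
    """
    in_range = [m for m in mols
                if mw_range[0] <= m.mw <= mw_range[1]
                and logp_range[0] <= m.logp <= logp_range[1]]
    kept = []
    # Visit high scorers first so each cluster keeps its best representative.
    for m in sorted(in_range, key=lambda m: m.score, reverse=True):
        if all(tanimoto(m.fp, k.fp) <= max_sim for k in kept):
            kept.append(m)
    return kept

def top_k_mean(mols, k=10):
    """Mean oracle score of the k best molecules (fewer if fewer are available)."""
    scores = sorted((m.score for m in mols), reverse=True)[:k]
    return sum(scores) / len(scores) if scores else 0.0
```

Comparing `top_k_mean` before and after `quality_filter` for each method shows how a ranking can shift once out-of-range and redundant molecules stop counting.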

Computational Chemistry
Bar chart showing deep generative architecture types for molecular design: RNN, VAE, GAN, RL, and hybrid methods

Review: Deep Learning for Molecular Design (2019)

An early and influential review cataloging 45 papers on deep generative modeling for molecules, comparing RNN, VAE, GAN, and reinforcement learning architectures across SMILES and graph-based representations.

Computational Chemistry
Diagram showing the dual formulation of S4 models with convolution during training and recurrence during generation for SMILES-based molecular design

S4 Structured State Space Models for De Novo Drug Design

This paper introduces structured state space sequence (S4) models to chemical language modeling, showing they combine the strengths of LSTMs (efficient recurrent generation) and GPTs (holistic sequence learning) for de novo molecular design.
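
The dual formulation relies on a linear recurrence x_k = A x_(k-1) + B u_k, y_k = C x_k, which can be unrolled into a causal convolution with kernel K_j = C A^j B. The toy sketch below assumes a tiny, already-discretized dense state matrix. Real S4 layers use a structured parameterization and a fast kernel computation, both omitted here. The sketch only checks that the recurrent (generation) and convolutional (training) views produce identical outputs.

```python
def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def run_recurrent(A, B, C, u):
    """Generation mode: update the hidden state one token at a time."""
    x = [0.0] * len(B)
    ys = []
    for u_k in u:
        x = [ax + b * u_k for ax, b in zip(matvec(A, x), B)]
        ys.append(dot(C, x))
    return ys

def ssm_kernel(A, B, C, length):
    """Convolution kernel K_j = C . (A^j B) for j = 0 .. length-1."""
    K, v = [], list(B)
    for _ in range(length):
        K.append(dot(C, v))
        v = matvec(A, v)
    return K

def run_convolutional(A, B, C, u):
    """Training mode: y_k = sum_{j<=k} K_j * u_{k-j} over the whole sequence at once."""
    K = ssm_kernel(A, B, C, len(u))
    return [sum(K[j] * u[k - j] for j in range(k + 1)) for k in range(len(u))]

# Toy parameters (2-dimensional state) and a short input sequence.
A = [[0.5, 0.1], [0.0, 0.9]]
B = [1.0, 0.5]
C = [1.0, -1.0]
u = [1.0, 0.0, 2.0, -1.0]
```

Calling `run_recurrent(A, B, C, u)` and `run_convolutional(A, B, C, u)` gives the same outputs up to floating-point rounding. The convolutional form sees the whole sequence in parallel, while the recurrent form needs only the current state to emit the next token.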

Computational Chemistry
Bar chart showing CLM architecture publication trends from 2020 to 2024, with transformers overtaking RNNs

Systematic Review of Deep Learning CLMs (2020-2024)

PRISMA-based systematic review of 72 papers on chemical language models for molecular generation, comparing model architectures and methods for biasing generation toward desired properties using MOSES metrics.
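
Three of the basic MOSES metrics (validity, uniqueness, novelty) can be approximated as shown below. This is a hedged sketch, not the `moses` package API. It ignores the @1k/@10k subsampling MOSES reports for uniqueness, assumes inputs are canonical SMILES, and takes the validity checker as a parameter, because real validity checking requires a chemistry toolkit such as RDKit.

```python
def moses_style_basic_metrics(generated, training, is_valid):
    """Validity, uniqueness, and novelty over canonical SMILES strings."""
    valid = [s for s in generated if is_valid(s)]
    validity = len(valid) / len(generated) if generated else 0.0
    unique = set(valid)
    uniqueness = len(unique) / len(valid) if valid else 0.0
    # Novelty: share of unique valid molecules not present in the training set.
    novelty = len(unique - set(training)) / len(unique) if unique else 0.0
    return {"validity": validity, "uniqueness": uniqueness, "novelty": novelty}

def toy_is_valid(smiles):
    # Hypothetical stand-in for a real parser (e.g. RDKit); only checks balanced parentheses.
    return smiles.count("(") == smiles.count(")")

metrics = moses_style_basic_metrics(
    ["CCO", "CCO", "c1ccccc1", "C(C", "CCN"], ["CCO"], toy_is_valid
)
```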

Computational Chemistry
Taxonomy of transformer-based chemical language models organized by architecture type

Transformer CLMs for SMILES: Literature Review 2024

A comprehensive review of transformer-based chemical language models operating on SMILES, categorizing encoder-only (BERT variants), decoder-only (GPT variants), and encoder-decoder models with analysis of tokenization strategies, pre-training approaches, and future directions.
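
Tokenization is one of the axes the review compares. A widely used atom-level SMILES tokenizer is the regular expression popularized by the Molecular Transformer (Schwaller et al.). The sketch below applies that pattern and raises an error if any characters would be silently skipped. Using this pattern here does not imply that every model in the review uses it.

```python
import re

# Atom-level SMILES pattern popularized by the Molecular Transformer.
SMILES_REGEX = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|\%[0-9]{2}|[0-9])"
)

def tokenize_smiles(smiles):
    """Split a SMILES string into atom-level tokens (bracket atoms stay whole)."""
    tokens = SMILES_REGEX.findall(smiles)
    # findall skips characters the pattern does not cover, so verify a lossless round trip.
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens
```

For example, `tokenize_smiles("ClCBr")` keeps the two-letter halogens intact as `["Cl", "C", "Br"]`, which a naive character split would break apart.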

Computational Chemistry
Stylized visualization of protein-ligand docking and benchmark performance bars across five drug targets

DOCKSTRING: Docking-Based Benchmarks for Drug Design

DOCKSTRING bundles an AutoDock Vina wrapper, a 260K-molecule docking dataset across 58 protein targets, and pharmaceutically relevant benchmarks for regression, virtual screening, and de novo design.
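
Virtual-screening performance is often summarized with an enrichment factor, which measures how over-represented the true best binders are among a model's top-ranked molecules. The sketch below is a generic version with stated assumptions: lower Vina-style scores are better, and "actives" are defined as the best-docking fraction of the set. Both cutoffs are parameters. This is not the `dockstring` package API.

```python
def enrichment_factor(predicted, docking, top_fraction=0.01, active_fraction=0.01):
    """EF = (hit rate among top-ranked predictions) / (base rate of actives).

    Lower (more negative) values are treated as better in both lists.
    """
    n = len(docking)
    n_sel = max(1, int(round(n * top_fraction)))
    n_act = max(1, int(round(n * active_fraction)))
    # "Actives": indices of the best true docking scores.
    actives = set(sorted(range(n), key=lambda i: docking[i])[:n_act])
    # Molecules the model ranks best.
    selected = sorted(range(n), key=lambda i: predicted[i])[:n_sel]
    hits = sum(1 for i in selected if i in actives)
    return (hits / n_sel) / (n_act / n)
```

A perfect ranking reaches the maximum of 1 / active_fraction, and random ranking averages about 1.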

Computational Chemistry
Grid of six GuacaMol benchmark target molecules: Celecoxib, Troglitazone, Thiothixene, Aripiprazole, Osimertinib, and Sitagliptin

GuacaMol: Benchmarking Models for De Novo Molecular Design

GuacaMol provides an open-source benchmarking framework with 5 distribution-learning and 20 goal-directed tasks to standardize evaluation of de novo molecular design models.
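
For many of GuacaMol's goal-directed tasks, the final score averages the mean scores of the top-1, top-10, and top-100 molecules. The sketch below reproduces that aggregation in isolation, assuming per-molecule scores already lie in [0, 1]. Individual tasks may use different top-k sets, so the default here is an assumption.

```python
def guacamol_style_score(scores, top_ks=(1, 10, 100)):
    """Average over k of the mean score of the k best molecules."""
    if len(scores) < max(top_ks):
        raise ValueError("need at least max(top_ks) scored molecules")
    ranked = sorted(scores, reverse=True)
    per_k = [sum(ranked[:k]) / k for k in top_ks]
    return sum(per_k) / len(per_k)
```

Averaging over several k rewards a model for producing many strong candidates, not just one outlier.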

Computational Chemistry
Overview of MoleculeNet dataset categories and task counts across quantum mechanics, physical chemistry, biophysics, and physiology

MoleculeNet: Benchmarking Molecular Machine Learning

MoleculeNet introduces a large-scale benchmark suite for molecular machine learning, curating over 700,000 compounds across 17 datasets and providing standardized metrics, data splits, and featurization methods, all integrated into the DeepChem open-source library.
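
MoleculeNet's standardized splits include a scaffold split. This split keeps molecules that share a Bemis-Murcko scaffold in the same partition, so test molecules are structurally distinct from training molecules. The sketch below follows the greedy largest-group-first idea used by DeepChem's `ScaffoldSplitter`. It assumes scaffolds are precomputed strings (in practice they come from RDKit) and is not DeepChem's code.

```python
from collections import defaultdict

def scaffold_split(scaffolds, frac_train=0.8, frac_valid=0.1):
    """Return (train, valid, test) index lists; each scaffold group lands in one split."""
    groups = defaultdict(list)
    for idx, scaf in enumerate(scaffolds):
        groups[scaf].append(idx)
    # Largest scaffold groups first, so rarer scaffolds tend to fall into valid/test.
    ordered = sorted(groups.values(), key=lambda g: (len(g), g[0]), reverse=True)
    n = len(scaffolds)
    train_cut, valid_cut = frac_train * n, (frac_train + frac_valid) * n
    train, valid, test = [], [], []
    for g in ordered:
        if len(train) + len(g) > train_cut:
            if len(train) + len(valid) + len(g) > valid_cut:
                test += g
            else:
                valid += g
        else:
            train += g
    return train, valid, test
```

Unlike a random split, this split keeps close analogs of test molecules out of training, which usually gives a harsher and more realistic estimate of generalization.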

Computational Chemistry
Bar chart comparing molecular generative model performance across six evaluation dimensions including validity, safety, and hit rates

MolGenBench: Benchmarking Molecular Generative Models

MolGenBench introduces a comprehensive benchmark for evaluating molecular generative models in realistic drug discovery settings, spanning de novo design and hit-to-lead optimization across 120 protein targets with 220,005 experimentally validated actives.
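
A hit rate against known actives can be sketched as the fraction of unique generated molecules that match an active for the target, averaged across targets. Exact matching on canonical keys (canonical SMILES or InChIKeys) is an assumption made here for simplicity. The benchmark's actual hit criterion may differ.

```python
def hit_rate(generated, actives):
    """Fraction of unique generated molecules found in the known-active set."""
    unique = set(generated)
    return len(unique & set(actives)) / len(unique) if unique else 0.0

def mean_hit_rate(per_target):
    """per_target maps a target name to a (generated, actives) pair."""
    rates = [hit_rate(g, a) for g, a in per_target.values()]
    return sum(rates) / len(rates) if rates else 0.0
```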

Computational Chemistry
Diagram showing MolScore framework components: scoring functions, evaluation metrics, and benchmark modes

MolScore: Scoring and Benchmarking for Drug Design

MolScore is an open-source framework that unifies scoring functions, evaluation metrics, and benchmarks for generative molecular design, with configurable objectives and GUI support.
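
One common pattern for multi-objective scoring, which frameworks like MolScore make configurable, maps each raw score to a [0, 1] desirability and then aggregates the results. The sketch below uses a linear ramp and a weighted geometric mean. The function names, thresholds, and objective list are hypothetical and do not reflect MolScore's configuration format or API.

```python
import math

def linear_desirability(x, low, high):
    """0 at `low`, 1 at `high`, linear in between; low > high means smaller is better."""
    if low == high:
        raise ValueError("low and high must differ")
    t = (x - low) / (high - low)
    return min(1.0, max(0.0, t))

def weighted_geometric_mean(values, weights):
    # A zero desirability vetoes the molecule (the limit of the geometric mean is 0).
    if any(v <= 0.0 for v in values):
        return 0.0
    total = sum(weights)
    return math.exp(sum(w * math.log(v) for v, w in zip(values, weights)) / total)

def score_molecule(raw, objectives):
    """raw: objective name -> raw value; objectives: list of (name, low, high, weight)."""
    values = [linear_desirability(raw[name], low, high) for name, low, high, _ in objectives]
    weights = [w for *_, w in objectives]
    return weighted_geometric_mean(values, weights)

# Hypothetical objectives: a Vina-style docking score (more negative is better) and QED.
OBJECTIVES = [("docking", -6.0, -10.0, 1.0), ("qed", 0.3, 0.9, 1.0)]
```

The geometric mean is used so that a molecule failing any single objective scores 0 overall. An arithmetic mean would let strong scores on other objectives compensate.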

Computational Chemistry
Sample efficiency curves showing different molecular optimization algorithm families converging at different rates under a fixed oracle budget

PMO: Benchmarking Sample-Efficient Molecular Design

A large-scale benchmark of 25 molecular optimization methods on 23 oracles under a fixed budget of 10,000 oracle calls, showing that sample efficiency is a critical and often neglected dimension of evaluation.
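
PMO summarizes sample efficiency with the area under the curve of the top-10 average oracle score plotted against the number of oracle calls, normalized by the budget, so methods that find good molecules early score higher. The sketch below assumes scores are listed in call order. It samples the curve every `freq` calls using trapezoids that start from zero, and holds the last value flat if the optimizer stops early. These bookkeeping details are assumptions, not a copy of the benchmark code.

```python
def top_k_auc(scores, budget, k=10, freq=100):
    """Normalized AUC of the running top-k mean versus oracle calls (scores in call order)."""
    calls = scores[:budget]
    if not calls:
        return 0.0

    def top_k_mean(n):
        top = sorted(calls[:n], reverse=True)[:k]
        return sum(top) / len(top)

    area, prev, last = 0.0, 0.0, 0
    # Trapezoids between logging checkpoints; the curve starts at 0 before any call.
    for n in range(freq, len(calls), freq):
        now = top_k_mean(n)
        area += freq * (now + prev) / 2
        prev, last = now, n
    # Final partial segment up to the last call actually made.
    now = top_k_mean(len(calls))
    area += (len(calls) - last) * (now + prev) / 2
    # Early stopping: hold the final value flat until the budget is exhausted.
    area += (budget - len(calls)) * now
    return area / budget
```

Two runs that end with the same top-10 average can still receive very different AUCs. A run that reaches high scores within the first few hundred calls scores far above one that gets there only near the end of the budget.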