Drug-Design

Three panels comparing sampling strategies in a multi-modal fitness landscape: exhaustive enumeration, genetic algorithm clustering around few peaks, and ACSESS covering all peaks with fewer evaluations

ACSESS: Diverse Optimal Molecules in the SMU

Property-optimizing ACSESS combines diversity-biased sampling with iterative fitness thresholding to discover diverse sets of molecules with favorable properties. Tested on GDB-9 (dipole moment optimization) and NKp fitness landscapes, it outperforms standard genetic algorithms in diversity while matching or exceeding their fitness, using only ~30,000 evaluations to navigate a 300,000-molecule space.
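The interplay of the two ingredients named above, a rising fitness threshold and diversity-biased selection, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: candidates are plain numbers rather than molecules, distance is an absolute difference, `fitness` is a hypothetical three-peak function, and the greedy maximin rule stands in for whatever diversity criterion the paper uses. It is not the authors' implementation.

```python
import math
import random

def fitness(x):
    # Hypothetical multi-modal landscape with peaks near x = 1, 4 and 7.
    return max(math.exp(-(x - c) ** 2) for c in (1.0, 4.0, 7.0))

def maximin_subset(points, k):
    # Greedy maximin selection: repeatedly add the point farthest from the chosen set.
    chosen = [points[0]]
    while len(chosen) < min(k, len(points)):
        best = max((p for p in points if p not in chosen),
                   key=lambda p: min(abs(p - c) for c in chosen))
        chosen.append(best)
    return chosen

def acsess_sketch(n_iter=20, k=12, step=0.05, max_threshold=0.8, seed=0):
    rng = random.Random(seed)
    library = [rng.uniform(0.0, 8.0) for _ in range(k)]
    threshold = 0.0
    for _ in range(n_iter):
        # Propose perturbed copies ("mutations") of every library member.
        children = [min(8.0, max(0.0, x + rng.gauss(0.0, 0.5)))
                    for x in library for _ in range(4)]
        pool = sorted(set(library + children))
        trial = min(max_threshold, threshold + step)
        survivors = [x for x in pool if fitness(x) >= trial]
        if len(survivors) >= k:
            threshold = trial  # enough candidates pass, so tighten the threshold
        else:
            survivors = [x for x in pool if fitness(x) >= threshold]
        # Diversity-biased step: keep a maximally spread subset of the survivors.
        library = maximin_subset(survivors, k)
    return library, threshold

library, threshold = acsess_sketch()
print(f"threshold={threshold:.2f}", [round(x, 2) for x in sorted(library)])
```

Because selection rewards spread rather than raw fitness, the library is not pulled toward a single peak the way a purely fitness-ranked population would be; the threshold alone is responsible for quality.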

Predictive Chemistry

Diagram showing AllChem's combinatorial synthon assembly pipeline: 7,000 building blocks transformed by 100 reactions into 5 million synthons, which combine in A-B-C topology to represent 10^20 structures

AllChem: Generating and Searching 10^20 Structures

AllChem generates ~5 million synthons by recursively applying ~100 reactions to ~7,000 building blocks, combinatorially representing up to 10^20 complete structures with an A-B-C topology. Topomer shape similarity enables efficient searching of this space, and every hit comes with a proposed synthetic route.
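As a rough sanity check on the quoted scale, the snippet below multiplies out an A-B-C assembly. It assumes, purely for illustration, that each of the three positions could draw from the full pool of ~5 million synthons; in practice each position only accepts synthons with compatible attachment points, so the real pools are smaller and this is an upper bound. The function name `abc_upper_bound` is hypothetical.

```python
def abc_upper_bound(n_a, n_b, n_c):
    # Number of A-B-C combinations if every A, B and C synthon were compatible.
    return n_a * n_b * n_c

# Assumption: all three positions draw from the full ~5 million synthon pool.
bound = abc_upper_bound(5_000_000, 5_000_000, 5_000_000)
print(f"{bound:.2e}")  # 1.25e+20
```

The result lands on the order of 10^20, consistent with the "up to" figure in the summary.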

Computational Chemistry

FDB-17 filtering pipeline from GDB-17 (166.4B) through fragment filters (4.6B) to even sampling (10M), with bar charts comparing size distribution and Fsp3 shape complexity against commercial fragments

FDB-17: Fragment Database (10M Molecules)

FDB-17 contains 10 million fragment-like molecules selected from GDB-17’s 166.4 billion entries. Fragment-likeness filters reduce GDB-17 by 36x to 4.6 billion molecules, then even sampling across (HAC, heteroatoms, stereocenters) triplets produces a 460x further reduction to a manageable, diverse library enriched in 3D-shaped molecules.
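The reduction factors and the idea of even sampling across property triplets can be sketched as follows. The record layout (`hac`, `hetero`, `stereo` keys) and the `per_bin` quota are hypothetical; the summary does not state the quota actually used for FDB-17, so this is only a minimal illustration of stratified sampling.

```python
import random
from collections import defaultdict

# Reduction factors quoted in the summary.
filter_factor = 166.4e9 / 4.6e9   # about 36x
sampling_factor = 4.6e9 / 10e6    # 460x

def even_sample(molecules, key, per_bin, seed=0):
    # Group molecules by key, then draw at most per_bin from each group.
    rng = random.Random(seed)
    bins = defaultdict(list)
    for mol in molecules:
        bins[key(mol)].append(mol)
    sample = []
    for k in sorted(bins):
        members = bins[k]
        sample.extend(rng.sample(members, min(per_bin, len(members))))
    return sample

# Toy data: one crowded triplet and one rare triplet.
toy = ([{"id": i, "hac": 17, "hetero": 3, "stereo": 2} for i in range(1000)]
       + [{"id": 1000 + i, "hac": 9, "hetero": 1, "stereo": 0} for i in range(3)])

def triplet(mol):
    return (mol["hac"], mol["hetero"], mol["stereo"])

picked = even_sample(toy, triplet, per_bin=5)
print(len(picked))  # 8: five from the crowded bin, all three from the rare bin
```

Capping each bin keeps heavily populated regions of the enumeration from dominating the sample, which is why rare triplets end up over-represented relative to the parent database.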

Computational Chemistry

GDBMedChem pipeline from GDB-17 through medicinal chemistry filters to 10M molecules, with Venn diagram showing 97% unique substructures and property comparison against known drugs

GDBMedChem: Drug-Like Subset of GDB-17 (10M Molecules)

GDBMedChem applies medicinal chemistry-inspired functional group and structural complexity filters to GDB-17, reducing 166.4 billion molecules to 17.8 billion, then evenly samples across molecular size, stereochemistry, and polarity to produce 10 million drug-like molecules. 97% of its substructures are absent from known molecule databases.

Predictive Chemistry

Six molecules with atoms colored by divalent (blue, simple) vs non-divalent (red, complex) nodes, showing increasing MC1 complexity from hexane to pivaloyl methylamine

Molecular Complexity from the GDB Chemical Space

Buehler and Reymond introduce two molecular complexity measures, MC1 (fraction of non-divalent nodes) and MC2 (count of non-divalent nodes excluding carboxyl groups), derived from analyzing synthesizability patterns in GDB-enumerated molecules. They compare these measures against existing complexity scores across GDB-13s, ZINC, ChEMBL, and COCONUT.
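Taking the one-line description of MC1 literally, it can be computed from a hydrogen-suppressed molecular graph by counting heavy atoms whose degree is not exactly 2. The sketch below uses plain adjacency dictionaries instead of a cheminformatics toolkit, and it assumes that terminal atoms (degree 1) count as non-divalent; the paper's exact conventions may differ. MC2 is omitted because excluding carboxyl groups requires atom-type information that this toy graph does not carry.

```python
def mc1(adjacency):
    """Fraction of heavy-atom nodes whose degree is not exactly 2."""
    if not adjacency:
        raise ValueError("empty graph")
    non_divalent = sum(1 for nbrs in adjacency.values() if len(nbrs) != 2)
    return non_divalent / len(adjacency)

def chain(n):
    # Linear chain of n heavy atoms, e.g. an n-alkane skeleton.
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

hexane = chain(6)
cyclohexane = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
neopentane = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}

for name, graph in [("cyclohexane", cyclohexane), ("hexane", hexane),
                    ("neopentane", neopentane)]:
    print(name, round(mc1(graph), 3))
```

Under these assumptions a plain ring scores 0, a linear chain scores low because only its two ends are non-divalent, and a fully branched skeleton such as neopentane scores 1.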

Predictive Chemistry

Grid of heteroaromatic ring systems rendered with RDKit, showing known ring systems in blue-tinted panels and predicted tractable rings in amber-tinted panels

VEHICLe: Heteroaromatic Rings of the Future

VEHICLe (Virtual Exploratory Heterocyclic Library) is a complete enumeration of 24,847 mono- and bicyclic heteroaromatic ring systems built from C, N, O, S, and H. Of these, only 1,701 have ever appeared in published compounds. A random forest classifier trained on known vs. unknown ring systems predicts that over 3,000 additional ring systems are synthetically tractable.

Computational Chemistry

ChatDrug pipeline from prompt design through ChatGPT to domain feedback and edited molecule output

ChatDrug: Conversational Drug Editing with ChatGPT

ChatDrug is a parameter-free framework that combines ChatGPT with retrieval-augmented domain feedback and iterative conversation to edit drugs across small molecules, peptides, and proteins.

Computational Chemistry

ChemCrow architecture with GPT-4 central planner connected to 18 chemistry tools via ReAct reasoning

ChemCrow: Augmenting LLMs with 18 Chemistry Tools

ChemCrow augments GPT-4 with 18 chemistry tools to autonomously plan and execute syntheses, discover novel chromophores, and solve diverse chemical reasoning tasks.

Computational Chemistry

DrugAssist workflow from user instruction through LoRA fine-tuned Llama2 to optimized molecule output

DrugAssist: Interactive LLM Molecule Optimization

DrugAssist fine-tunes Llama2-7B-Chat on over one million molecule pairs for interactive, dialogue-based molecule optimization across six molecular properties.

Computational Chemistry

DrugChat architecture showing GNN encoder, linear adaptor, and Vicuna LLM for conversational drug analysis

DrugChat: Conversational QA on Drug Molecule Graphs

DrugChat is a prototype system that bridges molecular graph neural networks with large language models for interactive, multi-turn question answering about drug compounds. It trains only a lightweight linear adaptor between a frozen GNN encoder and Vicuna-13B using 143K curated QA pairs from ChEMBL and PubChem.

Molecular Generation

Pareto front plot for multi-objective optimization alongside DrugEx v2 explorer-exploiter architecture

DrugEx v2: Pareto Multi-Objective RL for Drug Design

DrugEx v2 introduces Pareto-based multi-objective optimization and evolutionary exploration strategies into an RNN reinforcement learning framework for de novo drug design toward multiple protein targets.
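For readers unfamiliar with Pareto-based ranking, the core operation is finding the candidates that no other candidate beats on every objective. The sketch below shows only that step, assuming all objectives are maximized; the score tuples are made-up values, and the way DrugEx v2 turns fronts into reinforcement learning rewards is not reproduced here.

```python
def dominates(a, b):
    # a dominates b if it is at least as good everywhere and strictly better somewhere.
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_front(scores):
    # Indices of score tuples that no other tuple dominates.
    return [i for i, s in enumerate(scores)
            if not any(dominates(t, s)
                       for j, t in enumerate(scores) if j != i)]

# Hypothetical (objective 1, objective 2) scores for five molecules.
scores = [(0.9, 0.2), (0.5, 0.5), (0.2, 0.9), (0.4, 0.4), (0.9, 0.1)]
front = pareto_front(scores)
print(front)  # [0, 1, 2]
```

Molecules 3 and 4 drop out because (0.5, 0.5) and (0.9, 0.2) respectively are at least as good on both objectives, while the three survivors represent different trade-offs that a weighted sum would have to rank arbitrarily.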

Molecular Generation

LatentGAN pipeline from SMILES encoder through latent space WGAN-GP to SMILES decoder

LatentGAN: Latent-Space GAN for Molecular Generation

LatentGAN decouples molecular generation from SMILES syntax by training a Wasserstein GAN on latent vectors from a pretrained heteroencoder, enabling de novo design of drug-like and target-biased compounds.