GDB-13: Chemical Universe Database (970M Molecules)
Dataset Details
Authors: Lorenz C. Blum, Jean-Louis Reymond
Paper Title: 970 Million Druglike Small Molecules for Virtual Screening in the Chemical Universe Database GDB-13
Institution: University of Berne
Published In: Journal of the American Chemical Society
Category: Computational Chemistry
Format: SMILES
Size: 977,468,314 molecules
Date: August 2025
Year: 2009
Links: Dataset, DOI, Paper
Figure: Example GDB-13 molecule (SMILES: CCCC(O)(CO)CC1CC1CN), demonstrating the expanded chemical space with up to 13 atoms

Key Contribution

The main contribution is the creation and release of GDB-13, a database of 977.4 million compounds. Compared with its predecessor GDB-11, it expands both molecular size (up to 13 atoms) and elemental diversity (adding S and Cl). Algorithmic optimizations that accelerated the enumeration process made this expansion possible.

Dataset Information

Format

SMILES

Size

Type        Count
Molecules   977,468,314
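
At roughly 977 million entries, the SMILES data is easier to process as a stream than to load into memory at once. The following is a minimal sketch, assuming the download is a plain-text file with one SMILES string per line. The file name gdb13.smi and the helper names iter_smiles and batched are hypothetical, and the actual distribution may be split or compressed differently.

```python
import io
from itertools import islice
from typing import Iterable, Iterator, TextIO


def iter_smiles(handle: TextIO) -> Iterator[str]:
    """Yield one SMILES string per non-empty line, reading lazily."""
    for line in handle:
        fields = line.split()
        if fields:  # skip blank lines
            # Assumption: the first whitespace-separated field is the SMILES.
            yield fields[0]


def batched(items: Iterable[str], size: int) -> Iterator[list[str]]:
    """Group an iterable into lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Stand-in for a real file handle such as open("gdb13.smi") (hypothetical name).
sample = io.StringIO("CCCC(O)(CO)CC1CC1CN\n\nCCO\nC1CC1\n")
batches = list(batched(iter_smiles(sample), 2))
print(batches)
```

Because both helpers are generators, only one batch is held in memory at a time, regardless of how many lines the file contains.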

Dataset Examples

Example GDB-13 molecule (SMILES: CCCC(O)(CO)CC1CC1CN)
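
As a quick sanity check on the example molecule, its heavy atoms can be counted directly from the SMILES string. The sketch below uses a small regular-expression tokenizer, not a full SMILES parser. It assumes simple SMILES input, and it only tests the size and element limits stated in this card, not actual membership in GDB-13. The names heavy_atoms and in_gdb13_scope are illustrative.

```python
import re
from collections import Counter

# Limits as stated in this card: up to 13 atoms of C, N, O, S, and Cl.
ALLOWED = {"C", "N", "O", "S", "Cl"}
MAX_HEAVY_ATOMS = 13

# First alternative: bracket atoms such as [NH4+] (element captured in group 1).
# Second alternative: unbracketed atoms; "Cl" and "Br" are listed first so that
# "Cl" is not read as a carbon followed by a stray "l".
ATOM_RE = re.compile(
    r"\[\d*([A-Z][a-z]?|[a-z]{1,2})[^\]]*\]|(Cl|Br|[BCNOPSFI]|[bcnops])"
)


def heavy_atoms(smiles: str) -> Counter:
    """Count non-hydrogen atoms per element in a simple SMILES string."""
    counts: Counter = Counter()
    for match in ATOM_RE.finditer(smiles):
        symbol = (match.group(1) or match.group(2)).capitalize()  # "c" -> "C"
        if symbol != "H":
            counts[symbol] += 1
    return counts


def in_gdb13_scope(smiles: str) -> bool:
    """Necessary (not sufficient) check: allowed elements and at most 13 heavy atoms."""
    counts = heavy_atoms(smiles)
    total = sum(counts.values())
    return set(counts) <= ALLOWED and 1 <= total <= MAX_HEAVY_ATOMS


counts = heavy_atoms("CCCC(O)(CO)CC1CC1CN")
print(dict(counts), sum(counts.values()))
print(in_gdb13_scope("CCCC(O)(CO)CC1CC1CN"))  # example molecule from this card
print(in_gdb13_scope("FC(F)F"))               # contains fluorine
```

The example molecule has 10 carbons, 2 oxygens, and 1 nitrogen, so it sits exactly at the 13-atom limit.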

Strengths

  • Systematic coverage of structures with up to 13 atoms
  • High drug-likeness: 100% compliance with Lipinski's rule of five
  • Structural novelty

Limitations

  • Limited to small molecules with up to 13 atoms of C, N, O, S, and Cl
  • Excludes highly strained molecules and some bond patterns
  • Excludes certain functional groups (for example, those considered unstable) and highly polar molecules
  • Computer-generated structures, not experimentally validated compounds

Technical Notes

Differences from GDB-11

  • A fast elemental filter automatically rejects structures, using criteria informed by analysis of existing molecular databases
  • Fluorine is removed from the list of allowed elements; sulfur and chlorine are added
  • MM2-based structure optimization is replaced with much faster geometry-based optimization
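
To illustrate the idea of a fast elemental filter, the sketch below rejects a candidate from its element counts alone, before any more expensive step would run. It is an assumption-laden illustration, not the filter used to build GDB-13: the heteroatom-to-carbon threshold and the function name passes_element_filter are hypothetical, and the actual criteria are not reproduced here. Only the allowed element set (fluorine removed, sulfur and chlorine added) comes from the list above.

```python
from typing import Mapping

# Element set per the differences listed above: no fluorine; S and Cl allowed.
ALLOWED_ELEMENTS = {"C", "N", "O", "S", "Cl"}

# Hypothetical threshold for illustration only; not taken from the GDB-13 paper.
MAX_HETERO_TO_CARBON = 1.0


def passes_element_filter(counts: Mapping[str, int]) -> bool:
    """Cheap composition-only check on a mapping of element symbol -> atom count."""
    # Reject anything containing an element outside the allowed set.
    if any(element not in ALLOWED_ELEMENTS for element in counts):
        return False
    carbon = counts.get("C", 0)
    hetero = sum(n for element, n in counts.items() if element != "C")
    # Hypothetical rule: require at least one carbon and cap the heteroatom ratio.
    if carbon == 0:
        return False
    return hetero / carbon <= MAX_HETERO_TO_CARBON


print(passes_element_filter({"C": 10, "O": 2, "N": 1}))  # ratio 0.3
print(passes_element_filter({"C": 2, "N": 4, "O": 5}))   # ratio 4.5
print(passes_element_filter({"C": 5, "F": 1}))           # fluorine not allowed
```

A check like this needs only a handful of integer operations per candidate, which is why composition-based filters are attractive as a first pass in large enumerations.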

Related Datasets

Dataset   Relationship   Link
GDB-11    Predecessor    View Details
GDB-17    Successor      View Details