GDB-17: Chemical Universe Database (166B Molecules)
Dataset Details
Authors: Lars Ruddigkeit, Ruud van Deursen, Lorenz C. Blum, Jean-Louis Reymond
Paper Title: Enumeration of 166 Billion Organic Small Molecules in the Chemical Universe Database GDB-17
Institutions: University of Berne, Ecole Polytechnique Fédérale de Lausanne
Published In: Journal of Chemical Information and Modeling
Category: Computational Chemistry
Format: SMILES
Size: Molecules: 166,443,860,262 (50 million subset available)
Date: August 2025
Year: 2012
Links: Dataset, DOI, Paper
Figure: Example GDB-17 molecule demonstrating the complex 3D diversity and polycyclic structures characteristic of the 166-billion-molecule database

Key Contribution

The primary contribution is GDB-17, a database of 166.4 billion compounds that extends the enumerated chemical universe into the drug-relevant size range of up to 17 atoms. The enumeration was made possible by a 400-fold faster algorithm, and it revealed a novel chemical space rich in three-dimensional and stereochemically complex structures.

Dataset Information

Format

SMILES

Size

Type       Count
Molecules  166,443,860,262 (50 million subset available)
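
To put the two sizes above in proportion, and to show one way a SMILES file might be consumed, here is a minimal sketch. It assumes the subset is distributed as plain text with one SMILES string per line, optionally followed by whitespace-separated extra fields; that layout, the constant names, and the helper name `iter_smiles` are assumptions for illustration, not details confirmed by the dataset description. The 50 million figure is the rounded count quoted above.

```python
import io

TOTAL_MOLECULES = 166_443_860_262   # full GDB-17 count quoted above
SUBSET_MOLECULES = 50_000_000       # rounded size of the available subset


def iter_smiles(handle):
    """Yield the first whitespace-separated field of each non-empty line.

    Streaming line by line keeps memory use flat, which matters for
    files with tens of millions of entries.
    """
    for line in handle:
        fields = line.split()
        if fields:
            yield fields[0]


# Fraction of the full database covered by the subset (about 0.03 %).
subset_fraction = SUBSET_MOLECULES / TOTAL_MOLECULES

# Stand-in for an opened file; a real run would pass open(path) instead.
demo = io.StringIO("CCO\n\nC1CC1 some-id\n")
print(list(iter_smiles(demo)))
print(f"{subset_fraction:.4%}")
```

The generator works on any file-like object, so the same function can be pointed at a compressed stream or a network response without changes.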

Dataset Examples

Example GDB-17 molecule (SMILES: C1CC2C3CCCC3C3(C4CCC3CC4)C2C1) demonstrating the complex polycyclic structures and 3D diversity characteristic of the database
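
The example SMILES can be sanity-checked against the 17-atom limit with a few lines of standard-library Python. This is a rough sketch, not a full SMILES parser: it assumes unbracketed atoms only and single-digit ring closures, which holds for the string above. The function names are illustrative.

```python
import re

# Atoms of the unbracketed "organic subset"; the two-letter halogens are
# listed first so that "Cl" is not read as "C" and "Br" not as "B".
ATOM = re.compile(r"Cl|Br|[BCNOSPFI]|[bcnosp]")


def heavy_atom_count(smiles: str) -> int:
    """Count non-hydrogen atoms in a bracket-free SMILES string."""
    return len(ATOM.findall(smiles))


def ring_count(smiles: str) -> int:
    """Count ring-closure bonds: each uses one digit to open and one to close."""
    return sum(ch.isdigit() for ch in smiles) // 2


example = "C1CC2C3CCCC3C3(C4CCC3CC4)C2C1"
print(heavy_atom_count(example), ring_count(example))  # prints: 17 5
```

The example therefore sits at the upper size limit of the database, with five rings built from only 17 carbon atoms. For production work a cheminformatics toolkit would be the safer choice, since it handles bracket atoms, charges, and two-digit ring closures.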

Strengths

  • Systematic coverage of structures
  • Structural novelty, especially 3D diversity
  • Significant diversity in scaffolds and ring systems

Limitations

  • Excludes P, Si, B and other drug-relevant elements
  • Excludes certain functional groups and highly polar molecules
  • Computer-generated structures, not experimentally validated compounds

Technical Notes

Differences from GDB-13

  • The generation algorithm was entirely rewritten for memory efficiency, resulting in a 400-fold increase in computing speed that enabled enumeration up to 17 atoms.
  • The scope of allowed elements was expanded to include all halogens (F, Cl, Br, I).
  • More aggressive, size-dependent graph selection filters were introduced to manage the combinatorial explosion, such as restricting or prohibiting small rings and complex bridgeheads in molecules with 14 or more atoms.
  • A multi-step post-processing stage was added to introduce specific functional groups (e.g., oximes, nitro groups, CF₃, sulfones) that were not generated during the main combinatorial step.
  • A new functional group filter was implemented to remove non-aromatic C=C bonds for molecules with 17 atoms, further controlling the output size.
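
The last item in the list above, a filter that applies only at a given molecule size, can be illustrated with a small sketch. This is not the authors' implementation: it is a toy filter over SMILES strings that assumes bracket-free, single-fragment input with single-digit ring closures and aromatic atoms written in lowercase (a Kekulé-form aromatic ring would be misread as containing C=C bonds). All function names and the `filtered_size` parameter are hypothetical.

```python
import re

# Tokens: atoms (two-letter halogens first), double/triple bonds,
# branch parentheses, and single-digit ring closures.
TOKEN = re.compile(r"Cl|Br|[BCNOSPFI]|[bcnosp]|[=#()]|\d")


def parse(smiles):
    """Return (atoms, bonds); each bond is (i, j, order) with order '', '=' or '#'."""
    atoms, bonds = [], []
    branch_stack, open_rings = [], {}
    prev, pending = None, ""
    for tok in TOKEN.findall(smiles):
        if tok == "(":
            branch_stack.append(prev)        # remember the branch point
        elif tok == ")":
            prev = branch_stack.pop()        # return to the branch point
        elif tok in ("=", "#"):
            pending = tok                    # order of the next bond
        elif tok.isdigit():
            if tok in open_rings:            # second occurrence closes the ring
                start, order = open_rings.pop(tok)
                bonds.append((start, prev, order or pending))
            else:                            # first occurrence opens it
                open_rings[tok] = (prev, pending)
            pending = ""
        else:                                # an atom symbol
            atoms.append(tok)
            if prev is not None:
                bonds.append((prev, len(atoms) - 1, pending))
            prev, pending = len(atoms) - 1, ""
    return atoms, bonds


def has_nonaromatic_cc_double(smiles):
    """True if any explicit double bond joins two aliphatic carbons."""
    atoms, bonds = parse(smiles)
    return any(o == "=" and atoms[i] == "C" and atoms[j] == "C"
               for i, j, o in bonds)


def passes_size_dependent_filter(smiles, filtered_size=17):
    """Reject molecules of exactly `filtered_size` heavy atoms that contain C=C."""
    atoms, _ = parse(smiles)
    return not (len(atoms) == filtered_size and has_nonaromatic_cc_double(smiles))


print(passes_size_dependent_filter("C=C" + "C" * 15))  # 17 atoms with C=C
print(passes_size_dependent_filter("C=C" + "C" * 14))  # 16 atoms with C=C
```

Under these assumptions, the 17-atom alkene is rejected while the 16-atom one passes, which mirrors the size-dependent behaviour described above.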

Related Datasets

Dataset  Relationship
GDB-11   Predecessor
GDB-13   Predecessor