GDB-11: Chemical Universe Database (26.4M Molecules)
Dataset Details
Authors: Tobias Fink, Jean-Louis Reymond
Paper Title: Virtual Exploration of the Chemical Universe up to 11 Atoms of C, N, O, and F: Assembly of 26.4 Million Structures (110.9 Million Stereoisomers) and Analysis for New Ring Systems, Stereochemistry, Physicochemical Properties, Compound Classes, and Drug Discovery
Institution: University of Berne
Published In: Journal of Chemical Information and Modeling
Category: Computational Chemistry
Format: SMILES
Size: 26,434,571 molecules; 110,979,507 stereoisomers
Date: August 2025
Year: 2007
Links: Dataset, DOI, Paper
Figure: Example GDB-11 molecule (SMILES: FC1C2OC1c3c(F)coc23), demonstrating the systematic generation of small organic structures

Key Contribution

The paper generates and analyzes the Generated Database (GDB), an exhaustive enumeration of all possible small molecules of up to 11 atoms of C, N, O, and F that meet specific criteria for chemical stability and synthetic feasibility.

Dataset Information

Format

SMILES

Size

Type            Count
Molecules       26,434,571
Stereoisomers   110,979,507

Dataset Examples

GDB-11 molecule (SMILES: FC1C2OC1c3c(F)coc23)

Strengths

  • Systematic coverage of structures with up to 11 atoms
  • High drug-likeness: 100% Lipinski compliance, 50% Rule of Three compliance
  • Structural novelty: 538 previously unknown ring systems

Limitations

  • Limited to small molecules with up to 11 atoms of C, N, O, and F
  • Excludes highly strained molecules and some bond patterns
  • Excludes unstable functional groups such as hemiacetals and gem-diols
  • Computer-generated structures, not experimentally validated compounds
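
The element and size limits listed above can be checked directly on a SMILES string. The following is a minimal sketch, not the tooling used to build the dataset: the helper names `heavy_atom_counts` and `within_gdb11_scope` are hypothetical, and the regular expression only recognizes unbracketed organic-subset atoms, so bracket atoms such as `[Na]` are not handled correctly.

```python
import re
from collections import Counter

# Scope stated for GDB-11: up to 11 atoms of C, N, O, and F.
ALLOWED_ELEMENTS = {"C", "N", "O", "F"}
MAX_HEAVY_ATOMS = 11

# Simplification (assumption): match only unbracketed organic-subset atoms.
# Two-letter halogens are listed first so "Cl" is not read as "C".
ATOM_PATTERN = re.compile(r"Cl|Br|[BCNOFPSI]|[bcnops]")


def heavy_atom_counts(smiles: str) -> Counter:
    """Count heavy atoms per element; aromatic lowercase atoms are folded in."""
    return Counter(tok.capitalize() for tok in ATOM_PATTERN.findall(smiles))


def within_gdb11_scope(smiles: str) -> bool:
    """True if only C, N, O, F occur and there are at most 11 heavy atoms."""
    counts = heavy_atom_counts(smiles)
    return set(counts) <= ALLOWED_ELEMENTS and sum(counts.values()) <= MAX_HEAVY_ATOMS


example = "FC1C2OC1c3c(F)coc23"  # example molecule from this page
print(heavy_atom_counts(example))
print(within_gdb11_scope(example))
```

For the example molecule this counts 7 carbons, 2 oxygens, and 2 fluorines, which is exactly the 11-atom limit. A real workflow would use a cheminformatics toolkit to parse SMILES rather than a regular expression.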

Technical Notes

Construction

Graph Selection

GENG was used to generate the starting graphs, yielding 843,335 connected graphs with up to 11 nodes. Filtering by topological and steric criteria reduced these to 15,726 stable graphs.
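
To make the idea of enumerating connected graphs concrete, here is a brute-force sketch for very small node counts. It is a stand-in for illustration only and is not how GENG works internally; the function names are hypothetical, and the `max_degree=4` default is an assumption (reflecting a valence limit of four) rather than a setting confirmed by this page. The approach is factorial in the node count and only practical up to about 5 or 6 nodes.

```python
from itertools import combinations, permutations


def is_connected(n, edges):
    """Depth-first search from node 0; connected if every node is reached."""
    adj = {i: set() for i in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, stack = {0}, [0]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n


def canonical_form(n, edges):
    """Smallest sorted edge list over all relabelings (isomorphism invariant)."""
    best = None
    for perm in permutations(range(n)):
        relabeled = tuple(sorted(tuple(sorted((perm[a], perm[b]))) for a, b in edges))
        if best is None or relabeled < best:
            best = relabeled
    return best


def count_connected_graphs(n, max_degree=4):
    """Count non-isomorphic connected graphs on n nodes with bounded degree."""
    all_edges = list(combinations(range(n), 2))
    unique = set()
    # A connected graph on n nodes needs at least n - 1 edges.
    for k in range(n - 1, len(all_edges) + 1):
        for edges in combinations(all_edges, k):
            degree = [0] * n
            for a, b in edges:
                degree[a] += 1
                degree[b] += 1
            if max(degree) > max_degree or not is_connected(n, edges):
                continue
            unique.add(canonical_form(n, edges))
    return len(unique)


print([count_connected_graphs(n) for n in range(1, 6)])  # [1, 1, 2, 6, 21]
```

The rapid growth of these counts is why a dedicated generator is needed to reach 11 nodes, and why the subsequent topological and steric filters matter.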

Structure Generation

A graph symmetry algorithm was used to identify valid locations for unsaturations and element types. Combinatorial expansion of the graphs yielded 1.7 billion unique structures.

Filters

Filtering out heteroatom bonds, gem-diols, aminals, enols, orthoacids, acyl fluorides, and other labile functional groups reduced the set to 27.7 million structures. Removing redundant tautomeric forms yielded 26.4 million structures.

Stereoisomer Generation

110.9 million stereoisomers were generated from the 26.4 million structures.
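
The construction steps above form a funnel, and a little arithmetic shows how selective each stage is. The sketch below uses only the counts quoted on this page; the rounded figures (1.7 billion, 27.7 million) are approximate, so the resulting percentages are indicative rather than exact. The constant and function names are hypothetical.

```python
# Counts quoted in the construction notes (rounded where the page rounds them).
GRAPHS_GENERATED = 843_335
GRAPHS_KEPT = 15_726
STRUCTURES_ENUMERATED = 1.7e9        # "1.7 billion", rounded
STRUCTURES_AFTER_FILTERS = 27.7e6    # "27.7 million", rounded
MOLECULES = 26_434_571
STEREOISOMERS = 110_979_507


def retained(after, before):
    """Fraction of items that survive a filtering stage."""
    return after / before


graph_fraction = retained(GRAPHS_KEPT, GRAPHS_GENERATED)
filter_fraction = retained(STRUCTURES_AFTER_FILTERS, STRUCTURES_ENUMERATED)
tautomer_fraction = retained(MOLECULES, STRUCTURES_AFTER_FILTERS)
stereo_per_molecule = STEREOISOMERS / MOLECULES

print(f"graphs kept after topological/steric filters: {graph_fraction:.2%}")
print(f"structures kept after functional-group filters: {filter_fraction:.2%}")
print(f"structures kept after tautomer removal: {tautomer_fraction:.1%}")
print(f"average stereoisomers per molecule: {stereo_per_molecule:.2f}")
```

Roughly 2% of graphs and 2% of enumerated structures survive their respective filters, while tautomer removal is a much lighter step. On average each molecule corresponds to about four stereoisomers.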

Analysis

Comparison

The analysis compares GDB with a combined reference database (RDB) of organic molecules drawn from PubChem, ChemACX, ChemSCX, the NCI Open Database, and the Merck Index.

New Rings

All 309 acyclic graphs in GDB are represented in prior databases. In contrast, only 670 of the 1,208 ring systems (55.5%) are represented in other databases, leaving 538 previously unknown ring systems. Of these, 367 (68.2%) are chiral.
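
The ring-system figures are internally consistent, which a short calculation confirms. This sketch uses only the counts quoted in this section; the variable names are hypothetical.

```python
# Ring-system counts quoted in this section.
TOTAL_RING_SYSTEMS = 1208
KNOWN_RING_SYSTEMS = 670
NOVEL_CHIRAL_RING_SYSTEMS = 367

# Ring systems absent from the reference databases.
novel_ring_systems = TOTAL_RING_SYSTEMS - KNOWN_RING_SYSTEMS

known_pct = 100 * KNOWN_RING_SYSTEMS / TOTAL_RING_SYSTEMS
chiral_pct = 100 * NOVEL_CHIRAL_RING_SYSTEMS / novel_ring_systems

print(novel_ring_systems)                       # 538
print(f"known: {known_pct:.1f}%")               # known: 55.5%
print(f"chiral among novel: {chiral_pct:.1f}%") # chiral among novel: 68.2%
```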

Stereochemistry

Molecules with fewer than 5 heavy atoms are mostly achiral, whereas over two thirds of molecules with 10 or 11 atoms are chiral.

Physicochemical Properties

100% of GDB obeys Lipinski’s ‘Rule of 5’ for bioavailability. Half of GDB satisfies the more restrictive ‘Rule of 3’ for fragment-based drug design.
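
As an illustration of what these two rules test, the sketch below checks precomputed descriptors against commonly quoted thresholds. Several things here are assumptions: the thresholds are the textbook versions (the paper's exact criteria may differ, and the Rule of 3 is sometimes extended with rotatable-bond and polar-surface-area limits), the `Descriptors` class and function names are hypothetical, and the descriptor values in the usage lines are made up for demonstration rather than computed from any GDB-11 molecule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed properties; in practice a cheminformatics toolkit supplies them."""
    mol_weight: float       # g/mol
    logp: float             # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def passes_rule_of_five(d: Descriptors) -> bool:
    # Commonly quoted Lipinski thresholds (assumed; strict form, no violations allowed).
    return (d.mol_weight <= 500 and d.logp <= 5
            and d.h_bond_donors <= 5 and d.h_bond_acceptors <= 10)


def passes_rule_of_three(d: Descriptors) -> bool:
    # Commonly quoted fragment thresholds (assumed; core four criteria only).
    return (d.mol_weight <= 300 and d.logp <= 3
            and d.h_bond_donors <= 3 and d.h_bond_acceptors <= 3)


# Hypothetical descriptor values, for demonstration only.
fragment_like = Descriptors(mol_weight=150.0, logp=1.2, h_bond_donors=1, h_bond_acceptors=2)
polar = Descriptors(mol_weight=160.0, logp=0.5, h_bond_donors=4, h_bond_acceptors=5)

print(passes_rule_of_five(fragment_like), passes_rule_of_three(fragment_like))
print(passes_rule_of_five(polar), passes_rule_of_three(polar))
```

The second example shows how a small molecule can satisfy the Rule of 5 yet fail the stricter Rule of 3, which is consistent with only about half of GDB meeting the latter.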

Related Datasets

Dataset   Relationship
GDB-13    Successor
GDB-17    Successor