Key Contribution
The generation and analysis of the Generated Database (GDB), an exhaustive collection of all possible small molecules that meet specific criteria for stability and synthetic feasibility.
Dataset Details | |
---|---|
Authors | Tobias Fink, Jean-Louis Reymond |
Paper Title | Virtual Exploration of the Chemical Universe up to 11 Atoms of C, N, O, and F: Assembly of 26.4 Million Structures (110.9 Million Stereoisomers) and Analysis for New Ring Systems, Stereochemistry, Physicochemical Properties, Compound Classes, and Drug Discovery |
Institution | University of Berne |
Published In | Journal of Chemical Information and Modeling |
Category | Computational Chemistry |
Format | SMILES |
Size | Molecules: 26,434,571; Stereoisomers: 110,979,507 |
Date | August 2025 |
Year | 2007 |
Links | 📊 Dataset • 🔗 DOI • 📄 Paper |
Type | Count |
---|---|
Molecules | 26,434,571 |
Stereoisomers | 110,979,507 |
The GENG program was used to generate the starting graphs, yielding 843,335 connected graphs with up to 11 nodes. Topological and steric filters reduced these to 15,726 stable graphs.
A graph symmetry algorithm identified the valid locations for unsaturations and element types. Combinatorial expansion then yielded 1.7 billion unique structures.
Filtering out heteroatom bonds, gem-diols, aminals, enols, orthoacids, acyl fluorides, and other labile functional groups reduced the set to 27.7 million structures. Removing redundant tautomeric forms yielded 26.4 million structures.
From these 26.4 million structures, 110.9 million stereoisomers were generated.
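The reduction at each stage can be checked with a few lines of arithmetic. The sketch below uses only the counts quoted above (some of them rounded); the stage labels, variable names, and the `retention` helper are illustrative and not taken from the paper.

```python
# Counts quoted in this summary; stage labels are illustrative.
graph_stages = [
    ("connected graphs from GENG", 843_335),
    ("stable graphs after topological/steric filters", 15_726),
]
structure_stages = [
    ("structures after combinatorial expansion", 1_700_000_000),  # rounded
    ("after removing labile functional groups", 27_700_000),      # rounded
    ("after removing redundant tautomers", 26_434_571),
]


def retention(stages):
    """Return (label, fraction kept relative to the previous stage)."""
    return [
        (label, count / prev_count)
        for (_, prev_count), (label, count) in zip(stages, stages[1:])
    ]


for label, frac in retention(graph_stages) + retention(structure_stages):
    print(f"{label}: {frac:.2%} kept")

# Average number of stereoisomers per structure.
stereo_per_structure = 110_979_507 / 26_434_571
print(f"stereoisomers per structure: {stereo_per_structure:.2f}")
```

On these figures, fewer than 2% of the starting graphs survive the first filter, and each final structure has roughly four stereoisomers on average.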
GDB is compared to a combined reference database (RDB) of organic molecules from PubChem, ChemACX, ChemSCX, the NCI Open Database, and the Merck Index.
All 309 acyclic graphs in GDB are represented in prior databases. Only 670 of the 1,208 ring systems (55.5%) are represented in other databases. Of the 538 previously unknown ring systems, 367 (68.2%) are chiral.
Small molecules with fewer than 5 heavy atoms were mostly achiral, whereas over two thirds of molecules with 10 or 11 atoms were chiral.
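The ring-system percentages follow directly from the quoted counts. A minimal check, with variable names chosen here for illustration:

```python
total_ring_systems = 1208
known = 670                                # found in the reference databases
unknown = total_ring_systems - known       # absent from the reference databases
chiral_unknown = 367

known_pct = round(100 * known / total_ring_systems, 1)
chiral_pct = round(100 * chiral_unknown / unknown, 1)
print(unknown, known_pct, chiral_pct)
```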
100% of GDB obeys Lipinski’s ‘Rule of 5’ for bioavailability. Half of GDB satisfies the more restrictive ‘Rule of 3’ for fragment-based drug design.
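As a rough illustration of what these two rules test, the sketch below applies the thresholds as they are commonly stated to precomputed descriptors. The `Descriptors` container, the function names, and the example values are hypothetical, and the exact descriptor definitions and cutoffs used in the paper are not given in this summary. In practice the descriptors would come from a cheminformatics toolkit.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one molecule (hypothetical container)."""
    mol_weight: float       # g/mol
    log_p: float            # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def passes_rule_of_5(d: Descriptors) -> bool:
    # Lipinski thresholds as commonly stated; this sketch requires zero violations.
    return (d.mol_weight <= 500 and d.log_p <= 5
            and d.h_bond_donors <= 5 and d.h_bond_acceptors <= 10)


def passes_rule_of_3(d: Descriptors) -> bool:
    # Common fragment-screening thresholds; the rotatable-bond and polar
    # surface area criteria that some formulations add are omitted here.
    return (d.mol_weight <= 300 and d.log_p <= 3
            and d.h_bond_donors <= 3 and d.h_bond_acceptors <= 3)


# Illustrative values only, not taken from GDB.
example = Descriptors(mol_weight=150.0, log_p=1.2,
                      h_bond_donors=4, h_bond_acceptors=4)
print(passes_rule_of_5(example), passes_rule_of_3(example))
```

The example passes the Rule of 5 but fails the Rule of 3 on its donor and acceptor counts. With at most 11 atoms of C, N, O, and F, molecular weight stays far below the 500 limit, so the remaining criteria are the ones that matter for GDB.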
Dataset | Relationship | Link |
---|---|---|
GDB-13 | Successor | 📄 View Details |
GDB-17 | Successor | 📄 View Details |