Dataset cards covering large-scale molecular enumeration databases (GDB-11/13/17, ZINC-22) for virtual screening and drug discovery, and conformer ensemble datasets (GEOM, MARCEL) for molecular property prediction and 3D modeling.

| Year | Dataset | Key Idea |
|------|---------|----------|
| 2007 | GDB-11: Chemical Universe Database (26.4M Molecules) | Systematic enumeration of 26.4M small organic molecules up to 11 heavy atoms |
| 2009 | GDB-13: Chemical Universe Database (970M Molecules) | Extension to 970M molecules up to 13 heavy atoms |
| 2012 | GDB-17: Chemical Universe Database (166.4B Molecules) | Largest enumeration database with 166.4B molecules up to 17 heavy atoms |
| 2022 | GEOM: Energy-Annotated Molecular Conformations Dataset | Energy-annotated molecular conformer ensembles for 3D modeling |
| 2023 | ZINC-22: A Multi-Billion Scale Database for Ligand Discovery | Over 37B make-on-demand molecules for virtual screening |
| 2024 | MARCEL: Molecular Conformer Ensemble Learning Benchmark | 722K+ conformers across 76K+ molecules for conformer ensemble learning |