
MolParser-7M and WildMol Datasets for Robust Chemical Structure Recognition
MolParser-7M is a 7.7M-pair dataset for molecule-to-text conversion, featuring real-world images and complex structures …
A dataset card for ZINC-22, the largest freely available database of commercially available compounds for virtual …
Learn how to create 2D molecular images from SMILES strings using RDKit and PIL, with proper formatting and legends.
MARCEL dataset provides 722K+ conformers across 76K+ molecules for drug discovery, catalysis, and molecular …
Henze and Blair's 1931 JACS paper introducing the recursive method for counting alkane isomers, founding mathematical …
A dataset card for the GEOM dataset, a collection of energy-annotated molecular conformations for property prediction …
A dataset card for the Generated Database 11 (GDB-11), a database of 26.4 million small organic molecules for virtual …
A dataset card for the Generated Database 13 (GDB-13), a database of nearly 1 billion small organic molecules for …
Dataset card for GDB-17, containing 166 billion small organic molecules representing the largest enumerated chemical …
Learn how GEOM transforms 2D molecular graphs into dynamic 3D conformer ensembles for molecular machine learning …
Supervised learning reveals hidden eigenvalue patterns that clustering missed, testing k-NN and logistic regression on …
Clustering analysis reveals why Coulomb matrix eigenvalues struggle with larger alkanes, using Dunn Index and silhouette …