
MolParser: End-to-End Molecular Structure Recognition
MolParser converts molecular images from scientific documents to machine-readable formats using end-to-end learning with …

The ZINC-22 dataset provides 37+ billion make-on-demand molecules for virtual screening and modern drug discovery.

The MARCEL dataset provides 722K+ conformers across 76K+ molecules for drug discovery, catalysis, and molecular …

Dataset card for GEOM, providing energy-annotated molecular conformations generated via CREST/xTB and refined with DFT …

GDB-11 systematically enumerates 26.4M small organic molecules (up to 11 atoms of C, N, O, F) for virtual screening and …

A dataset card for the Generated Database 13 (GDB-13), a database of nearly 1 billion small organic molecules for …

Dataset card for GDB-17, containing 166 billion small organic molecules representing the largest enumerated chemical …

Learn how GEOM transforms 2D molecular graphs into dynamic 3D conformer ensembles for molecular machine learning …

An end-to-end cheminformatics pipeline transforming 1D chemical formulas into 3D conformer datasets using graph …

Learn how dataset bias can lead to misleading results in NLP: a sarcasm detection model that actually learned to …

What happens to bills in Congress? Analyzing 15K+ bills from the 117th Congress to understand legislative patterns, …

LAMMPS tutorial for copper surface diffusion simulation and ML training data generation. Includes setup, analysis, and …