Computational Chemistry
Optical chemical structure recognition example

MolParser: End-to-End Molecular Structure Recognition

MolParser converts molecular images from scientific documents to machine-readable formats using end-to-end learning with …

Computational Chemistry
ZINC-22 Tranche Browser showing molecular count distribution

ZINC-22: Multi-Billion Scale Database

The ZINC-22 dataset provides 37+ billion make-on-demand molecules for virtual screening and modern drug discovery.

Computational Chemistry
MARCEL dataset Kraken ligand example in 3D conformation

MARCEL: Molecular Representation & Conformers

The MARCEL dataset provides 722K+ conformers across 76K+ molecules for drug discovery, catalysis, and molecular …

Computational Chemistry
GEOM dataset example molecule: N-(4-pyrimidin-2-yloxyphenyl)acetamide

GEOM: Energy-Annotated Molecular Conformations

Dataset card for GEOM, providing energy-annotated molecular conformations generated via CREST/xTB and refined with DFT …

Computational Chemistry
GDB-11 molecule structure showing FC1C2OC1c3c(F)coc23

GDB-11: Chemical Universe Database (26.4M Molecules)

GDB-11 systematically enumerates 26.4M small organic molecules (up to 11 atoms of C, N, O, F) for virtual screening and …

Computational Chemistry
GDB-13 molecule structure showing CCCC(O)(CO)CC1CC1CN

GDB-13: Chemical Universe Database (970M Molecules)

A dataset card for the Generated Database 13 (GDB-13), a database of nearly 1 billion small organic molecules for …

Computational Chemistry
GDB-17 molecule structure showing complex polycyclic architecture

GDB-17: Chemical Universe Database (166B Molecules)

Dataset card for GDB-17, containing 166 billion small organic molecules representing the largest enumerated chemical …

Computational Chemistry
Comparison of a 2D molecular graph versus a 3D conformer ensemble, showing the latanoprost molecule in multiple conformations

GEOM Dataset: 3D Molecular Conformer Generation

Learn how GEOM transforms 2D molecular graphs into dynamic 3D conformer ensembles for molecular machine learning …

Computational Chemistry
3D ball-and-stick model of a butane molecule representing the structural isomer generation process

Synthetic Isomer Data Generation Pipeline

An end-to-end cheminformatics pipeline transforming 1D chemical formulas into 3D conformer datasets using graph …

Natural Language Processing
Word vector illustration showing text classification and NLP concepts

Sarcasm Detection with Transformers: A Cautionary Tale

Learn how dataset bias can lead to misleading results in NLP: a sarcasm detection model that actually learned to …

Computational Social Science
Top features for Economics and Public Finance policy classification across Congresses

How Does Congress Actually Work? Data from 15K Bills

What happens to bills in Congress? Analyzing 15K+ bills from the 117th Congress to understand legislative patterns, …

Computational Chemistry
Ball model representation of a crystal surface with steps, kinks, adatoms, and vacancies showing various surface features

LAMMPS Tutorial: Copper Adatom Diffusion Simulation

LAMMPS tutorial for copper surface diffusion simulation and ML training data generation. Includes setup, analysis, and …