Computational Chemistry
Optical chemical structure recognition example

MolParser: End-to-End Molecular Structure Recognition

MolParser converts molecular images from scientific documents to machine-readable formats using end-to-end learning with …

ZINC-22 Tranche Browser showing molecular count distribution

ZINC-22: Multi-Billion Scale Database

The ZINC-22 dataset provides more than 37 billion make-on-demand molecules for virtual screening and drug discovery.

SELFIES strings always decode to valid molecules, even when generated randomly

Converting SELFIES Strings to 2D Molecular Images

Visualize SELFIES molecular representations and test their robustness (every string decodes to a valid molecule) with random sampling experiments.

Aspirin molecular structure generated from SMILES string

Converting SMILES Strings to 2D Molecular Images

Learn how to create 2D molecular images from SMILES strings using RDKit and PIL, with proper formatting and legends.

SELFIES representation of 2-Fluoroethenimine molecule

SELFIES (Self-Referencing Embedded Strings)

SELFIES is a 100% robust molecular string representation for machine learning, implemented in the open-source selfies Python library.

MARCEL dataset Kraken ligand example in 3D conformation

MARCEL: Molecular Representation & Conformers

The MARCEL dataset provides 722K+ conformers across 76K+ molecules for drug discovery, catalysis, and molecular …

Benzene molecule with SMILES notation

SMILES: Compact Notation for Chemical Structures

SMILES (Simplified Molecular Input Line Entry System) represents chemical structures using compact ASCII strings.

Log-scale plot showing exponential growth of alkane isomer counts from C1 to C40

The Number of Isomeric Hydrocarbons of the Methane Series

Henze and Blair's 1931 JACS paper derives exact recursive formulas for counting the constitutional isomers of alkanes.

GEOM dataset example molecule: N-(4-pyrimidin-2-yloxyphenyl)acetamide

GEOM: Energy-Annotated Molecular Conformations

Dataset card for GEOM, providing energy-annotated molecular conformations generated via CREST/xTB and refined with DFT …

GDB-11 molecule structure showing FC1C2OC1c3c(F)coc23

GDB-11: Chemical Universe Database (26.4M Molecules)

GDB-11 systematically enumerates 26.4M small organic molecules (up to 11 atoms of C, N, O, F) for virtual screening and …

Spherical harmonics visualization

Efficient DFT Hamiltonian Prediction via Adaptive Sparsity

Luo et al. introduce SPHNet, using adaptive sparsity to dramatically improve SE(3)-equivariant Hamiltonian prediction …

Schematic showing atom-surface interaction using the method of images

Adsorption and Diffusion on Surfaces

Lennard-Jones's 1932 foundational paper introducing potential energy surface models to unify physical and chemical …