Computational Chemistry
ZINC-22 Tranche Browser showing molecular count distribution

ZINC-22: Multi-Billion Molecule Database

ZINC-22 dataset card covering 37+ billion make-on-demand molecules for virtual screening and drug discovery.

SELFIES strings guarantee 100% valid molecules, even when generated randomly

Converting SELFIES Strings to 2D Molecular Images

Visualize SELFIES molecular representations and test their 100% robustness through random sampling experiments.

Aspirin molecular structure generated from SMILES string

Converting SMILES Strings to 2D Molecular Images

Learn how to create 2D molecular images from SMILES strings using RDKit and PIL, with proper formatting and legends.

SELFIES representation of 2-Fluoroethenimine molecule

SELFIES (Self-Referencing Embedded Strings)

SELFIES is a 100% robust molecular string representation for ML, implemented in the open-source selfies Python library.

MARCEL dataset Kraken ligand example in 3D conformation

MARCEL: Molecular Representation and Conformer Ensemble Learning

MARCEL dataset provides 722K+ conformers across 76K+ molecules for drug discovery, catalysis, and molecular …

Methoxybenzonitrile

SMILES (Simplified Molecular Input Line Entry System)

SMILES is a specification for describing the structure of chemical molecules using short ASCII strings.

Log-scale plot showing exponential growth of alkane isomer counts from C1 to C40

The Number of Isomeric Hydrocarbons of the Methane Series

Henze and Blair's 1931 JACS paper deriving exact recursive formulas for counting alkane isomers up to C₄₀.

GEOM dataset example molecule: N-(4-pyrimidin-2-yloxyphenyl)acetamide

GEOM: Energy-Annotated Molecular Conformations

A dataset card for the GEOM dataset, a collection of energy-annotated molecular conformations for property prediction …

GDB-11 molecule structure showing FC1C2OC1c3c(F)coc23

GDB-11: Chemical Universe Database (26.4M Molecules)

A dataset card for the Generated Database 11 (GDB-11), a database of 26.4 million small organic molecules for virtual …

Efficient and Scalable Density Functional Theory Hamiltonian Prediction through Adaptive Sparsity

Luo et al. introduce SPHNet, using adaptive sparsity to dramatically improve SE(3)-equivariant Hamiltonian prediction …

Processes of Adsorption and Diffusion on Solid Surfaces

Lennard-Jones's 1932 foundational paper introducing potential energy surface models to unify physical and chemical …

GDB-13 molecule structure showing CCCC(O)(CO)CC1CC1CN

GDB-13: Chemical Universe Database (970M Molecules)

A dataset card for the Generated Database 13 (GDB-13), a database of nearly 1 billion small organic molecules for …