Computational Chemistry

InChI and Tautomerism: Toward a Comprehensive Treatment

Dhaked et al.'s comprehensive analysis of tautomerism in chemoinformatics, introducing 86 new tautomeric rules and their …

Computational Chemistry

Making InChI FAIR and Sustainable for Inorganic Chemistry

The InChI v1.07 release modernizes chemical identifiers for FAIR data principles, fixes thousands of bugs, and proposes …

Computational Chemistry
The transformation from a 2D chemical structure image to a SMILES representation

What is Optical Chemical Structure Recognition (OCSR)?

A micro-review of Optical Chemical Structure Recognition (OCSR), tracing its evolution from rule-based systems to …

Document Processing
A colored molecule with annotations, representing the diverse drawing styles found in scientific papers that OCSR models must handle.

MolParser-7M and WildMol Datasets for Robust Chemical Structure Recognition

MolParser-7M is a 7.7M-pair dataset for molecule-to-text conversion, featuring real-world images and complex structures …

Computational Chemistry
ZINC-22 Tranche Browser showing molecular count distribution

ZINC-22: Multi-Billion Molecule Database

A dataset card for ZINC-22, the largest freely available database of commercially available compounds for virtual …

Computational Chemistry
Aspirin molecular structure generated from SMILES string

Converting SMILES Strings to 2D Molecular Images

Learn how to create 2D molecular images from SMILES strings using RDKit and PIL, with proper formatting and legends.

Computational Chemistry
MARCEL dataset Kraken ligand example in 3D conformation

MARCEL: Molecular Representation and Conformer Ensemble Learning

The MARCEL dataset provides 722K+ conformers across 76K+ molecules for drug discovery, catalysis, and molecular …

Computational Chemistry

The Number of Isomeric Hydrocarbons of the Methane Series

Henze and Blair's 1931 JACS paper introducing the recursive method for counting alkane isomers, founding mathematical …

Computational Chemistry
GEOM dataset example molecule: N-(4-pyrimidin-2-yloxyphenyl)acetamide

GEOM: Energy-Annotated Molecular Conformations

A dataset card for the GEOM dataset, a collection of energy-annotated molecular conformations for property prediction …

Computational Chemistry
GDB-11 molecule structure showing FC1C2OC1c3c(F)coc23

GDB-11: Chemical Universe Database (26.4M Molecules)

A dataset card for the Generated Database 11 (GDB-11), a database of 26.4 million small organic molecules for virtual …

Computational Chemistry
GDB-13 molecule structure showing CCCC(O)(CO)CC1CC1CN

GDB-13: Chemical Universe Database (970M Molecules)

A dataset card for the Generated Database 13 (GDB-13), a database of nearly 1 billion small organic molecules for …

Computational Chemistry
GDB-17 molecule structure showing complex polycyclic architecture

GDB-17: Chemical Universe Database (166B Molecules)

A dataset card for GDB-17, containing 166 billion small organic molecules representing the largest enumerated chemical …