Optical Chemical Structure Recognition
Markush structure diagram

SubGrapher: Visual Fingerprinting of Chemical Structures

SubGrapher introduces a visual fingerprinting approach to Optical Chemical Structure Recognition that detects functional groups directly from images, enabling chemical database searches without full structure reconstruction and handling complex patent images including Markush structures.

Optical Chemical Structure Recognition
Diagram showing how Ring-Free Language decouples a molecular graph into skeleton, ring structures, and branch information

RFL: Simplifying Chemical Structure Recognition (AAAI 2025)

Proposes Ring-Free Language (RFL) to hierarchically decouple molecular graphs into skeletons, rings, and branches, solving issues with 1D serialization of complex 2D structures. Introduces the Molecular Skeleton Decoder (MSD) to progressively predict these components, achieving strong results on handwritten and printed chemical structure recognition benchmarks.

Molecular Generation
3D ball-and-stick model of butane molecule representing the structural isomer generation process

Synthetic Isomer Data Generation Pipeline

An end-to-end data factory for molecular machine learning that transforms raw chemical formulas (e.g., C6H14) into labeled 3D conformer datasets, using MAYGEN for structural isomer enumeration, RDKit for 3D embedding, and physics-based featurization to address data scarcity in computational drug discovery.

Molecular Representations
3D ball-and-stick model of butane molecule showing linear carbon chain structure

Hearing Molecular Shape via Coulomb Matrix Eigenvalues

Can mathematical signatures capture molecular shape? We test whether Coulomb matrix eigenvalues can distinguish alkane constitutional isomers, from unsupervised clustering failures to supervised learning successes.

Molecular Representations
Coulomb matrix heatmap visualization showing molecular structure encoding on logarithmic scale

Coulomb Matrices for Molecular Machine Learning

A practical introduction to Coulomb matrices: how they transform molecular 3D structures into ML features, complete with Python examples and honest assessment of their limitations.