Computational Chemistry
Aspirin molecular structure generated from SMILES string

Converting SMILES and SELFIES to 2D Molecular Images

Build a robust Python CLI tool that converts both SMILES and SELFIES notation into publication-quality 2D molecular images, complete with formulas and legends.

Computational Chemistry
SELFIES representation of 2-Fluoroethenimine molecule

SELFIES (Self-Referencing Embedded Strings)

An in-depth overview of SELFIES, the 100% robust molecular string representation designed to overcome SMILES limitations in machine learning, where every possible string (even random ones) decodes to a valid molecule through local operations, customizable valence rules, and graph-based internal representations.

Computational Chemistry
MARCEL dataset Kraken ligand example in 3D conformation

MARCEL: Molecular Representation & Conformers

MARCEL provides a comprehensive benchmark for molecular representation learning with 722K+ conformers across four diverse subsets (Drugs-75K, Kraken, EE, BDE), enabling evaluation of conformer ensemble methods for property prediction in drug discovery and catalysis.

Computational Chemistry
Benzene molecule with SMILES notation

SMILES: Compact Notation for Chemical Structures

Comprehensive overview of SMILES notation for chemical structures, covering syntax for atoms, bonds, branches, rings, and stereochemistry, plus its key limitations for machine learning.

Computational Chemistry
GEOM dataset example molecule: N-(4-pyrimidin-2-yloxyphenyl)acetamide

GEOM: Energy-Annotated Molecular Conformations

GEOM contains 450k+ molecules with 37M+ conformations, featuring energy annotations from semi-empirical (GFN2-xTB) and DFT methods for property prediction and molecular generation research.

Computational Chemistry
Potential energy surface showing molecular conformation space with equilibrium and low energy conformations

DenoiseVAE: Adaptive Noise for Molecular Pre-training

ICLR 2025 paper introducing DenoiseVAE, which learns adaptive, atom-specific noise distributions through a VAE framework to improve denoising-based pre-training for molecular force field prediction, outperforming fixed Gaussian noise approaches on quantum chemistry benchmarks.

Computational Chemistry
Adaptive grid merging visualization for benzene molecule showing multi-resolution spatial discretization

Beyond Atoms: 3D Space Modeling for Molecular Pretraining

ICML 2025 paper introducing SpaceFormer, a Transformer architecture that challenges the atom-centric paradigm by modeling the continuous 3D space surrounding molecules using adaptive multi-resolution grids, achieving state-of-the-art performance on quantum property prediction benchmarks.

Computational Chemistry
Comparison of 2D molecular graph versus 3D conformer ensemble showing latanoprost molecule in multiple conformations

GEOM Dataset: 3D Molecular Conformer Generation

Get a practical overview of the GEOM dataset and learn how it’s advancing 3D molecular machine learning by bridging static graphs and dynamic reality.

Computational Chemistry
SELFIES robustness demonstration

Invalid SMILES Benefit Chemical Language Models: A Study

A provocative 2024 Nature Machine Intelligence paper challenging the assumption that invalid SMILES are failures, showing empirically that the ability to generate invalid outputs actually improves chemical language model performance by enabling quality filtering and providing richer training signals.

Computational Chemistry
3D ball-and-stick model of butane molecule representing the structural isomer generation process

Synthetic Isomer Data Generation Pipeline

An end-to-end data factory for molecular machine learning that transforms raw chemical formulas (e.g., C6H14) into labeled 3D conformer datasets, using MAYGEN for structural isomer enumeration, RDKit for 3D embedding, and physics-based featurization to address data scarcity in computational drug discovery.

Computational Chemistry
3D ball-and-stick model of butane molecule showing linear carbon chain structure

Hearing Molecular Shape via Coulomb Matrix Eigenvalues

Can mathematical signatures capture molecular shape? We test whether Coulomb matrix eigenvalues can distinguish alkane constitutional isomers, from unsupervised clustering failures to supervised learning successes.

Computational Chemistry
Coulomb matrix heatmap visualization showing molecular structure encoding on logarithmic scale

Coulomb Matrices for Molecular Machine Learning

A practical introduction to Coulomb matrices: how they transform molecular 3D structures into ML features, complete with Python examples and honest assessment of their limitations.