Hunter Heidenreich | ML Research Scientist

Aspirin molecular structure generated from SMILES string

Converting SMILES and SELFIES to 2D Molecular Images

Build a robust Python CLI tool that converts both SMILES and SELFIES notation into publication-quality 2D molecular images, complete with formulas and legends.

Molecular Representations

SELFIES representation of 2-Fluoroethenimine molecule

SELFIES: A Robust Molecular String Representation

SELFIES is a molecular string representation where every possible string decodes to a valid molecule, solving the invalid-output problem that limits SMILES in generative machine learning.

Machine Learning

Sphere packing illustration showing Shannon's geometric interpretation of channel capacity

Communication in the Presence of Noise: Shannon's 1949 Paper

Shannon’s foundational 1949 paper establishing the mathematical framework for modern information theory, defining channel capacity as the fundamental limit for reliable communication over noisy channels and introducing the sampling theorem (Nyquist-Shannon) that underpins all digital signal processing.

Computational Biology

Protein folding energy landscape funnel showing high-energy unfolded states converging to the native state

How to Fold Graciously: Levinthal's Paradox (1969)

Levinthal’s 1969 perspective paper defined the protein folding paradox by demonstrating the impossibility of random search, establishing the need for kinetic pathways that guide folding faster than thermodynamic equilibration allows.

Computational Chemistry

MARCEL dataset Kraken ligand example in 3D conformation

MARCEL: Molecular Conformer Ensemble Learning Benchmark

MARCEL provides a comprehensive benchmark for molecular representation learning with 722K+ conformers across four diverse subsets (Drugs-75K, Kraken, EE, BDE), enabling evaluation of conformer ensemble methods for property prediction in drug discovery and catalysis.

Molecular Representations

SMILES: A Compact Notation for Chemical Structures

Comprehensive overview of SMILES notation for chemical structures, covering syntax for atoms, bonds, branches, rings, and stereochemistry, plus its key limitations for machine learning.

Molecular Simulation

Müller-Brown Potential Energy Surface showing the three minima and two saddle points

The Müller-Brown Potential: A 2D Benchmark Surface

A two-dimensional analytical potential energy surface introduced in 1979 for testing optimization algorithms. It features three minima and curved transition pathways that evaluate an algorithm’s ability to navigate non-trivial topologies.

Molecular Representations

Log-scale plot showing exponential growth of alkane isomer counts from C1 to C40

The Number of Isomeric Hydrocarbons of the Methane Series

A foundational 1931 paper that derives exact recursive formulas for counting alkane structural isomers, correcting historical errors and establishing the first systematic enumeration up to C₄₀.

Planetary Science

Magellan radar mosaic of Venus showing the northern hemisphere with volcanic plains, tesserae, and lava flows in orange-brown tones

The Surface of Venus: Stratigraphy and Resurfacing History

Basilevsky and Head’s comprehensive synthesis reveals a planet that undergoes catastrophic global resurfacing events. We explore the “stagnant lid” model, the synchronous stratigraphy, coronae, and the divergence of Venus’s geological history from Earth’s.

Computational Chemistry

GEOM dataset example molecule: N-(4-pyrimidin-2-yloxyphenyl)acetamide

GEOM: Energy-Annotated Molecular Conformations Dataset

GEOM contains 450k+ molecules with 37M+ conformations, featuring energy annotations from semi-empirical (GFN2-xTB) and DFT methods for property prediction and molecular generation research.

Scientific Computing

Comparison of exponential sampling methods showing histograms from both inverse transform and von Neumann methods overlaid with the theoretical exponential distribution

Exponential Random Numbers: Two Classic Algorithms

Explore two fundamental approaches to generating exponentially distributed random numbers: the modern inverse transform method using logarithms and von Neumann’s ingenious 1951 comparison-based algorithm that avoids transcendental functions entirely.

Computational Chemistry

GDB-11 molecule structure showing FC1C2OC1c3c(F)coc23

GDB-11: Chemical Universe Database (26.4M Molecules)

GDB-11 contains 26.4 million systematically generated small organic molecules with up to 11 atoms, establishing the methodology for exploring drug-like chemical space computationally.

Hunter Heidenreich | ML Research Scientist — Page 29