Hunter Heidenreich | ML Research Scientist — Page 2

Computational Chemistry
FDB-17 filtering pipeline from GDB-17 (166.4B) through fragment filters (4.6B) to even sampling (10M), with bar charts comparing size distribution and Fsp3 shape complexity against commercial fragments

FDB-17: Fragment Database (10M Molecules)

FDB-17 contains 10 million fragment-like molecules selected from GDB-17’s 166.4 billion entries. Fragment-likeness filters reduce GDB-17 by 36x to 4.6 billion molecules, then even sampling across (HAC, heteroatoms, stereocenters) triplets produces a 460x further reduction to a manageable, diverse library enriched in 3D-shaped molecules.
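The even-sampling step can be sketched in a few lines of Python: bucket molecules by their (HAC, heteroatoms, stereocenters) triplet, then draw a fixed number from each bucket so sparse corners of property space are represented as well as dense ones. The `even_sample` helper and the toy records below are illustrative, not FDB-17's actual pipeline.

```python
import random
from collections import defaultdict

def even_sample(molecules, key_fn, per_bucket, seed=0):
    """Group molecules by a property key and draw up to `per_bucket`
    from each group, flattening the property distribution."""
    buckets = defaultdict(list)
    for mol in molecules:
        buckets[key_fn(mol)].append(mol)
    rng = random.Random(seed)
    sample = []
    for group in buckets.values():
        rng.shuffle(group)
        sample.extend(group[:per_bucket])
    return sample

# Toy records: (name, heavy-atom count, heteroatoms, stereocenters).
mols = [("m%d" % i, 10 + i % 3, i % 2, i % 4) for i in range(100)]
picked = even_sample(mols, key_fn=lambda m: m[1:], per_bucket=2)
```

Because every (HAC, heteroatoms, stereocenters) bucket contributes the same quota, overrepresented buckets are downsampled hard while rare ones survive intact, which is what drives the 460x reduction.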

Computational Chemistry
GDBMedChem pipeline from GDB-17 through medicinal chemistry filters to 10M molecules, with Venn diagram showing 97% unique substructures and property comparison against known drugs

GDBMedChem: Drug-Like Subset of GDB-17 (10M Molecules)

GDBMedChem applies medicinal-chemistry-inspired functional-group and structural-complexity filters to GDB-17, reducing 166.4 billion molecules to 17.8 billion, then samples evenly across molecular size, stereochemistry, and polarity to produce 10 million drug-like molecules. 97% of its substructures are absent from known-molecule databases.

Time Series
LSTNet architecture diagram showing convolutional, recurrent, recurrent-skip, and autoregressive components

LSTNet: Long- and Short-Term Time Series Network

LSTNet is a deep learning framework for multivariate time series forecasting that uses convolutional layers for local dependencies, a recurrent-skip component for periodic long-term patterns, and a linear autoregressive component that keeps predictions sensitive to the raw scale of the inputs.
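Two of those components are simple enough to sketch in plain Python. Below, `lstnet_ar_head` is the autoregressive head (a linear model over the last q steps of each series) and `skip_indices` shows which hidden states the recurrent-skip component consumes; both are toy illustrations, not the paper's implementation.

```python
def lstnet_ar_head(history, q, weights, bias=0.0):
    """Autoregressive component (sketched): a linear model over the
    last q time steps of each series, added to the neural prediction
    so the output tracks the input's raw scale."""
    window = history[-q:]                      # last q rows, each (n_series,)
    return [sum(w * row[s] for w, row in zip(weights, window)) + bias
            for s in range(len(window[0]))]

def skip_indices(T, p, k):
    """Recurrent-skip component (sketched): indices of hidden states
    spaced one period p apart, taking the k most recent periods, so
    the RNN sees the same phase of earlier cycles."""
    return [T - 1 - i * p for i in range(k)][::-1]

# Two series over five steps; AR head averages the last two values.
hist = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0], [8.0, 9.0]]
pred = lstnet_ar_head(hist, q=2, weights=[0.5, 0.5])
```

Because the AR head is purely linear, rescaling the input rescales the prediction proportionally, which is the "scale robustness" the summary refers to.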

Computational Chemistry
Six molecules with atoms colored by divalent (blue, simple) vs non-divalent (red, complex) nodes, showing increasing MC1 complexity from hexane to pivaloyl methylamine

Molecular Complexity from the GDB Chemical Space

Buehler and Reymond introduce two molecular complexity measures, MC1 (fraction of non-divalent nodes) and MC2 (count of non-divalent nodes excluding carboxyl groups), derived from analyzing synthesizability patterns in GDB-enumerated molecules. They compare these measures against existing complexity scores across GDB-13s, ZINC, ChEMBL, and COCONUT.
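MC1 is easy to approximate from a molecular graph alone. The sketch below reads "divalent" as degree 2 in the heavy-atom graph, which is an assumption on my part; the paper's exact node classification (and MC2's carboxyl exclusion) may differ.

```python
def mc1(adjacency):
    """MC1 sketch: fraction of non-divalent nodes, treating a node as
    divalent when it has exactly two neighbors in the heavy-atom graph
    (an assumed reading of the paper's definition)."""
    degrees = [len(nbrs) for nbrs in adjacency]
    nondivalent = sum(1 for d in degrees if d != 2)
    return nondivalent / len(degrees)

# n-hexane as a path graph C1-C2-C3-C4-C5-C6: only the two terminal
# carbons are non-divalent, so MC1 is low.
hexane = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
```

A branched molecule like neopentane (one quaternary carbon bonded to four terminal carbons) has no divalent nodes at all, so this score reaches 1.0, matching the intuition that chains are simple and branch points are complex.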

Interdisciplinary
Side-by-side search tree diagrams comparing nauty depth-first and Traces breadth-first traversal strategies for graph isomorphism

nauty and Traces: Graph Isomorphism Algorithms

An updated description of nauty and an introduction to Traces, two programs for graph isomorphism testing and canonical labeling built on the individualization-refinement paradigm.
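The refinement half of that paradigm can be sketched as classic color refinement (1-dimensional Weisfeiler-Leman): repeatedly split color classes by the multiset of neighboring colors until the partition stabilizes. nauty and Traces use far more engineered refinement plus individualization, automorphism pruning, and different tree-traversal orders; the function below shows only the core subroutine.

```python
def color_refine(adj, colors=None):
    """Color refinement (1-WL) sketch: refine a vertex coloring until
    no color class can be split by its neighbors' colors. This is the
    'refinement' step that individualization-refinement repeats down
    the search tree."""
    n = len(adj)
    colors = list(colors) if colors else [0] * n
    while True:
        # Signature of v: its color plus the sorted colors of neighbors.
        sigs = [(colors[v], tuple(sorted(colors[u] for u in adj[v])))
                for v in range(n)]
        relabel = {s: i for i, s in enumerate(sorted(set(sigs)))}
        new = [relabel[s] for s in sigs]
        if new == colors:
            return colors
        colors = new

# Path on three vertices: the endpoints separate from the middle.
path3 = color_refine([[1], [0, 2], [1]])
```

On vertex-transitive graphs such as a 4-cycle, refinement alone cannot split anything, which is exactly when individualization (fixing one vertex and refining again) becomes necessary.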

Computational Chemistry
Simulated QM9 property landscape scatter plot of HOMO-LUMO gap vs dipole moment, colored by heavy atom count, with example molecules rendered alongside

QM9: Quantum Chemistry Properties of 134k Molecules

QM9 provides B3LYP/6-31G(2df,p)-level geometric, energetic, electronic, and thermodynamic properties for 133,885 small organic molecules (up to 9 heavy atoms of C, N, O, F) drawn from the GDB-17 chemical universe. It is one of the most widely used benchmarks in molecular machine learning.

Natural Language Processing
SpeechT5 architecture diagram showing shared encoder-decoder with speech and text pre/post-nets

SpeechT5: Unified Speech-Text Pre-Training Framework

SpeechT5 proposes a unified encoder-decoder pre-training framework that jointly learns from unlabeled speech and text data, achieving strong results on ASR, TTS, speech translation, voice conversion, speech enhancement, and speaker identification.

Computational Chemistry
Three-stage canonical generation pipeline (geng, vcolg, multig) alongside a log-scale speed comparison showing Surge outperforming MOLGEN 5.0 by 42-161x across natural product molecular formulas

Surge: Fastest Open-Source Chemical Graph Generator

Surge is a constitutional isomer generator based on the canonical generation path method, using nauty for graph automorphism computation. Its three-stage pipeline (simple graph generation, vertex coloring for atom assignment, edge multiplicity for bond orders) generates 7-22 million molecules per second, outperforming MOLGEN 5.0 by 42-161x on natural product molecular formulas.
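The third stage can be illustrated with a brute-force Python sketch: for a fixed simple graph with atoms already assigned, enumerate edge multiplicities that respect each element's valence (implicit hydrogens fill whatever valence remains). Surge generates these canonically and orders of magnitude faster; the helper below only shows what the stage computes.

```python
from itertools import product

MAX_VALENCE = {"C": 4, "N": 3, "O": 2}  # standard organic valences

def bond_order_assignments(atoms, edges):
    """Sketch of the multig stage: enumerate bond orders (1-3) for each
    edge of a fixed colored simple graph, keeping assignments where no
    atom exceeds its valence. Implicit hydrogens take up the slack."""
    valid = []
    for orders in product((1, 2, 3), repeat=len(edges)):
        load = [0] * len(atoms)
        for (u, v), order in zip(edges, orders):
            load[u] += order
            load[v] += order
        if all(load[i] <= MAX_VALENCE[a] for i, a in enumerate(atoms)):
            valid.append(orders)
    return valid

# C-C-O skeleton: five valid bond-order patterns (e.g. single/single
# gives ethanol's skeleton, single/double gives acetaldehyde's).
patterns = bond_order_assignments(["C", "C", "O"], [(0, 1), (1, 2)])
```

This brute force is exponential in edge count; Surge's canonical generation path method avoids both the combinatorial blowup and duplicate isomers.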

Computational Chemistry
Grid of heteroaromatic ring systems rendered with RDKit, showing known ring systems in blue-tinted panels and predicted tractable rings in amber-tinted panels

VEHICLe: Heteroaromatic Rings of the Future

VEHICLe (Virtual Exploratory Heterocyclic Library) is a complete enumeration of 24,867 mono- and bicyclic heteroaromatic ring systems built from C, N, O, S, and H. Of these, only 1,701 have ever appeared in published compounds. A random forest classifier trained on known vs. unknown ring systems predicts that over 3,000 additional ring systems are synthetically tractable.

Computational Chemistry
VQM24 overview showing 9 included elements with valencies, combinatorial scaling of molecular geometries with heavy atom count, and ML learning curves comparing VQM24 vs QM9 difficulty

VQM24: 836k Molecules at DFT and Diffusion QMC

VQM24 exhaustively enumerates all neutral closed-shell molecules with up to 5 heavy atoms from C, N, O, F, Si, P, S, Cl, Br, yielding 258k constitutional isomers and 578k conformers (836k total). Properties are computed at the ωB97X-D3/cc-pVDZ level, with diffusion QMC energies for 10,793 molecules up to 4 heavy atoms. ML models show up to 8x higher errors than on QM9, making VQM24 a more challenging benchmark.

Machine Learning Fundamentals
Diagram showing the three-step nested pipeline from small-scale training to large-model loss prediction across data mixtures

Data Mixing Laws for LM Pretraining Optimization

Ye et al. find that language model loss on each domain follows an exponential function of training mixture proportions. By nesting data mixing laws with scaling laws for steps and model size, small-scale experiments can predict and optimize mixtures for large models, achieving 48% training efficiency gains.
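The mixing law itself is compact: per-domain loss is modeled as L_i(r) = c_i + k_i · exp(Σ_j t_ij · r_j) over mixture proportions r. The sketch below evaluates that functional form with made-up coefficients (c, k, t are illustrative, not fitted values from the paper) and grid-searches a two-domain simplex for the mixture minimizing average loss.

```python
import math

def mixing_law_loss(r, c, k, t):
    """Data mixing law of Ye et al.: per-domain validation loss as an
    exponential function of the mixture proportions r,
    L_i(r) = c_i + k_i * exp(sum_j t_ij * r_j)."""
    return [c[i] + k[i] * math.exp(sum(t[i][j] * r[j] for j in range(len(r))))
            for i in range(len(c))]

# Two domains with toy coefficients; t[i][j] < 0 means more of domain j
# lowers domain i's loss.
c, k = [1.8, 2.1], [1.0, 1.2]
t = [[-2.0, -0.3], [-0.4, -1.5]]

# Grid-search the 1-simplex for the mixture minimizing average loss.
best = min(((r1, 1 - r1) for r1 in (x / 20 for x in range(21))),
           key=lambda r: sum(mixing_law_loss(r, c, k, t)) / 2)
```

In the paper this fit happens at small scale and is chained with step- and size-scaling laws, so the mixture chosen here would be the one used for the large run.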

Machine Learning Fundamentals
Bar chart comparing baseline and DoReMi domain weights across 12 Pile domains, showing Pile-CC upweighted 5.4x

DoReMi: Optimizing Data Mixtures for LM Pretraining

Xie et al. propose DoReMi, which trains a 280M proxy model using Group DRO to find optimal domain mixture weights, then uses those weights to train an 8B model 2.6x faster with 6.5% better downstream accuracy.
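The core of DoReMi's Group DRO loop is a multiplicative-weights update: domains where the proxy model's loss exceeds a pretrained reference model's loss (the "excess loss") get upweighted. The function below is a simplified single step; the real DoReMi also smooths the update and averages the weights over all training steps to get the final mixture.

```python
import math

def doremi_weight_update(weights, proxy_losses, ref_losses, lr=1.0):
    """One simplified Group-DRO step in the spirit of DoReMi: compute
    each domain's excess loss (proxy minus reference, clipped at 0),
    upweight via multiplicative weights, then renormalize."""
    excess = [max(p - r, 0.0) for p, r in zip(proxy_losses, ref_losses)]
    new = [w * math.exp(lr * e) for w, e in zip(weights, excess)]
    total = sum(new)
    return [w / total for w in new]

# Domain 0 is harder for the proxy than for the reference, so it
# gains weight at the expense of domain 1.
w = doremi_weight_update([0.5, 0.5], [3.0, 2.0], [2.0, 2.0])
```

Intuitively, this concentrates training data on domains that are learnable (the reference can do better) but not yet learned (the proxy lags), which is how DoReMi ends up upweighting Pile-CC.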