
Coulomb Matrices for Molecular Machine Learning
A practical introduction to Coulomb matrices: how they transform molecular 3D structures into ML features, complete with Python examples and an honest assessment of their limitations.

Only 2% of congressional bills become law. We analyze 15K bills from 2021-2023 to understand what drives legislative success and failure.

Learn to align molecular structures and point clouds using the Kabsch algorithm, with differentiable implementations for modern ML frameworks.

Step-by-step LAMMPS tutorial for simulating copper and platinum adatom diffusion. Learn surface dynamics simulation, trajectory analysis, and how atomic mass affects diffusion, with a view to building machine learning datasets.

An input-to-analysis workflow for simulating adatom diffusion on FCC metal surfaces using LAMMPS and EAM potentials, covering copper and platinum to compare how atomic mass and bonding strength affect surface dynamics, with a Python analysis layer that generates energy and trajectory diagnostic plots. The LAMMPS setup is adapted from Eric N. Hahn’s adatom tutorial.

A practical guide to simulating mini-proteins using GROMACS, from alanine dipeptide to tryptophan systems, for ML training data generation.

An automated GROMACS pipeline for generating molecular dynamics datasets suitable for machine learning, simulating capped dipeptides across nine residue types with 0.1 ps force-output resolution and atomic force extraction for training Neural Network Potentials.

A computational social science project that built a 47,000+ bill dataset from Congress.gov (115th-117th Congresses), with a co-sponsorship legislative graph and TF-IDF baseline models for 33-class policy-area classification (up to ~0.89 weighted F1 on full text), now available on Hugging Face.

A PyTorch implementation enforcing strict Lyapunov stability guarantees on recurrent neural network controllers through Integral Quadratic Constraints, bridging 1990s robust control theory with modern deep reinforcement learning by solving semidefinite programs inside the gradient descent loop to provide mathematical certificates of safety.

We provide the first known analytical solution to Word2Vec’s softmax skip-gram objective, introducing the Independent Frequencies Model and deriving a low-cost, training-free method for measuring semantic bias directly from corpus statistics.

We develop EigenNoise, a zero-data initialization method for word vectors that synthesizes representations from Zipf’s Law alone, demonstrating performance competitive with GloVe after fine-tuning without requiring any pre-training corpus.

Bachelor’s thesis introducing PyConversations, an open-source library that normalizes over 308 million posts from Twitter, Reddit, Facebook, and 4chan into a unified data model for cross-platform social media research.