
Sarcasm Detection with Transformers: A Cautionary Tale
What happens when you achieve 99.8% accuracy on sarcasm detection? You might have accidentally built a domain classifier. A cautionary ML tale about dataset bias.

Can mathematical signatures capture molecular shape? We test whether Coulomb matrix eigenvalues can distinguish alkane constitutional isomers, from unsupervised clustering failures to supervised learning successes.

We test three ML models on 48K congressional bills to see how well they can predict policy areas from bill text. Results show that logistic regression achieves an 89% F1 score.

A practical introduction to Coulomb matrices: how they transform molecular 3D structures into ML features, complete with Python examples and an honest assessment of their limitations.

Only 2% of congressional bills become law. We analyze 15K bills from 2021-2023 to understand what drives legislative success and failure.

Learn to align molecular structures and point clouds using the Kabsch algorithm, with differentiable implementations for modern ML frameworks.

Step-by-step LAMMPS tutorial for simulating copper and platinum adatom diffusion. Learn surface dynamics simulation, trajectory analysis, and how atomic mass affects diffusion for machine learning datasets.

A complete input-to-analysis workflow for simulating adatom diffusion on FCC metal surfaces using LAMMPS and EAM potentials. It provides comparative datasets for copper and platinum that demonstrate how atomic mass and bonding strength affect surface dynamics, with automated Python analysis that generates publication-ready visualizations.

A practical guide to simulating mini-proteins using GROMACS, from alanine dipeptide to tryptophan systems, for ML training data generation.

An automated GROMACS pipeline for generating high-fidelity molecular dynamics datasets suitable for machine learning, simulating capped dipeptides across nine residue types with 0.1 ps resolution and atomic force extraction optimized for training Neural Network Potentials.

A computational social science project that engineered a custom extraction engine to build a knowledge graph of 47,000+ bills from Congress.gov (115th-117th Congresses). It creates a novel legislative graph with co-sponsorship networks and establishes an 87% accuracy benchmark for policy area classification, now available on Hugging Face.

A PyTorch implementation enforcing strict Lyapunov stability guarantees on recurrent neural network controllers through Integral Quadratic Constraints, bridging 1990s robust control theory with modern deep reinforcement learning by solving semidefinite programs inside the gradient descent loop to provide mathematical certificates of safety.