<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Projects on Hunter Heidenreich | Senior AI Research Scientist</title><link>https://hunterheidenreich.com/projects/</link><description>Recent content in Projects on Hunter Heidenreich | Senior AI Research Scientist</description><image><title>Hunter Heidenreich | Senior AI Research Scientist</title><url>https://hunterheidenreich.com/img/avatar.webp</url><link>https://hunterheidenreich.com/img/avatar.webp</link></image><generator>Hugo -- 0.147.7</generator><language>en-US</language><copyright>2026 Hunter Heidenreich</copyright><lastBuildDate>Sat, 30 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://hunterheidenreich.com/projects/index.xml" rel="self" type="application/rss+xml"/><item><title>Kabsch-Horn Cookbook: Differentiable Alignment</title><link>https://hunterheidenreich.com/projects/kabsch-horn-cookbook/</link><pubDate>Fri, 20 Mar 2026 00:00:00 +0000</pubDate><guid>https://hunterheidenreich.com/projects/kabsch-horn-cookbook/</guid><description>Differentiable Kabsch (SVD) and Horn (quaternion) alignment for NumPy, PyTorch, JAX, TensorFlow, and MLX with gradient-safe SVD.</description><content:encoded><![CDATA[<h2 id="overview">Overview</h2>
<p>Aligning two sets of corresponding points, finding the optimal rotation (and optionally translation and scale) that maps one onto the other, is a fundamental operation across scientific computing. It appears in molecular dynamics (superimposing protein conformations), robotics (sensor registration), and computer vision (shape matching). The two dominant algorithm families are the Kabsch (SVD-based) method and the Horn (quaternion-based) method.</p>
<p>The <strong>Kabsch-Horn Cookbook</strong> is a Python library that implements both algorithm families across five numerical frameworks: NumPy, PyTorch, JAX, TensorFlow, and MLX. Every backend shares the same API and supports per-point weights and arbitrary batch dimensions. Point sets can be N-dimensional on the Kabsch (SVD) path for every backend except MLX, which accepts 3D inputs only. The PyTorch, JAX, TensorFlow, and MLX backends are fully differentiable, with custom autograd rules that bypass the numerically unstable gradient of the standard SVD near degenerate singular values.</p>
<h2 id="features">Features</h2>
<h3 id="algorithms">Algorithms</h3>
<ul>
<li><strong>Kabsch</strong>: SVD-based optimal rotation for rigid alignment</li>
<li><strong>Kabsch-Umeyama</strong>: Kabsch with an additional optimal scaling factor $c$, solving $Q \approx cRP + t$</li>
<li><strong>Horn</strong>: Quaternion-based optimal rotation via the eigendecomposition of a $4 \times 4$ key matrix</li>
<li><strong>Horn + Scale</strong>: Horn&rsquo;s method extended with optimal isotropic scaling</li>
<li><strong>RMSD Wrappers</strong>: Convenience functions that return RMSD directly alongside the alignment parameters</li>
</ul>
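<p>As a rough illustration of the SVD-based family listed above, here is a minimal NumPy sketch of plain Kabsch alignment. It is not the library&rsquo;s implementation: the function name <code>kabsch_sketch</code> is hypothetical, and weights, batching, scaling, and safe gradients are all left out.</p>

```python
import numpy as np


def kabsch_sketch(P, Q):
    """Return (R, t, rmsd) such that P @ R.T + t best matches Q, for (N, D) arrays."""
    # Center both point sets on their centroids.
    p_mean = P.mean(axis=0)
    q_mean = Q.mean(axis=0)
    Pc = P - p_mean
    Qc = Q - q_mean

    # Cross-covariance matrix and its SVD.
    H = Pc.T @ Qc
    U, S, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so that det(R) = +1 (no reflection).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.eye(P.shape[1])
    D[-1, -1] = d
    R = Vt.T @ D @ U.T

    t = q_mean - R @ p_mean
    aligned = P @ R.T + t
    rmsd = np.sqrt(np.mean(np.sum((aligned - Q) ** 2, axis=1)))
    return R, t, rmsd


# Synthetic check: rotate and translate a point cloud, then recover the transform.
rng = np.random.default_rng(0)
P = rng.standard_normal((50, 3))
R_true = np.linalg.qr(rng.standard_normal((3, 3)))[0]
R_true[:, 0] *= np.sign(np.linalg.det(R_true))  # force det = +1
t_true = rng.standard_normal(3)
Q = P @ R_true.T + t_true
R, t, rmsd = kabsch_sketch(P, Q)
```

<p>The reflection correction is the step that separates a proper rotation from a general orthogonal fit; without it, mirrored inputs can yield a matrix with determinant -1.</p>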
<h3 id="framework-support">Framework Support</h3>
<table>
  <thead>
      <tr>
          <th>Framework</th>
          <th style="text-align: center">Differentiable</th>
          <th style="text-align: center">Compile/JIT</th>
          <th>Versions</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>NumPy</td>
          <td style="text-align: center"></td>
          <td style="text-align: center"></td>
          <td>1.24+</td>
      </tr>
      <tr>
          <td>PyTorch</td>
          <td style="text-align: center">Yes</td>
          <td style="text-align: center"><code>torch.compile</code></td>
          <td>2.0+</td>
      </tr>
      <tr>
          <td>JAX</td>
          <td style="text-align: center">Yes</td>
          <td style="text-align: center"><code>jax.jit</code></td>
          <td>0.4+</td>
      </tr>
      <tr>
          <td>TensorFlow</td>
          <td style="text-align: center">Yes</td>
          <td style="text-align: center"></td>
          <td>2.13+</td>
      </tr>
      <tr>
          <td>MLX</td>
          <td style="text-align: center">Yes</td>
          <td style="text-align: center"></td>
          <td>0.1+</td>
      </tr>
  </tbody>
</table>
<p><code>torch.compile</code> and <code>jax.jit</code> are the tested compile/JIT paths. MLX supports 3D inputs only; the Kabsch (SVD) path is N-dimensional on the other four backends.</p>
<h3 id="numerical-robustness">Numerical Robustness</h3>
<p>Standard SVD and eigendecomposition backward passes produce <code>NaN</code> gradients when singular values collide or are near-zero. The library provides custom autograd primitives to handle these cases:</p>
<ul>
<li><strong>SafeSVD</strong> (PyTorch, JAX, TF, MLX): Custom backward pass that clamps the singular value gap, preventing division-by-zero in the gradient</li>
<li><strong>SafeEigh</strong> (PyTorch, JAX, TF, MLX): Analogous safe backward for the symmetric eigendecomposition used in Horn&rsquo;s method</li>
<li><strong>Per-point weights</strong>: Weighted centroids and weighted cross-covariance for mass-weighted or confidence-weighted alignment</li>
<li><strong>Batch dimensions</strong>: All functions broadcast over leading batch dimensions without explicit loops</li>
<li><strong>Mixed-dtype promotion</strong>: Inputs are promoted to a common floating-point dtype automatically</li>
</ul>
<h3 id="testing">Testing</h3>
<p>The test suite uses Hypothesis-based property testing across 13 modules covering:</p>
<ul>
<li>Round-trip correctness (align then compare)</li>
<li>Gradient finiteness and correctness (finite-difference checks)</li>
<li>Reflection handling (proper vs. improper rotations)</li>
<li>Weighted alignment consistency</li>
<li>Batch broadcasting</li>
<li>4 differentiable backends $\times$ 4 precisions (float32 and float64, plus float16 and bfloat16 where supported)</li>
</ul>
<h2 id="usage">Usage</h2>
<p>This is a reference cookbook, so you can copy the framework folder you need from <code>src/kabsch_horn/&lt;framework&gt;/</code> directly into your project (the code has no runtime dependencies beyond the framework itself). To depend on it instead, install a pinned version from GitHub:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>pip install <span style="color:#e6db74">&#34;git+https://github.com/hunter-heidenreich/Kabsch-Cookbook.git@v0.4.1&#34;</span>
</span></span></code></pre></div><p>Basic alignment with NumPy:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> numpy <span style="color:#66d9ef">as</span> np
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> kabsch_horn <span style="color:#f92672">import</span> numpy <span style="color:#66d9ef">as</span> kh
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Two sets of corresponding 3D points</span>
</span></span><span style="display:flex;"><span>P <span style="color:#f92672">=</span> np<span style="color:#f92672">.</span>random<span style="color:#f92672">.</span>randn(<span style="color:#ae81ff">100</span>, <span style="color:#ae81ff">3</span>)
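</span></span><span style="display:flex;"><span>R_true <span style="color:#f92672">=</span> np<span style="color:#f92672">.</span>linalg<span style="color:#f92672">.</span>qr(np<span style="color:#f92672">.</span>random<span style="color:#f92672">.</span>randn(<span style="color:#ae81ff">3</span>, <span style="color:#ae81ff">3</span>))[<span style="color:#ae81ff">0</span>]  <span style="color:#75715e"># random orthogonal matrix (det = +1 or -1)</span>
</span></span><span style="display:flex;"><span>R_true[:, <span style="color:#ae81ff">0</span>] <span style="color:#f92672">*=</span> np<span style="color:#f92672">.</span>sign(np<span style="color:#f92672">.</span>linalg<span style="color:#f92672">.</span>det(R_true))  <span style="color:#75715e"># flip one column if needed so det = +1 (a proper rotation)</span>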
</span></span><span style="display:flex;"><span>Q <span style="color:#f92672">=</span> (P <span style="color:#f92672">@</span> R_true<span style="color:#f92672">.</span>T) <span style="color:#f92672">+</span> np<span style="color:#f92672">.</span>random<span style="color:#f92672">.</span>randn(<span style="color:#ae81ff">1</span>, <span style="color:#ae81ff">3</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>R, t, rmsd <span style="color:#f92672">=</span> kh<span style="color:#f92672">.</span>kabsch(P, Q)
</span></span><span style="display:flex;"><span>aligned <span style="color:#f92672">=</span> P <span style="color:#f92672">@</span> R<span style="color:#f92672">.</span>T <span style="color:#f92672">+</span> t
</span></span></code></pre></div><p>RMSD loss for training in PyTorch:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> torch
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> kabsch_horn <span style="color:#f92672">import</span> pytorch <span style="color:#66d9ef">as</span> kh
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>pred_coords <span style="color:#f92672">=</span> model(input_features)   <span style="color:#75715e"># (B, N, 3), requires_grad=True</span>
</span></span><span style="display:flex;"><span>target_coords <span style="color:#f92672">=</span> batch[<span style="color:#e6db74">&#34;target&#34;</span>]       <span style="color:#75715e"># (B, N, 3)</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>rmsd <span style="color:#f92672">=</span> kh<span style="color:#f92672">.</span>kabsch_rmsd(pred_coords, target_coords)  <span style="color:#75715e"># (B,)</span>
</span></span><span style="display:flex;"><span>loss <span style="color:#f92672">=</span> rmsd<span style="color:#f92672">.</span>mean()
</span></span><span style="display:flex;"><span>loss<span style="color:#f92672">.</span>backward()  <span style="color:#75715e"># safe gradients via SafeSVD</span>
</span></span></code></pre></div><p>For the full API reference and additional examples, see the <a href="https://hunter-heidenreich.github.io/Kabsch-Cookbook/">documentation site</a>.</p>
<h2 id="results">Results</h2>
<h3 id="gradient-stability">Gradient Stability</h3>
<p>The standard SVD backward pass computes terms of the form $\frac{1}{\sigma_i^2 - \sigma_j^2}$, which diverge when two singular values are close. In molecular alignment this happens frequently: planar molecules, symmetric structures, and noisy coordinates can all produce near-degenerate singular values. The SafeSVD primitive floors the magnitude of that denominator at the dtype&rsquo;s machine epsilon (<code>finfo(dtype).eps</code>), producing finite (if slightly biased) gradients in these edge cases. Property-based tests confirm that gradients remain finite across thousands of random rotations, scales, and noise levels for all four differentiable backends.</p>
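<p>The flooring idea can be sketched in a few lines of NumPy. This is an assumption-laden illustration rather than the library&rsquo;s code: the helper name <code>safe_inverse_gap</code> is hypothetical, and it only builds the matrix of inverse gaps, not a full backward pass.</p>

```python
import numpy as np


def safe_inverse_gap(sigma):
    """Matrix of 1 / (sigma_i^2 - sigma_j^2) with the denominator's magnitude floored at eps."""
    eps = np.finfo(sigma.dtype).eps
    s2 = sigma ** 2
    gap = s2[:, None] - s2[None, :]
    # Keep the sign of each gap, but never let its magnitude fall below eps.
    sign = np.where(gap >= 0, 1.0, -1.0).astype(sigma.dtype)
    safe_gap = sign * np.maximum(np.abs(gap), eps)
    F = 1.0 / safe_gap
    # The diagonal (i == j) does not enter the gradient; zero it out.
    np.fill_diagonal(F, 0.0)
    return F


sigma = np.array([1.0, 1.0, 0.5])  # two colliding singular values
F = safe_inverse_gap(sigma)        # every entry is finite
```

<p>Well-separated pairs are unchanged by the floor, so the bias only appears for the near-degenerate entries that would otherwise overflow.</p>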
<h3 id="framework-parity">Framework Parity</h3>
<p>All five backends produce numerically equivalent results (up to floating-point tolerance) on the same inputs. The shared API means switching from NumPy prototyping to PyTorch training requires changing only the import path.</p>
<h2 id="related-work">Related Work</h2>
<p>This project builds on the foundational alignment algorithms described in these papers:</p>
<ul>
<li><a href="/notes/biology/computational-biology/kabsch-algorithm/">Kabsch (1976)</a>: the original SVD-based rotation alignment</li>
<li><a href="/notes/biology/computational-biology/arun-svd-point-fitting/">Arun et al. (1987)</a>: SVD formulation for 3D point set fitting</li>
<li><a href="/notes/biology/computational-biology/horn-absolute-orientation/">Horn (1987)</a>: quaternion-based closed-form absolute orientation</li>
<li><a href="/notes/biology/computational-biology/horn-orthonormal-matrices/">Horn et al. (1988)</a>: orthonormal matrix (polar decomposition) approach</li>
<li><a href="/notes/biology/computational-biology/umeyama-similarity-transformation/">Umeyama (1991)</a>: extension to include optimal scaling</li>
</ul>
<p>For a detailed walkthrough of the Kabsch algorithm with code examples, see the companion blog post: <a href="/posts/kabsch-algorithm/">The Kabsch Algorithm</a>.</p>
]]></content:encoded></item><item><title>Molecular String Renderer: Chemical Visualization Library</title><link>https://hunterheidenreich.com/projects/molecular-string-renderer/</link><pubDate>Sun, 30 Nov 2025 00:00:00 +0000</pubDate><guid>https://hunterheidenreich.com/projects/molecular-string-renderer/</guid><description>A type-safe Python library for converting chemical strings (SMILES, SELFIES, InChI) into publication-quality molecular images.</description><content:encoded><![CDATA[<h2 id="overview">Overview</h2>
<p>In computational chemistry and AI drug discovery, visualization pipelines are often brittle: they break on edge cases or fail silently when processing millions of molecules for training data.</p>
<p>I built <code>molecular-string-renderer</code> to treat molecular visualization as a strict software engineering problem. It is a highly configurable wrapper around RDKit that standardizes the conversion of text-based chemical representations (SMILES, <a href="/notes/chemistry/molecular-representations/notations/inchi-2013/">InChI</a>, SELFIES) into raster and vector graphics, degrading gracefully on inputs RDKit cannot vectorize.</p>
<h2 id="features">Features</h2>
<p>This library differentiates itself from standard plotting scripts through strict architectural patterns designed for reliability:</p>
<h3 id="1-strategy-pattern-for-svg-generation">1. Strategy Pattern for SVG Generation</h3>
<p>RDKit&rsquo;s vector rendering can sometimes fail on complex molecular topologies. I implemented a <strong>Hybrid Strategy</strong> so that a single molecule RDKit cannot vectorize does not fail the batch:</p>
<ul>
<li><strong>Vector Strategy</strong>: Attempts to generate a true, scalable vector graphic.</li>
<li><strong>Raster Fallback</strong>: If the vector engine fails, the system automatically renders a high-res PNG and embeds it transparently into the SVG container.</li>
</ul>
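<p>The vector-then-raster fallback described above can be sketched with plain callables. Everything named here (<code>render_svg_with_fallback</code>, the two renderer arguments, and the stand-in renderers) is hypothetical and not the library&rsquo;s API; in practice the renderers would wrap RDKit.</p>

```python
import base64


def render_svg_with_fallback(molecule, vector_renderer, raster_renderer):
    """Try the vector strategy first; on failure, embed a PNG inside an SVG container."""
    try:
        return vector_renderer(molecule)
    except Exception:
        # Fallback: render to PNG bytes and wrap them in a minimal SVG document.
        png_bytes, width, height = raster_renderer(molecule)
        encoded = base64.b64encode(png_bytes).decode("ascii")
        return (
            f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
            f'<image width="{width}" height="{height}" '
            f'href="data:image/png;base64,{encoded}"/></svg>'
        )


def failing_vector(molecule):
    # Stand-in for a vector engine that cannot draw this structure.
    raise ValueError("vector rendering failed")


def fake_raster(molecule):
    # Stand-in raster engine: returns placeholder bytes plus the image size.
    return b"\x89PNG placeholder", 64, 64


fallback_svg = render_svg_with_fallback("CCO", failing_vector, fake_raster)
direct_svg = render_svg_with_fallback("CCO", lambda m: "vector output", fake_raster)
```

<p>Because the caller always receives an SVG string, a batch job does not need a separate code path for molecules that only render as rasters.</p>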
<h3 id="2-native-generative-ai-support">2. Native Generative AI Support</h3>
<p>With the rise of Large Language Models in chemistry, <strong>SELFIES</strong> (Self-Referencing Embedded Strings) has become a standard output format. This library handles SELFIES natively, managing the decoding and sanitization lifecycle internally so that ML training loops can simply &ldquo;pass strings and get images.&rdquo;</p>
<h3 id="3-strict-configuration-contracts">3. Strict Configuration Contracts</h3>
<p>The library uses <strong>Pydantic</strong> models (<code>RenderConfig</code>, <code>ParserConfig</code>, <code>OutputConfig</code>) to enforce strict data contracts. This ensures that visualization parameters are validated before any heavy computation begins, preventing runtime errors deep in a batch job.</p>
<h2 id="usage">Usage</h2>
<p>The library provides a simple Python API for rendering single molecules or batches of molecules from various string formats.</p>
<h2 id="results">Results</h2>
<ul>
<li><strong>Type Safety</strong>: The codebase runs with strict <code>mypy</code> settings, ensuring type safety across the entire pipeline.</li>
<li><strong>Grid Auto-Fitting</strong>: Implemented smart layout algorithms that automatically adjust grid dimensions based on the input batch size.</li>
<li><strong>Format Agnostic</strong>: Decouples the <em>parsing</em> logic (SMILES vs. MolBlock vs. SELFIES) from the <em>rendering</em> logic, making it trivial to add support for new proprietary formats.</li>
</ul>
<h2 id="reliability">Reliability</h2>
<p>When rendering large batches of generated molecules, a single hard-to-draw structure should not fail the whole job. The raster fallback and the strict Pydantic and mypy contracts exist so the pipeline degrades gracefully on edge cases rather than crashing or failing silently, the common failure mode of ad hoc RDKit plotting scripts.</p>
<h2 id="related-work">Related Work</h2>
<ul>
<li><a href="/posts/visualizing-smiles-and-selfies-strings/">Visualizing SMILES and SELFIES Strings</a>: walkthrough of the visualization pipeline this library implements</li>
<li><a href="/projects/isomer-dataset-generation/">Isomer Dataset Generation</a>: related project generating molecular datasets using SMILES/SELFIES representations</li>
</ul>
]]></content:encoded></item><item><title>Müller-Brown Potential: A PyTorch ML Testbed</title><link>https://hunterheidenreich.com/projects/muller-brown-pytorch/</link><pubDate>Wed, 27 Aug 2025 00:00:00 +0000</pubDate><guid>https://hunterheidenreich.com/projects/muller-brown-pytorch/</guid><description>A PyTorch testbed for the Müller-Brown potential: BAOAB Langevin dynamics, torch.compile analytical forces, and a statistical-mechanics validation suite.</description><content:encoded><![CDATA[<h2 id="overview">Overview</h2>
<p>This project implements the classic 2D Müller-Brown potential in PyTorch as a ground-truth testbed for machine-learning-in-molecular-dynamics (ML-MD) work. The potential is a <code>torch.nn.Module</code> that computes forces two ways: a hand-derived analytical gradient (the default, compiled with <code>torch.compile</code>) and <code>torch.autograd.grad</code> (a reference the analytical path is checked against). On an Apple M1 Max, the analytical kernel runs about 4x faster than autograd (3-7x depending on batch size; 100 warm-up iterations, then the median of 5 runs of 1000), because it skips autograd&rsquo;s graph construction inside the force loop.</p>
<p>The energy is deliberately left uncompiled so that second derivatives (the Hessian via autograd) keep working, since <code>torch.compile</code> does not support double-backward; the force, the hot path, is the compiled function.</p>
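<p>To make the &ldquo;hand-derived analytical gradient&rdquo; concrete, here is a minimal sketch of the energy and force using the standard published Müller-Brown parameters. It is written in NumPy instead of PyTorch so that it stays dependency-light, and the function names are illustrative; a finite-difference check would play the role that <code>torch.autograd.grad</code> plays in the project.</p>

```python
import numpy as np

# Standard Müller-Brown parameters (four Gaussian-like terms).
A = np.array([-200.0, -100.0, -170.0, 15.0])
a = np.array([-1.0, -1.0, -6.5, 0.7])
b = np.array([0.0, 0.0, 11.0, 0.6])
c = np.array([-10.0, -10.0, -6.5, 0.7])
x0 = np.array([1.0, 0.0, -0.5, -1.0])
y0 = np.array([0.0, 0.5, 1.5, 1.0])


def energy(xy):
    """Potential energy for positions of shape (..., 2)."""
    dx = xy[..., 0:1] - x0
    dy = xy[..., 1:2] - y0
    return np.sum(A * np.exp(a * dx**2 + b * dx * dy + c * dy**2), axis=-1)


def force(xy):
    """Analytical force F = -grad V, shape (..., 2)."""
    dx = xy[..., 0:1] - x0
    dy = xy[..., 1:2] - y0
    terms = A * np.exp(a * dx**2 + b * dx * dy + c * dy**2)
    # Chain rule applied to each exponential term.
    dVdx = np.sum(terms * (2 * a * dx + b * dy), axis=-1)
    dVdy = np.sum(terms * (b * dx + 2 * c * dy), axis=-1)
    return -np.stack([dVdx, dVdy], axis=-1)


# Approximate location of the deepest minimum in the standard parameterization.
deepest_minimum = np.array([-0.558, 1.442])
print(energy(deepest_minimum), force(deepest_minimum))
```

<p>Both functions broadcast over leading dimensions, so a whole ensemble of particles can be evaluated in one call.</p>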
<h2 id="features">Features</h2>
<ul>
<li><strong>Dual force kernels</strong>: a hand-derived analytical gradient (compiled) for fast simulation, and an autograd mode for differentiation and as the correctness reference the analytical path is tested against.</li>
<li><strong>BAOAB Langevin integrator</strong>: the BAOAB splitting scheme (Leimkuhler &amp; Matthews, 2013), which solves the friction-plus-noise step exactly and samples the canonical distribution accurately (exactly so for a harmonic oscillator).</li>
<li><strong>Device-agnostic</strong>: potential, forces, and simulation are plain PyTorch tensor operations that run on CPU or CUDA; the included benchmark measures CPU.</li>
<li><strong>Modular architecture</strong>: physics (<code>MuellerBrownPotential</code>), numerics (<code>LangevinSimulator</code>), visualization, and HDF5 I/O are separated, with a CLI orchestrating demo, single-run, batch, and plot-regeneration modes.</li>
</ul>
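<p>The BAOAB splitting mentioned above can be sketched as a single step function. This is a NumPy illustration under stated assumptions, not the project&rsquo;s <code>LangevinSimulator</code>: the name <code>baoab_step</code> and its signature are hypothetical, and the force is recomputed at the start of each step where a real integrator would usually reuse it.</p>

```python
import numpy as np


def baoab_step(q, p, force_fn, dt, mass, gamma, kT, rng):
    """One BAOAB step: half kick, half drift, exact friction-plus-noise, half drift, half kick."""
    p = p + 0.5 * dt * force_fn(q)               # B: half momentum kick
    q = q + 0.5 * dt * p / mass                  # A: half position drift
    c1 = np.exp(-gamma * dt)                     # O: exact Ornstein-Uhlenbeck update
    c2 = np.sqrt(kT * mass * (1.0 - c1**2))
    p = c1 * p + c2 * rng.standard_normal(p.shape)
    q = q + 0.5 * dt * p / mass                  # A: half position drift
    p = p + 0.5 * dt * force_fn(q)               # B: half momentum kick
    return q, p


# Harmonic-oscillator check: the position variance should approach kT / k_spring.
rng = np.random.default_rng(0)
k_spring, kT = 1.0, 1.0
q = np.zeros(20_000)
p = np.zeros(20_000)
for _ in range(1_000):
    q, p = baoab_step(q, p, lambda x: -k_spring * x,
                      dt=0.1, mass=1.0, gamma=1.0, kT=kT, rng=rng)
config_variance = np.mean(q**2)
```

<p>The ensemble is advanced as one vector, which mirrors the vectorized-over-particles design: the loop runs over time steps only.</p>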
<h2 id="usage">Usage</h2>
<p>The package installs editable with <code>uv sync</code> and imports as a normal package:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">from</span> muller_brown <span style="color:#f92672">import</span> MuellerBrownPotential, LangevinSimulator
</span></span></code></pre></div><p>It provides a fast, differentiable Müller-Brown potential and a Langevin sampler for testing ML-MD algorithms against a known-exact surface.</p>
<h2 id="results">Results</h2>
<h3 id="architecture">Architecture</h3>
<ul>
<li><strong>Physics module</strong>: the energy surface is a <code>torch.nn.Module</code> with the potential parameters held as registered buffers, so device and dtype move with the module.</li>
<li><strong>Analytical force kernel</strong>: the analytical Jacobian is implemented directly and compiled with <code>torch.compile(dynamic=True)</code>, bypassing autograd-graph construction during long simulations.</li>
<li><strong>Vectorized execution</strong>: kernel operations are vectorized over particles, so an ensemble runs in roughly the same wall time as a single particle (the per-step cost is dominated by the fixed force call and noise draw).</li>
<li><strong>Device-agnostic</strong>: all operations move to CUDA via native tensor handling; the benchmark and tests run on CPU.</li>
</ul>
<h3 id="performance">Performance</h3>
<p>A force-throughput benchmark (analytical vs autograd) across batch sizes from 2 to roughly 50,000 particles, on an Apple M1 Max:</p>
<ul>
<li>The analytical kernel is about 4x faster than autograd (3-7x across batch sizes).</li>
<li>Per-particle force time drops below 1 microsecond at large batch sizes.</li>
<li>Throughput rises with batch size and saturates for large ensembles.</li>
</ul>
<h3 id="validation">Validation</h3>
<p>The sampler is checked against statistical mechanics, not just run:</p>
<ul>
<li><strong>Deterministic tests</strong>: the documented minima and saddles have the correct Hessian signatures; the analytical force matches <code>torch.autograd.grad</code>; energy is conserved in the frictionless (NVE) limit; <code>float32</code> matches <code>float64</code>; and HDF5 round-trips preserve the data.</li>
<li><strong>Statistical tests</strong>: the sampler reproduces equipartition, the harmonic-oscillator distributions, and the Müller-Brown Boltzmann mean energy against a grid-integrated reference; a separate convergence study confirms the integrator&rsquo;s kinetic-temperature bias vanishes as the timestep squared.</li>
</ul>
<h3 id="molecular-dynamics">Molecular Dynamics</h3>
<p>Langevin simulations on the surface show particle motion within energy basins, thermal fluctuations around the minima, and barrier-crossing transitions between wells, visualized as trajectories on the potential surface.</p>
<h2 id="simulation-videos">Simulation Videos</h2>
<p>These videos demonstrate Langevin dynamics simulations on the Müller-Brown potential surface:</p>
<p><div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share; fullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube-nocookie.com/embed/woVM90qXUQs?autoplay=0&amp;controls=1&amp;end=0&amp;loop=0&amp;mute=0&amp;start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"></iframe>
    </div>

<strong>A Basin Dynamics</strong>: Particle motion and thermal fluctuations around the A minimum.</p>
<p><div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share; fullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube-nocookie.com/embed/gdAHme07bGs?autoplay=0&amp;controls=1&amp;end=0&amp;loop=0&amp;mute=0&amp;start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"></iframe>
    </div>

<strong>B Basin Dynamics</strong>: Exploration of the deeper B minimum energy well.</p>
<p><div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share; fullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube-nocookie.com/embed/dVFe_4KZbps?autoplay=0&amp;controls=1&amp;end=0&amp;loop=0&amp;mute=0&amp;start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"></iframe>
    </div>

<strong>Transition Path</strong>: Particle transitioning between energy basins, demonstrating barrier crossing.</p>
<h2 id="related-work">Related Work</h2>
<p>This implementation is documented in detail in:</p>
<ul>
<li><a href="/posts/muller-brown-in-pytorch/">Implementing the Müller-Brown Potential in PyTorch</a></li>
<li><a href="/videos/muller-brown-basin-ma-simulation/">Basin A Simulation</a></li>
<li><a href="/videos/muller-brown-basin-mb-simulation/">Basin B Simulation</a></li>
<li><a href="/videos/muller-brown-transition-simulation/">Transition Path Simulation</a></li>
</ul>
]]></content:encoded></item><item><title>Modernizing Rahman's 1964 Argon Simulation</title><link>https://hunterheidenreich.com/projects/rahman-1964-replication/</link><pubDate>Sat, 23 Aug 2025 00:00:00 +0000</pubDate><guid>https://hunterheidenreich.com/projects/rahman-1964-replication/</guid><description>A high-fidelity replication of foundational molecular dynamics using modern software engineering practices: caching, vectorization, and strict reproducibility.</description><content:encoded><![CDATA[<h2 id="overview">Overview</h2>
<p>This project is a &ldquo;digital restoration&rdquo; of Aneesur Rahman&rsquo;s seminal 1964 paper, <em>Correlations in the Motion of Atoms in Liquid Argon</em>. While the physics of liquid argon is a solved problem, the challenge lies in bridging the gap between 1960s mainframe constraints and 2025 software architecture.</p>
<p>I replicated the simulation using <strong>LAMMPS</strong> and built a <strong>Python analysis pipeline</strong> to process the trajectory data. The project demonstrates how modern tooling (<code>uv</code>, type hinting, vectorized NumPy) can transform academic &ldquo;write-once&rdquo; scripts into a reproducible research toolkit.</p>
<h2 id="features">Features</h2>
<h3 id="the-analysis-pipeline">The Analysis Pipeline</h3>
<p>I architected a modular Python package (<code>argon_sim</code>) designed for performance and maintainability.</p>
<ul>
<li><strong>Intelligent Caching System</strong>: MD analysis is compute-intensive ($O(N^2)$). I implemented a decorator-based caching layer (<code>@cached_computation</code>) that hashes source file modification times and function arguments. This ensures expensive calculations (like RDF or Van Hove correlations) are only re-run when the underlying trajectory or parameters actually change.</li>
<li><strong>Vectorization &amp; Optimization</strong>: To handle the $O(N^2)$ cost of pair-wise interactions without C++ extensions, I used NumPy broadcasting. For example, the Mean Square Displacement (MSD) calculation is fully vectorized, with a fallback &ldquo;chunked&rdquo; implementation that bounds peak memory on smaller machines.</li>
<li><strong>Modern Python Tooling</strong>:
<ul>
<li><strong>Dependency Management</strong>: Used <code>uv</code> for deterministic environment locking (sub-second resolution).</li>
<li><strong>Type Safety</strong>: Fully type-hinted codebase for static analysis compliance.</li>
<li><strong>Automation</strong>: A <code>Makefile</code> abstracts the workflow (simulation → analysis → figure generation) into single commands (e.g., <code>make figure-5</code>).</li>
</ul>
</li>
</ul>
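<p>The caching idea (key on the source file&rsquo;s modification time plus the call arguments) can be sketched with the standard library alone. The decorator name <code>@cached_computation</code> comes from the project, but the body below is an assumption about how such a layer might work, including the pickle-on-disk storage and the convention that the first argument is the source path.</p>

```python
import functools
import hashlib
import os
import pickle
import tempfile


def cached_computation(cache_dir):
    """Cache results on disk, keyed by function name, arguments, and source-file mtime."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(source_path, *args, **kwargs):
            mtime = os.path.getmtime(source_path)
            key_material = repr((func.__name__, os.path.abspath(source_path),
                                 mtime, args, sorted(kwargs.items())))
            key = hashlib.sha256(key_material.encode("utf-8")).hexdigest()
            cache_file = os.path.join(cache_dir, key + ".pkl")
            if os.path.exists(cache_file):
                with open(cache_file, "rb") as fh:
                    return pickle.load(fh)      # cache hit: skip the computation
            result = func(source_path, *args, **kwargs)
            with open(cache_file, "wb") as fh:
                pickle.dump(result, fh)
            return result
        return wrapper
    return decorator


with tempfile.TemporaryDirectory() as tmp:
    trajectory = os.path.join(tmp, "trajectory.txt")
    with open(trajectory, "w") as fh:
        fh.write("0.0 1.0 2.0\n")

    calls = []

    @cached_computation(tmp)
    def mean_position(path, scale=1.0):
        calls.append(path)                       # record real executions
        with open(path) as fh:
            values = [float(v) for v in fh.read().split()]
        return scale * sum(values) / len(values)

    first = mean_position(trajectory, scale=2.0)
    second = mean_position(trajectory, scale=2.0)   # served from the cache
    runs_before_touch = len(calls)
    os.utime(trajectory, (1, 1))                    # simulate a changed trajectory
    third = mean_position(trajectory, scale=2.0)    # recomputed under a new key
    runs_after_touch = len(calls)
```

<p>Keying on the modification time is cheap compared with hashing a large trajectory, at the cost of recomputing when a file is rewritten with identical contents.</p>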
<h3 id="the-simulation-strategy">The Simulation Strategy</h3>
<p>I used LAMMPS for the MD engine but strictly adhered to Rahman&rsquo;s physical parameters while modernizing the stability mechanisms.</p>
<ul>
<li><strong>Integration</strong>: Replaced Rahman&rsquo;s predictor-corrector method with the modern standard <strong>Velocity Verlet</strong> algorithm (2 fs timestep).</li>
<li><strong>Equilibration</strong>: I implemented a 1 ns <strong>NVT equilibration</strong> phase (500,000 steps at the 2 fs timestep) to properly melt the FCC crystal structure before the NVE production run.</li>
<li><strong>Intellectual Honesty</strong>: The <code>in.argon</code> script explicitly documents every deviation from the original methodology (e.g., energy minimization) and justifies each one on numerical-stability grounds.</li>
</ul>
<h2 id="usage">Usage</h2>
<p>The project uses a <code>Makefile</code> to automate the workflow. Run <code>make all</code> to execute the LAMMPS simulation and generate all analysis figures.</p>
<h2 id="results">Results</h2>
<p>The replication achieved high quantitative agreement with the historical data, validating both the simulation parameters and the custom analysis code.</p>
<table>
  <thead>
      <tr>
          <th style="text-align: left">Property</th>
          <th style="text-align: left">Rahman (1964)</th>
          <th style="text-align: left">This Work</th>
          <th style="text-align: left">Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td style="text-align: left">Diffusion Coefficient ($D$)</td>
          <td style="text-align: left">$2.43 \times 10^{-5}$ cm²/s</td>
          <td style="text-align: left">$2.47 \times 10^{-5}$ cm²/s</td>
          <td style="text-align: left">Agreement within 2%</td>
      </tr>
      <tr>
          <td style="text-align: left">RDF First Peak</td>
          <td style="text-align: left">$3.7$ Å</td>
          <td style="text-align: left">$3.82$ Å</td>
          <td style="text-align: left">Slight shift</td>
      </tr>
      <tr>
          <td style="text-align: left">Velocity Dist. Width ($e^{-1/2}$)</td>
          <td style="text-align: left">$1.77$</td>
          <td style="text-align: left">$1.77$</td>
          <td style="text-align: left">Exact match to theoretical Maxwell-Boltzmann</td>
      </tr>
  </tbody>
</table>
<h3 id="visual-replication">Visual Replication</h3>
<p>I used Matplotlib to digitally recreate Rahman&rsquo;s hand-drawn plots, confirming signatures like the <strong>negative region in the Velocity Autocorrelation Function (VACF)</strong>, which provided the first evidence of the &ldquo;cage effect&rdquo; in simple liquids.</p>

<figure class="post-figure center ">
    <img src="/img/rahman-1964-argon-molecular-dynamics/rahman-argon-velocity-autocorrelation.webp"
         alt="Velocity Autocorrelation Function comparison showing the characteristic negative region"
         title="Velocity Autocorrelation Function comparison showing the characteristic negative region"
         
         
         loading="lazy"
         class="post-image">
    
    <figcaption class="post-caption">The VACF&rsquo;s negative region (first evidence of the &lsquo;cage effect&rsquo; in liquids) reproduced 60 years later.</figcaption>
    
</figure>

<h2 id="challenges--learnings">Challenges &amp; Learnings</h2>
<ul>
<li><strong>Unit Hell</strong>: Rahman&rsquo;s paper uses a mix of reduced units and CGS. Mapping these to LAMMPS&rsquo;s <code>real</code> units required a dedicated <code>constants.py</code> module and rigorous unit testing to prevent dimensional errors.</li>
<li><strong>Fourier Transforms</strong>: Calculating the Structure Factor $S(k)$ from $g(r)$ required hand-implementing the spherically symmetric form of the 3D Fourier transform, as standard FFT packages do not account for the radial shell integration implicit in liquid structure analysis.</li>
<li><strong>Code as a Liability</strong>: Early in the project, I realized that re-running analysis scripts was becoming a bottleneck. This drove the decision to build the caching infrastructure, reinforcing the lesson that investing in developer tooling pays off even in small-scale scientific projects.</li>
</ul>
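<p>For an isotropic liquid, the transform mentioned above reduces to the standard radial integral $S(k) = 1 + 4\pi\rho \int_0^\infty r^2\,[g(r) - 1]\,\frac{\sin(kr)}{kr}\,dr$. The sketch below evaluates it with a simple Riemann sum on a uniform grid; the function name, the toy $g(r)$, and the density value are illustrative assumptions, not the project&rsquo;s code or data.</p>

```python
import numpy as np


def structure_factor(r, g, k, density):
    """S(k) = 1 + 4*pi*rho * integral of r^2 (g(r) - 1) sin(kr)/(kr) dr on a uniform r grid."""
    dr = r[1] - r[0]
    kr = np.outer(k, r)
    # np.sinc(x) = sin(pi x) / (pi x), so sinc(kr / pi) = sin(kr) / (kr), and it is safe at kr = 0.
    kernel = np.sinc(kr / np.pi)
    integrand = r**2 * (g - 1.0) * kernel
    return 1.0 + 4.0 * np.pi * density * np.sum(integrand, axis=1) * dr


dr = 0.001
r = (np.arange(5000) + 0.5) * dr          # cell midpoints on [0, 5]
g_step = np.where(r > 1.0, 1.0, 0.0)      # toy g(r): excluded core of diameter 1
k = np.array([0.0, 1.0, 2.0])
S = structure_factor(r, g_step, k, density=0.1)
```

<p>With real simulation data the integral is truncated at the largest sampled $r$, so low-$k$ values are the most sensitive to the box size.</p>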
<h2 id="related-work">Related Work</h2>
<p>The full methodology and physics are documented in the companion blog post:</p>
<ul>
<li><a href="/posts/rahman-1964-lammps-liquid-argon/">Replicating Rahman&rsquo;s 1964 Liquid Argon Simulation</a></li>
</ul>
]]></content:encoded></item><item><title>Vectorized Word2Vec in Pure PyTorch</title><link>https://hunterheidenreich.com/projects/modern-word2vec/</link><pubDate>Sat, 16 Aug 2025 00:00:00 +0000</pubDate><guid>https://hunterheidenreich.com/projects/modern-word2vec/</guid><description>A from-scratch PyTorch Word2Vec implementation with vectorized Hierarchical Softmax, Negative Sampling, and torch.compile support.</description><content:encoded><![CDATA[<h2 id="overview">Overview</h2>
<p>Word2Vec is often treated as a &ldquo;solved problem&rdquo; or a black box inside libraries like Gensim. This project deconstructs the algorithm to treat it as a <strong>systems engineering challenge</strong>.</p>
<p>I built a ground-up, typed, and compiled PyTorch implementation that bridges the gap between the original C code&rsquo;s efficiency and modern GPU acceleration. The core innovation lies in <strong>&ldquo;tensorizing the tree&rdquo;</strong>, converting the pointer-chasing logic of Hierarchical Softmax into dense, vectorized operations compatible with <code>torch.compile</code>.</p>
<h2 id="features">Features</h2>
<h3 id="1-vectorized-hierarchical-softmax">1. Vectorized Hierarchical Softmax</h3>
<p>Classically, Hierarchical Softmax involves traversing a binary Huffman tree. While efficient on a CPU, this approach creates divergent execution paths on GPUs.</p>
<ul>
<li><strong>The Solution:</strong> I implemented a &ldquo;pre-computed path&rdquo; strategy. The tree traversal for every vocabulary word is flattened into fixed-size tensors (<code>word_path_indices</code>, <code>word_codes_tensor</code>) padded to the maximum depth.</li>
<li><strong>The Result:</strong> The forward pass becomes a massive, masked batch dot-product against internal node embeddings, allowing the GPU to crunch the probability tree without branching logic.</li>
</ul>
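<p>The padded-path idea can be sketched in a few lines. The example below uses NumPy as a stand-in for PyTorch tensors (the indexing pattern carries over), and everything in it is an assumption made for illustration: the function name <code>hs_log_prob</code>, the array names, and the convention that code bit 0 means a positive sign. It is not the library&rsquo;s actual forward pass.</p>

```python
import numpy as np


def hs_log_prob(center_vecs, target_ids, node_emb, path_idx, codes, path_mask):
    """log P(target | center) under hierarchical softmax, with no tree traversal.

    All names are hypothetical. Shapes:
      center_vecs (B, D)  input embeddings for the batch
      target_ids  (B,)    vocabulary ids to predict
      node_emb    (N, D)  embeddings of internal Huffman nodes
      path_idx    (V, L)  internal-node ids on each word's path, padded with 0
      codes       (V, L)  Huffman code bits along each path, padded with 0
      path_mask   (V, L)  1.0 for real path steps, 0.0 for padding
    """
    nodes = node_emb[path_idx[target_ids]]                  # (B, L, D) gather
    scores = np.einsum("bld,bd->bl", nodes, center_vecs)    # batched dot products
    signs = 1.0 - 2.0 * codes[target_ids]                   # bit 0 -> +1, bit 1 -> -1
    # log sigmoid(x) = -log(1 + exp(-x)), computed stably with logaddexp.
    log_sig = -np.logaddexp(0.0, -signs * scores)
    return (log_sig * path_mask[target_ids]).sum(axis=1)    # padding contributes 0


# Toy tree: word 0 hangs off the root; words 1 and 2 share internal node 1.
rng = np.random.default_rng(0)
node_emb = rng.normal(size=(2, 4))
path_idx = np.array([[0, 0], [0, 1], [0, 1]])
codes = np.array([[0, 0], [1, 0], [1, 1]])
path_mask = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 1.0]])
center = np.repeat(rng.normal(size=(1, 4)), 3, axis=0)

log_p = hs_log_prob(center, np.arange(3), node_emb, path_idx, codes, path_mask)
total = float(np.exp(log_p).sum())  # probabilities over the vocabulary sum to 1
```

<p>The mask is what makes fixed-size padding safe: padded steps are multiplied by zero, so words with short Huffman codes are not penalized for unused path slots.</p>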
<h3 id="2-infinite-streaming--sliding-windows">2. Infinite Streaming &amp; Sliding Windows</h3>
<p>To handle datasets larger than RAM (e.g., Wikipedia/CommonCrawl), I built a custom <code>IterableDataset</code> that performs a true single-pass read.</p>
<ul>
<li><strong>Efficient Windowing:</strong> It uses a <code>collections.deque</code> buffer to slide over the token stream, generating training pairs only when a new token enters the center context.</li>
<li><strong>Zipfian Subsampling:</strong> Implemented a probabilistic rejection sampling layer that downsamples frequent words (like &ldquo;the&rdquo; or &ldquo;of&rdquo;) on-the-fly, strictly adhering to the original Mikolov et al. paper&rsquo;s distribution.</li>
</ul>
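<p>A minimal, standard-library sketch of these two pieces follows. It assumes the subsampling rule as stated in the Mikolov et al. paper (discard probability $1 - \sqrt{t / f(w)}$) and a fixed window; the function names, the <code>freqs</code> mapping, and the choice to skip the edges of the stream are simplifications for illustration, not the project&rsquo;s implementation.</p>

```python
import math
import random
from collections import deque
from typing import Dict, Iterable, Iterator, Optional, Tuple


def keep_probability(freq: float, t: float = 1e-5) -> float:
    """Keep probability implied by the paper's discard rule 1 - sqrt(t / f)."""
    return min(1.0, math.sqrt(t / freq))


def skipgram_pairs(
    tokens: Iterable[str],
    freqs: Dict[str, float],          # hypothetical: relative frequency per token
    window: int = 2,
    t: float = 1e-5,
    rng: Optional[random.Random] = None,
) -> Iterator[Tuple[str, str]]:
    """Yield (center, context) pairs in one pass over a token stream."""
    rng = rng or random.Random()
    buf = deque(maxlen=2 * window + 1)

    # Rejection step: frequent tokens are dropped before they enter the window.
    kept = (tok for tok in tokens if rng.random() < keep_probability(freqs[tok], t))

    for tok in kept:
        buf.append(tok)               # the oldest token falls off automatically
        if len(buf) == buf.maxlen:
            center = buf[window]      # middle of the full buffer
            for i, ctx in enumerate(buf):
                if i != window:
                    yield center, ctx
    # Simplification: the first and last `window` tokens never become centers.


demo_freqs = {tok: 1e-6 for tok in "abcde"}   # rare tokens, so nothing is dropped
demo_pairs = list(skipgram_pairs("abcde", demo_freqs, window=1))
```

<p>Because the generator only ever holds <code>2 * window + 1</code> tokens, memory use does not depend on corpus size, which is the property that matters for larger-than-RAM text.</p>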
<h3 id="3-modern-tooling">3. Modern Tooling</h3>
<p>This project uses a strict &ldquo;software 2.0&rdquo; stack:</p>
<ul>
<li><strong>Dependency Management</strong>: Built with <code>uv</code> for deterministic, fast environment resolution.</li>
<li><strong>Compilation</strong>: Fully compatible with <code>torch.compile</code> (PyTorch 2.0+), allowing for graph fusion of the custom loss functions.</li>
</ul>
<h2 id="usage">Usage</h2>
<p>The library installs from source (clone the repo, then <code>pip install -e .</code>) and exposes a typed Python API (<code>SkipGramModel</code>, <code>CBOWModel</code>, <code>Trainer</code>, <code>Word2VecDataset</code>) alongside <code>word2vec-train</code> and <code>word2vec-query</code> CLIs, with GPU acceleration. Trained embeddings export to <code>.npy</code> for use with Gensim or other tooling.</p>
<h2 id="results">Results</h2>
<ul>
<li><strong>Correct embeddings</strong>: the produced vectors pass qualitative semantic-similarity checks (e.g., analogical reasoning), which is consistent with the tensorized tree producing the same geometry as sequential traversal.</li>
<li><strong>Branch-free GPU execution</strong>: the batched Huffman-tree path turns hierarchical-softmax tree traversal into dense, masked tensor operations, removing the divergent branching that slows naive implementations on GPUs.</li>
<li><strong>Runs on larger-than-RAM corpora</strong>: the streaming <code>IterableDataset</code> with Zipfian subsampling processes Wikipedia/CommonCrawl-scale text in a single pass without loading the corpus into memory.</li>
<li><strong><code>torch.compile</code>-compatible</strong>: the custom loss functions are written to fuse under <code>torch.compile</code> (PyTorch 2.0+).</li>
</ul>
<h2 id="related-work">Related Work</h2>
<p>This project connects to related NLP work on this site:</p>
<ul>
<li><a href="/posts/intro-to-word-embeddings/">An Introduction to Word Embeddings</a>: conceptual background on the representations this library produces</li>
<li><a href="/research/word-company-vicinity/">Word Company Vicinity</a>: research applying word vector semantics to company names</li>
<li><a href="/research/semantic-network-induction/">Semantic Network Induction</a>: research on inducing semantic graphs from embedding spaces</li>
</ul>
]]></content:encoded></item><item><title>Synthetic Isomer Data Generation Pipeline</title><link>https://hunterheidenreich.com/projects/isomer-dataset-generation/</link><pubDate>Sat, 09 Mar 2024 00:00:00 +0000</pubDate><guid>https://hunterheidenreich.com/projects/isomer-dataset-generation/</guid><description>An end-to-end cheminformatics pipeline transforming 1D chemical formulas into 3D conformer datasets using graph enumeration and physics-based featurization.</description><content:encoded><![CDATA[<h2 id="overview">Overview</h2>
<p>In computational drug discovery, data scarcity is often the bottleneck. This project builds a synthetic data generator that creates labeled 3D molecular datasets starting from nothing but a raw chemical formula (e.g., $C_6H_{14}$).</p>
<p>The pipeline bridges the gap between <strong>1D Chemical Information</strong> (stoichiometry) and <strong>3D Geometric Data</strong> (conformers), effectively serving as a &ldquo;data factory&rdquo; for training molecular machine learning models.</p>
<h2 id="features">Features</h2>
<h3 id="1-graph-enumeration--3d-embedding">1. Graph Enumeration &amp; 3D Embedding</h3>
<p>The core of the project is <code>pysomer/data/gen.py</code>, which orchestrates a multi-step generation process:</p>
<ul>
<li><strong>Structural Isomerism:</strong> Uses <strong>MAYGEN</strong> (via a Java bridge) to mathematically enumerate all valid graph connectivities for a given formula</li>
<li><strong>Conformer Sampling:</strong> Uses <strong>RDKit</strong> to embed these graphs into 3D space, generating multiple conformers (rotamers) per isomer to capture flexibility</li>
<li><strong>IUPAC Labeling:</strong> Automatically queries PubChem APIs to assign human-readable labels (e.g., &ldquo;2-methylpentane&rdquo;) to the generated structures</li>
</ul>
<h3 id="2-physics-aware-featurization">2. Physics-Aware Featurization</h3>
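<p>The pipeline computes <strong>Coulomb Matrices</strong> from nuclear charges $Z_i$ and atomic positions $R_i$. Because the entries depend only on charges and interatomic distances, the representation is invariant to translating or rotating the molecule:</p>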
<p>$$C_{ij} = \begin{cases} 0.5 Z_i^{2.4} &amp; i = j \\ \frac{Z_i Z_j}{|R_i - R_j|} &amp; i \neq j \end{cases}$$</p>
<p>The off-diagonal entries are the pairwise Coulomb repulsion between nuclei, and the diagonal is a polynomial fit to free-atom energies. Together they give the neural network a more informative signal than raw Cartesian coordinates.</p>
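<p>As a concrete reading of the formula, here is a small NumPy sketch. The function name <code>coulomb_matrix</code> and its argument layout are assumptions for this example, and it is not taken from <code>pysomer</code>; the units of the result follow whatever units the coordinates are given in.</p>

```python
import numpy as np


def coulomb_matrix(Z, R):
    """Coulomb matrix from nuclear charges Z (n,) and positions R (n, 3).

    Hypothetical helper written for illustration only.
    """
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)

    # Pairwise distances |R_i - R_j| via broadcasting.
    diff = R[:, None, :] - R[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)

    # Placeholder on the diagonal to avoid 0/0; those entries are overwritten below.
    np.fill_diagonal(dist, 1.0)

    C = np.outer(Z, Z) / dist              # off-diagonal: Z_i Z_j / |R_i - R_j|
    np.fill_diagonal(C, 0.5 * Z**2.4)      # diagonal: 0.5 Z_i^2.4
    return C


# Two unit charges separated by a distance of 2.
C_demo = coulomb_matrix([1, 1], [[0.0, 0.0, 0.0], [0.0, 0.0, 2.0]])
```

<p>Note that the matrix still depends on the order in which atoms are listed; permutation invariance has to be handled separately, for example by sorting rows or by working with eigenvalues.</p>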
<h3 id="3-hdf5-data-storage">3. HDF5 Data Storage</h3>
<p>To handle the large volume of generated conformers, the system writes to hierarchical <strong>HDF5</strong> files. This allows for efficient, chunked I/O during training, a critical pattern for scaling to larger chemical spaces.</p>
<h2 id="usage">Usage</h2>
<p>The pipeline is executed via a CLI, taking a chemical formula as input and outputting an HDF5 dataset of 3D conformers.</p>
<h2 id="results">Results</h2>
<p>This project serves as a &ldquo;vertical slice&rdquo; of a cheminformatics workflow.</p>
<ul>
<li><strong>The Good:</strong> The separation of concerns is clean: <code>dataclasses</code> for configuration and HDF5 for storage keep the data-engineering layer tidy and extensible.</li>
<li><strong>The &ldquo;Old School&rdquo;:</strong> The model used is a simple Multi-Layer Perceptron (MLP) on flattened Coulomb Matrices. In a modern production setting (post-2020), I would replace this with an <strong>E(3)-invariant or E(3)-equivariant GNN</strong> (like SchNet, or a model built with e3nn) to handle rotational symmetry natively, eliminating manual feature engineering.</li>
<li><strong>Dependency Management:</strong> The reliance on an external Java JAR (<code>MAYGEN</code>) for graph enumeration makes the environment brittle. Today, I would likely swap this for a pure Python enumerator or a containerized microservice to improve portability.</li>
</ul>
<h2 id="related-work">Related Work</h2>
<p>This data pipeline powers the analysis in my comprehensive guide on molecular representation:</p>
<ul>
<li><a href="/posts/alkane-constitutional-isomer-classification/">Coulomb Matrix Eigenvalues: Can You Hear the Shape of a Molecule?</a>: A deep dive into data generation, unsupervised clustering, and supervised classification of alkane isomers.</li>
</ul>
<p>See also:</p>
<ul>
<li><a href="/posts/molecular-descriptor-coulomb-matrix/">The Coulomb Matrix</a>: Deep dive into the physics-based featurization used here</li>
<li><a href="/notes/chemistry/molecular-representations/notations/number-of-isomeric-hydrocarbons/">The Number of Isomeric Hydrocarbons</a>: The foundational 1931 paper on alkane enumeration</li>
</ul>
]]></content:encoded></item><item><title>Automated Adatom Diffusion Workflow</title><link>https://hunterheidenreich.com/projects/lammps-adatom-diffusion/</link><pubDate>Thu, 21 Sep 2023 00:00:00 +0000</pubDate><guid>https://hunterheidenreich.com/projects/lammps-adatom-diffusion/</guid><description>Python-wrapped reference implementation for surface diffusion simulations using LAMMPS and EAM potentials, with automated analysis pipelines.</description><content:encoded><![CDATA[<h2 id="overview">Overview</h2>
<p>This project provides an &ldquo;input-to-analysis&rdquo; workflow for simulating adatom diffusion on FCC metal surfaces. It demonstrates how to set up surface diffusion simulations in LAMMPS, manage EAM potentials, and parse trajectory data into energy and trajectory plots using Python. The LAMMPS input scripts are adapted from Eric N. Hahn&rsquo;s adatom tutorial; the Python analysis layer (<code>plot_energy.py</code>, <code>plot_xy.py</code>) is my own, written while in CSElab (Harvard, 2023).</p>
<p>The workflow covers two material systems, copper (Cu) and platinum (Pt), providing comparative datasets that highlight how atomic mass and bonding strength affect surface dynamics.</p>
<h2 id="features">Features</h2>
<h3 id="simulation-architecture">Simulation Architecture</h3>
<p>The project separates simulation logic from analysis code:</p>
<table>
  <thead>
      <tr>
          <th style="text-align: left">Directory</th>
          <th style="text-align: left">Description</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td style="text-align: left"><strong><code>/adatom_cu</code></strong></td>
          <td style="text-align: left">Copper adatom diffusion on Cu(100)</td>
      </tr>
      <tr>
          <td style="text-align: left"><strong><code>/adatom_pt</code></strong></td>
          <td style="text-align: left">Platinum adatom diffusion on Pt(100)</td>
      </tr>
  </tbody>
</table>
<p>Each directory contains:</p>
<ul>
<li><strong>LAMMPS input scripts</strong> (<code>.in</code> files) defining the physics</li>
<li><strong>EAM potential files</strong> for metallic bonding (the Cu potential is committed; the Pt potential must be downloaded separately from the NIST Interatomic Potentials Repository, so the Pt system does not run as-checked-out)</li>
<li><strong>Python analysis scripts</strong> for trajectory and energy parsing</li>
</ul>
<h3 id="key-features">Key Features</h3>
<ul>
<li><strong>EAM Potentials</strong>: Uses Embedded Atom Method (EAM) alloy potentials, whose many-body embedding term models metallic bonding and surface energies more faithfully than simple pairwise Lennard-Jones potentials</li>
<li><strong>Automated Analysis</strong>: Python pipeline (<code>plot_energy.py</code>, <code>plot_xy.py</code>) that parses raw thermodynamic logs and trajectory dumps to generate &ldquo;health check&rdquo; dashboards</li>
<li><strong>Workflow Orchestration</strong>: Demonstrates the &ldquo;Input → Simulation → Analysis&rdquo; loop, automating the transition from raw <code>.lammpstrj</code> files to publication-ready plots</li>
<li><strong>Kokkos Support</strong>: Includes Kokkos execution commands for GPU/multi-threaded runs</li>
</ul>
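<p>To give a sense of what the log-parsing step involves, here is a standard-library sketch that pulls thermodynamic columns out of a LAMMPS log. It is not the repository&rsquo;s <code>plot_energy.py</code>. It assumes the default one-line thermo style, where each run prints a header row starting with <code>Step</code> and ends with a <code>Loop time</code> line, and it assumes every run in the log uses the same columns; the function name and sample text are invented for the example.</p>

```python
from typing import Dict, List


def parse_thermo(log_text: str) -> Dict[str, List[float]]:
    """Collect thermo columns from a LAMMPS log (hypothetical helper)."""
    columns: List[str] = []
    data: Dict[str, List[float]] = {}
    in_block = False

    for line in log_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Step":                 # header row opens a thermo block
            if not columns:
                columns = fields
                data = {name: [] for name in columns}
            in_block = True
            continue
        if fields[:2] == ["Loop", "time"]:      # summary line closes the block
            in_block = False
            continue
        if in_block and len(fields) == len(columns):
            try:
                row = [float(x) for x in fields]
            except ValueError:
                continue                         # e.g. a warning printed mid-run
            for name, value in zip(columns, row):
                data[name].append(value)
    return data


# Illustrative log excerpt with two consecutive runs (values are made up).
sample_log = """
Step Temp PotEng TotEng
0 300.0 -100.0 -90.0
5 299.5 -100.2 -90.1
Loop time of 0.01 on 1 procs
Step Temp PotEng TotEng
5 299.5 -100.2 -90.1
10 301.0 -100.1 -90.0
Loop time of 0.01 on 1 procs
"""
thermo = parse_thermo(sample_log)
```

<p>The returned dictionary of columns can be handed directly to a plotting library; note that consecutive runs repeat the boundary step, so a real pipeline may want to drop duplicates before plotting.</p>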
<h3 id="simulation-parameters">Simulation Parameters</h3>
<table>
  <thead>
      <tr>
          <th style="text-align: left">Parameter</th>
          <th style="text-align: left">Value</th>
          <th style="text-align: left">Purpose</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td style="text-align: left"><strong>Ensemble</strong></td>
          <td style="text-align: left">NVT → NVE</td>
          <td style="text-align: left">Equilibration followed by energy conservation checks</td>
      </tr>
      <tr>
          <td style="text-align: left"><strong>Potential</strong></td>
          <td style="text-align: left">EAM/alloy</td>
          <td style="text-align: left">Accurate metallic bonding for surface dynamics</td>
      </tr>
      <tr>
          <td style="text-align: left"><strong>Minimization</strong></td>
          <td style="text-align: left">CG (1.0e-4)</td>
          <td style="text-align: left">Remove steric overlaps before dynamics</td>
      </tr>
      <tr>
          <td style="text-align: left"><strong>Timestep</strong></td>
          <td style="text-align: left">5 fs (metal units)</td>
          <td style="text-align: left">EAM-appropriate integration step</td>
      </tr>
      <tr>
          <td style="text-align: left"><strong>Trajectory dump</strong></td>
          <td style="text-align: left">every 5 steps (25 fs)</td>
          <td style="text-align: left">Tracks adatom site-to-site hops</td>
      </tr>
  </tbody>
</table>
<h2 id="usage">Usage</h2>
<p>The repository includes LAMMPS input scripts and Python analysis scripts. Run the LAMMPS scripts to generate trajectory data, then use the Python scripts to visualize the results.</p>
<h2 id="results">Results</h2>
<p>This workflow is documented in detail in companion blog posts:</p>
<ul>
<li><a href="/posts/adatom-cu-diffusion/">LAMMPS Tutorial: Copper and Platinum Adatom Diffusion</a> - Complete setup walkthrough with line-by-line script explanation and comparison of how heavier atoms behave differently on surfaces</li>
</ul>
]]></content:encoded></item><item><title>Mini-Protein Trajectory Generation</title><link>https://hunterheidenreich.com/projects/mini-protein-trajectories/</link><pubDate>Tue, 01 Aug 2023 00:00:00 +0000</pubDate><guid>https://hunterheidenreich.com/projects/mini-protein-trajectories/</guid><description>Automated GROMACS pipeline generating MD trajectories with atomic force extraction for Neural Network Potential training.</description><content:encoded><![CDATA[<h2 id="overview">Overview</h2>
<p>I developed an automated GROMACS pipeline to generate molecular dynamics (MD) datasets for machine learning applications. The workflow automates the simulation of capped dipeptides across nine distinct residue types, creating a diverse training set suitable for Neural Network Potentials (NNPs). The pipeline is built off Luca Tubiana&rsquo;s GROMACS tutorial (University of Trento); the Python analysis layer and the curated dipeptide dataset are my own.</p>
<h2 id="features">Features</h2>
<h3 id="automated-simulation-pipeline">Automated Simulation Pipeline</h3>
<ul>
<li><strong>End-to-End Scripting</strong>: Bash-automated workflow handling topology generation (<code>pdb2gmx</code>), solvation, ionization, and equilibration</li>
<li><strong>Langevin Dynamics</strong>: Implemented Stochastic Dynamics (SD) integration to ensure proper canonical (NVT) ensemble sampling</li>
<li><strong>High-Resolution Output</strong>: Configured to write trajectories at <strong>0.1 ps (100 fs) resolution</strong>, critical for capturing fast bond vibrations</li>
<li><strong>Force Extraction</strong>: Optimized output to <code>.trr</code> format preserving uncompressed atomic forces, a key requirement for force-matching in ML potentials</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-ini" data-lang="ini"><span style="display:flex;"><span><span style="color:#75715e">; md_langevin.mdp</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">integrator</span>  <span style="color:#f92672">=</span> <span style="color:#e6db74">sd        ; Stochastic dynamics for proper sampling</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">dt</span>          <span style="color:#f92672">=</span> <span style="color:#e6db74">0.001     ; 1 fs timestep</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">nstxout</span>     <span style="color:#f92672">=</span> <span style="color:#e6db74">100       ; Output every 100 steps = 0.1 ps resolution</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">tc-grps</span>     <span style="color:#f92672">=</span> <span style="color:#e6db74">Protein Non-Protein</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">tau_t</span>       <span style="color:#f92672">=</span> <span style="color:#e6db74">0.1  0.1  ; Inverse friction constant (ps)</span>
</span></span></code></pre></div><h3 id="chemical-diversity-suite">Chemical Diversity Suite</h3>
<p>Designed to stress-test ML models against varied kinematic constraints:</p>
<table>
  <thead>
      <tr>
          <th>Category</th>
          <th>Residues</th>
          <th>Dynamics Challenge</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>Aromatic</strong></td>
          <td>Phe, Trp</td>
          <td>π-stacking, bulky side chains</td>
      </tr>
      <tr>
          <td><strong>Constrained</strong></td>
          <td>Pro</td>
          <td>Cyclic backbone restrictions</td>
      </tr>
      <tr>
          <td><strong>Flexible</strong></td>
          <td>Gly, Ala</td>
          <td>High conformational entropy</td>
      </tr>
      <tr>
          <td><strong>Branched</strong></td>
          <td>Val, Ile, Leu</td>
          <td>Steric clashes, rotamer preferences</td>
      </tr>
      <tr>
          <td><strong>Sulfur-Containing</strong></td>
          <td>Met</td>
          <td>Flexible thioether linkage</td>
      </tr>
  </tbody>
</table>
<h2 id="usage">Usage</h2>
<p>The pipeline is executed via bash scripts, requiring GROMACS to be installed.</p>
<h2 id="results">Results</h2>
<ul>
<li><strong>Data Volume vs. Fidelity</strong>: Balanced high-frequency force outputs (every 100 steps) against storage constraints by automating post-processing extraction of forces into lightweight <code>.xvg</code> formats</li>
<li><strong>Force Field Consistency</strong>: Standardized the Amber03 force field and TIP3P water model across all residues to ensure consistent potential energy surfaces for downstream model training</li>
</ul>
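<p>For downstream training code, the extracted <code>.xvg</code> files need to be turned back into per-atom force vectors. The sketch below is one way to do that with the standard library. It assumes the layout written by <code>gmx traj -of</code> (a time column followed by x, y, z components for each atom, with header lines starting with <code>#</code> or <code>@</code>); the function name and sample data are hypothetical and not part of the pipeline.</p>

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def parse_force_xvg(text: str) -> Tuple[List[float], List[List[Vec3]]]:
    """Return (times, forces) where forces[frame][atom] is an (fx, fy, fz) tuple.

    Hypothetical helper; the column layout is an assumption of this sketch.
    """
    times: List[float] = []
    forces: List[List[Vec3]] = []

    for line in text.splitlines():
        line = line.strip()
        # Skip blanks plus Grace metadata (@), comments (#), and set separators.
        if not line or line[0] in "#@&":
            continue
        values = [float(x) for x in line.split()]
        time, flat = values[0], values[1:]
        if len(flat) % 3 != 0:
            raise ValueError("expected three force components per atom")
        times.append(time)
        forces.append([(flat[i], flat[i + 1], flat[i + 2]) for i in range(0, len(flat), 3)])
    return times, forces


# Illustrative two-atom, two-frame excerpt (values are made up).
sample_xvg = """# illustrative header
@ title "Force"
0.0 1.0 2.0 3.0 -1.0 -2.0 -3.0
0.1 0.5 0.0 0.0 -0.5 0.0 0.0
"""
xvg_times, xvg_forces = parse_force_xvg(sample_xvg)
```

<p>Plain text is convenient for inspection but loses precision and grows quickly; for larger runs, reading the <code>.trr</code> directly with a trajectory library may be the better trade-off.</p>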
<blockquote>
<p><strong>Note</strong>: This pipeline uses Amber03 for consistency across residue types. For production ML potentials, consider swapping to CHARMM36m or a similar modern force field.</p></blockquote>
<h2 id="retrospective">Retrospective</h2>
<ul>
<li><strong>Demonstrative, not production-scale</strong>: the 1 ns trajectories exercise the pipeline and capture fast bond vibrations, but proper conformational sampling needs 100 ns to 1 µs runs. This is a working reference, not a finished dataset.</li>
<li><strong>Dated force field</strong>: Amber03 / TIP3P keeps the potential energy surface consistent across residues, but it is not state-of-the-art for ML-potential training; CHARMM36m or Amber ff19SB would be the upgrade path.</li>
<li><strong>Paused, not abandoned</strong>: a candidate to revive and extend (more residues, longer trajectories, Ramachandran analysis) for future force-matching work.</li>
</ul>
<h2 id="related-work">Related Work</h2>
<ul>
<li><a href="/posts/mini-proteins/">Mini-Protein Dynamics</a> - Detailed blog post on the simulation methodology</li>
</ul>
]]></content:encoded></item><item><title>Congressional Knowledge Graph &amp; Policy Classification</title><link>https://hunterheidenreich.com/projects/congressional-data-analysis/</link><pubDate>Wed, 01 Mar 2023 00:00:00 +0000</pubDate><guid>https://hunterheidenreich.com/projects/congressional-data-analysis/</guid><description>A 47,000+ bill knowledge graph from Congress.gov with co-sponsorship networks and TF-IDF baselines for 33-class policy-area classification.</description><content:encoded><![CDATA[<h2 id="overview">Overview</h2>
<p>A computational social science project that constructed a dataset of 47,000+ US congressional bills by extracting legislative text and metadata from the 115th-117th Congresses. The project creates a &ldquo;legislative graph&rdquo; (linking sponsors, committees, and bill text) and establishes TF-IDF baseline models for policy area classification across 33 highly imbalanced policy classes. The dataset is hosted on Hugging Face to support reproducible political science research.</p>
<h2 id="features">Features</h2>
<h3 id="intelligent-data-acquisition">Intelligent Data Acquisition</h3>
<p>Standard APIs impose strict rate limits. I built a Selenium-based extraction engine to handle Congress.gov&rsquo;s complex DOM structures.</p>
<ul>
<li><strong>Optimization</strong>: Targeted aggregate endpoints (e.g., <code>/all-info</code>) to pull each bill&rsquo;s text and metadata in fewer requests.</li>
<li><strong>Resilience</strong>: Implemented a local caching layer to store raw HTML, separating the fetch step from the parse step. This made the parse step re-runnable without re-fetching, and minimized server load during iterative development.</li>
<li><strong>Graph construction</strong>: Beyond simple text, the script extracts relational data including co-sponsorship networks, committee assignments, and related bill lineage.</li>
</ul>
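<p>The fetch/parse separation described above can be sketched as a small disk cache keyed by URL. This is an illustration under stated assumptions, not the project&rsquo;s scraper: <code>cached_fetch</code>, the hashing scheme, and the injected <code>fetch</code> callable (standing in for the Selenium call) are all hypothetical.</p>

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, List


def cached_fetch(url: str, fetch: Callable[[str], str], cache_dir: Path) -> str:
    """Return the raw page for `url`, hitting the network only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so any address maps to a safe, fixed-length filename.
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.html"

    if path.exists():
        return path.read_text(encoding="utf-8")

    raw = fetch(url)                       # the only place a request is made
    path.write_text(raw, encoding="utf-8")
    return raw


# Demo with a fake fetcher that records how often it is called.
calls: List[str] = []


def fake_fetch(url: str) -> str:
    calls.append(url)
    return "raw page body"


with tempfile.TemporaryDirectory() as tmp:
    first = cached_fetch("https://example.org/bill/1", fake_fetch, Path(tmp))
    second = cached_fetch("https://example.org/bill/1", fake_fetch, Path(tmp))
```

<p>With the raw pages on disk, parser bugs can be fixed and the parse step re-run as many times as needed without sending another request to the server.</p>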
<h3 id="natural-language-processing">Natural Language Processing</h3>
<ul>
<li><strong>Corpus construction</strong>: Cleaned and normalized legislative text, removing procedural artifacts (e.g., &ldquo;A BILL TO&hellip;&rdquo;) to isolate semantic policy content.</li>
<li><strong>Feature engineering</strong>: Utilized TF-IDF vectorization with N-gram analysis to capture legislative jargon.</li>
<li><strong>Modeling</strong>: Benchmarked Naive Bayes, Logistic Regression, and gradient-boosted trees (XGBoost), reaching ~0.86 weighted F1 on bill summaries and up to ~0.89 on full text (cross-validated). Weighted F1, not raw accuracy, is the honest metric here: the 33 policy classes are severely imbalanced (Health has 5,911 bills; Social Sciences and History has 15).</li>
</ul>
<h2 id="usage">Usage</h2>
<p>The dataset is available on Hugging Face and can be loaded directly via the <code>datasets</code> library. The scraper can be run locally to fetch new bills.</p>
<h2 id="results">Results</h2>
<ul>
<li><strong>The &ldquo;partisan vocabulary&rdquo;</strong>: Feature importance analysis revealed distinct linguistic markers separating Democratic and Republican legislation, identifiable even without metadata.</li>
<li><strong>Temporal drift</strong>: Policy priorities and terminology showed measurable shifts across congressional sessions (115th vs 117th).</li>
<li><strong>Classification success</strong>: Simple linear models (Logistic Regression and Naive Bayes) proved effective at distinguishing policy domains, outperforming gradient-boosted trees on these sparse TF-IDF features and suggesting legislative language is highly structured.</li>
</ul>
<h2 id="impact--deliverables">Impact &amp; Deliverables</h2>
<ul>
<li><strong>Hugging Face dataset</strong>: Released a machine-readable, ML-ready dataset of modern bills (115th-117th Congresses) on Hugging Face for reproducible research.</li>
<li><strong>Open source tooling</strong>: Published the scraper and parsing logic to allow others to extend the dataset to future congresses.</li>
<li><strong>Academic benchmark</strong>: Established a clear baseline for &ldquo;Government NLP&rdquo; tasks, aiding in the automated transparency and monitoring of new legislation.</li>
</ul>
<h2 id="related-work">Related Work</h2>
<ul>
<li><a href="/posts/us-117th-congress-data-exploration/">117th Congress Data Exploration</a></li>
<li><a href="/posts/congressional-bill-policy-area-classification/">Congressional Bill Policy Area Classification</a></li>
</ul>
]]></content:encoded></item><item><title>IQCRNN: Certified Stability for Neural Networks</title><link>https://hunterheidenreich.com/projects/iqcrnn-pytorch/</link><pubDate>Wed, 11 May 2022 00:00:00 +0000</pubDate><guid>https://hunterheidenreich.com/projects/iqcrnn-pytorch/</guid><description>PyTorch IQCRNN enforcing stability guarantees on RNNs via Integral Quadratic Constraints and semidefinite programming.</description><content:encoded><![CDATA[<p>This project is a PyTorch re-implementation of <strong>IQCRNN</strong>, a method that enforces strict stability guarantees on Recurrent Neural Networks used in control systems.</p>
<h2 id="overview">Overview</h2>
<p>Standard Reinforcement Learning agents can behave unpredictably in unseen states. This approach forces the agent&rsquo;s weights to satisfy <strong>Integral Quadratic Constraints (IQC)</strong> via a projection step. Effectively, it solves a convex optimization problem (Semidefinite Program) inside the gradient descent loop to ensure the controller never violates Lyapunov stability criteria.</p>
<p>The method bridges classic <strong>Robust Control Theory</strong> (1990s) with <strong>Deep Reinforcement Learning</strong> (2020s), providing mathematical certificates of safety for neural network controllers.</p>
<h2 id="features">Features</h2>
<ul>
<li><strong>Hybrid Optimization:</strong> Interleaved standard Gradient Descent (PyTorch) with Convex Optimization (<code>cvxpy</code> + <code>MOSEK</code>) to project weights onto the &ldquo;safe&rdquo; manifold after each training step.</li>
<li><strong>Complex Constraints:</strong> Implemented the &ldquo;Tilde&rdquo; parametrization from the original paper to convexify the non-convex stability conditions of the RNN dynamics, transforming an intractable problem into a solvable Linear Matrix Inequality (LMI).</li>
<li><strong>Safety-Critical Domains:</strong> Applied the controller across six control systems (cartpole, inverted pendulum, nonlinear pendulum, pendubot, power grid, and vehicle dynamics), including unstable plants where &ldquo;crashing&rdquo; during training is unacceptable.</li>
</ul>
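<p>The &ldquo;gradient step, then project&rdquo; pattern is easier to see on a toy problem. The sketch below deliberately swaps the IQC-based SDP projection for a much simpler convex set, the ball of matrices with spectral norm at most <code>gamma</code>, whose Frobenius projection is just singular-value clipping. It therefore shows only the shape of the training loop, not the stability certificate from the paper; the loss, names, and constants are assumptions for the example.</p>

```python
import numpy as np


def project_spectral_ball(W, gamma=0.95):
    """Nearest matrix (Frobenius norm) whose largest singular value is <= gamma.

    Simplified stand-in for the SDP projection; hypothetical helper.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U * np.minimum(s, gamma)) @ Vt   # U @ diag(clipped s) @ Vt


rng = np.random.default_rng(0)
W_target = 2.0 * rng.normal(size=(4, 4))     # toy unconstrained optimum
W = np.zeros((4, 4))

for _ in range(200):
    grad = 2.0 * (W - W_target)              # gradient of ||W - W_target||_F^2
    W = W - 0.1 * grad                       # ordinary gradient step
    W = project_spectral_ball(W)             # pull the weights back into the set

spectral_norm = float(np.linalg.norm(W, 2))  # largest singular value after training
```

<p>In the real method the projection is itself an optimization problem solved by <code>cvxpy</code> and <code>MOSEK</code>, which is where the training-time cost comes from; here it is a single SVD.</p>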
<h2 id="usage">Usage</h2>
<p>The repository includes training scripts for the inverted pendulum and power grid environments, demonstrating the stability guarantees in practice.</p>
<h2 id="results">Results</h2>
<p>This project was a deep dive into the tension between <strong>Safety</strong> and <strong>Speed</strong>.</p>
<ul>
<li><strong>The Bottleneck:</strong> Solving an SDP at every few steps of training is computationally expensive (interior-point SDP solvers scale steeply, roughly $O(n^6)$ in the matrix dimension). While it provided mathematical certificates of safety, it highlighted why these methods haven&rsquo;t yet overtaken standard PPO/SAC in production: the &ldquo;safety tax&rdquo; on training time is steep.</li>
<li><strong>The Lesson:</strong> It taught me that &ldquo;theoretical guarantees&rdquo; often come with &ldquo;engineering fine print.&rdquo; If I were to redo this today, I would look into <strong>differentiable convex optimization layers</strong> (like <code>cvxpylayers</code>) to make the projection end-to-end differentiable.</li>
<li><strong>The &ldquo;Rough Edges&rdquo;:</strong> The codebase has artifacts of its research origins (e.g., the <code>reqs.txt</code> dependency dump), because the primary focus was reading a dense control theory paper (Gu et al., 2021) and implementing the math correctly.</li>
</ul>
<h2 id="citation">Citation</h2>
<p>Credit to the original authors:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bibtex" data-lang="bibtex"><span style="display:flex;"><span><span style="color:#a6e22e">@misc</span>{gu2021recurrentneuralnetworkcontrollers,
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">title</span>=<span style="color:#e6db74">{Recurrent Neural Network Controllers Synthesis with Stability Guarantees for Partially Observed Systems}</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">author</span>=<span style="color:#e6db74">{Fangda Gu and He Yin and Laurent El Ghaoui and Murat Arcak and Peter Seiler and Ming Jin}</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">year</span>=<span style="color:#e6db74">{2021}</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">eprint</span>=<span style="color:#e6db74">{2109.03861}</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">archivePrefix</span>=<span style="color:#e6db74">{arXiv}</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">primaryClass</span>=<span style="color:#e6db74">{eess.SY}</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">url</span>=<span style="color:#e6db74">{https://arxiv.org/abs/2109.03861}</span>,
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><h2 id="related-work">Related Work</h2>
<ul>
<li><a href="/research/deconstructing-recurrence-attention-gating/">Deconstructing Recurrence and Attention Gating</a>: research on recurrent network architectures, providing context for why stability guarantees on RNNs matter</li>
</ul>
]]></content:encoded></item><item><title>PyConversations: Social Media Conversational Analysis</title><link>https://hunterheidenreich.com/projects/pyconversations-social-media-analysis/</link><pubDate>Tue, 01 Jun 2021 00:00:00 +0000</pubDate><guid>https://hunterheidenreich.com/projects/pyconversations-social-media-analysis/</guid><description>Undergraduate thesis exploring representation learning for social media text and developing tools for cross-platform conversational analysis.</description><content:encoded><![CDATA[<h2 id="overview">Overview</h2>
<p>Undergraduate thesis exploring representation learning for social media text and developing tools for cross-platform conversational analysis. Built PyConversations, a Python module for analyzing social media conversations, and found that domain-specific approaches often outperform large pre-trained models.</p>
<h2 id="features">Features</h2>
<h3 id="pyconversations-module">PyConversations Module</h3>
<ul>
<li><strong>Graph-based modeling</strong>: Models conversations as Directed Acyclic Graphs (DAGs) to quantify topological structure (depth, width, density)</li>
<li><strong>Unified interface</strong>: Polymorphic design normalizing heterogeneous data from Twitter, Reddit, 4chan, and Facebook into a single analysis schema</li>
<li><strong>Linguistic dynamics</strong>: Implements information-theoretic feature extraction, including harmonic mixing laws and entropy measures</li>
<li><strong>Stream processing</strong>: Memory-efficient generators to ingest and traverse multi-gigabyte JSON dumps (e.g., 135M+ Reddit posts) without loading the full corpus into RAM</li>
</ul>
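<p>As an illustration of the topological features, the sketch below computes size, depth, and width for a single conversation. It assumes the simplest case, where each post has at most one parent (a reply tree, which is a special case of a DAG), and it takes posts as <code>(post_id, parent_id)</code> pairs. That schema and the function name are assumptions for the example, not the PyConversations API.</p>

```python
from collections import Counter, defaultdict, deque
from typing import Dict, Hashable, Iterable, Optional, Tuple


def conversation_shape(
    posts: Iterable[Tuple[Hashable, Optional[Hashable]]],
) -> Dict[str, int]:
    """Size, depth, and width of a reply structure (hypothetical helper)."""
    children = defaultdict(list)
    roots = []
    for post_id, parent_id in posts:       # works on any iterable, including a generator
        if parent_id is None:
            roots.append(post_id)
        else:
            children[parent_id].append(post_id)

    # Breadth-first walk from the roots, counting posts per reply level.
    level_sizes: Counter = Counter()
    seen = set()
    queue = deque((root, 0) for root in roots)
    while queue:
        node, level = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        level_sizes[level] += 1
        queue.extend((child, level + 1) for child in children[node])

    return {
        "size": len(seen),                               # posts reachable from a root
        "depth": max(level_sizes, default=0),            # longest reply chain
        "width": max(level_sizes.values(), default=0),   # largest single level
    }


demo_shape = conversation_shape([("a", None), ("b", "a"), ("c", "a"), ("d", "b")])
```

<p>Posts whose parent is missing from the dump are never reached from a root in this version, so a real pipeline would need a policy for orphaned replies.</p>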
<h3 id="research-contributions">Research Contributions</h3>
<ul>
<li><strong>Representation learning</strong>: Investigated domain-specific vs. general-purpose Transformers (BERT vs. specialized variants) on social media text</li>
<li><strong>Topological analysis</strong>: Demonstrated that conversational structure (context) is as critical as content for classification tasks</li>
<li><strong>Cross-platform study</strong>: Comparative analysis of communication dynamics across moderated (Reddit/Twitter) and unmoderated (4chan) spaces</li>
</ul>
<h2 id="usage">Usage</h2>
<p>The PyConversations module can be imported into Python scripts to parse and analyze social media datasets.</p>
<h2 id="results">Results</h2>
<ul>
<li><strong>Model performance</strong>: Smaller, domain-specific approaches frequently outperformed standard pre-trained models</li>
<li><strong>Context importance</strong>: Conversational context and dialogue structure proved crucial for understanding social media interactions</li>
<li><strong>Domain adaptation</strong>: Social media text benefits from specialized handling over generic approaches</li>
<li><strong>Cross-platform challenges</strong>: Different platforms require adapted approaches despite their apparent similarities</li>
</ul>
<h2 id="team--recognition">Team &amp; Recognition</h2>
<ul>
<li><strong>Hunter Heidenreich</strong> - Lead Researcher and Developer</li>
<li><strong>Jake Williams</strong> - Faculty Advisor</li>
<li><strong>First Place - Research Undergraduate Senior Thesis</strong> at Drexel University</li>
</ul>
<h2 id="impact">Impact</h2>
<p>This library served as the engineering backbone for my thesis, <a href="/research/look-dont-tweet/">Look, Don&rsquo;t Tweet</a>, enabling the processing of 308 million posts to evaluate Transformer performance on toxic data.</p>
<p>The finding that smaller, domain-specific approaches frequently outperformed standard pre-trained models suggested that specialized domains benefit from tailored modeling, a perspective that has become more relevant as the field continues to evolve.</p>
]]></content:encoded></item><item><title>Cartesian Genetic Programming in Julia</title><link>https://hunterheidenreich.com/projects/cgp-julia/</link><pubDate>Sun, 18 Nov 2018 00:00:00 +0000</pubDate><guid>https://hunterheidenreich.com/projects/cgp-julia/</guid><description>A fork of Dennis Wilson's CGP.jl applying Cartesian Genetic Programming to Atari RL tasks; my work was the Atari experiments, not the core framework.</description><content:encoded><![CDATA[<p>Written in 2018, this was an exploration into <strong>Evolutionary Algorithms</strong> applied to Reinforcement Learning tasks (specifically Atari games). It is a fork of <a href="https://github.com/d9w/CGP.jl">d9w/CGP.jl</a> (Dennis Wilson, Apache 2.0); my work centered on the Atari reinforcement-learning experiments rather than the core CGP framework.</p>
<h2 id="overview">Overview</h2>
<p>Standard Cartesian Genetic Programming (CGP) relies heavily on mutation. The upstream library hybridizes CGP with <strong>NEAT (NeuroEvolution of Augmenting Topologies)</strong> concepts to protect topological innovation through speciation.</p>
<p>My goal in forking it was to evolve graph-based programs that could learn Atari control policies using gradient-free optimization.</p>
<h2 id="features">Features</h2>
<p>The upstream framework provides the CGP machinery this project builds on:</p>
<ul>
<li><strong>Graph-based Crossover:</strong> Crossover operators such as <code>subgraph_crossover</code> and <code>aligned_node_crossover</code> that handle the destructive nature of mating graph structures.</li>
<li><strong>Speciation:</strong> A NEAT-inspired compatibility-distance metric (<code>cgpneat.jl</code>) to maintain population diversity and prevent premature convergence.</li>
<li><strong>Active Gene Tracking:</strong> Differentiates between &ldquo;active&rdquo; nodes (those contributing to output) and &ldquo;junk DNA,&rdquo; focusing mutation on phenotypic changes.</li>
</ul>
<p>My own contribution was the <strong>Atari reinforcement-learning layer</strong> on top of this: experiment variants (<code>action_atari.jl</code>, <code>original_atari.jl</code>, <code>manual_atari.jl</code>, <code>play_atari.jl</code>, <code>param_sweep.jl</code>), custom fitness and scoring functions, early-stopping and completion-percentage logging, multithreading and <code>pmap</code> multiprocessing attempts (reverted to single-thread), and config tuning to match a reference paper&rsquo;s hyperparameters.</p>
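<p>The active-gene tracking described in the feature list amounts to a backward walk from the output nodes. The sketch below shows the idea in Python instead of Julia; it is not code from CGP.jl, and the genome layout (a list of <code>(function_name, input_a, input_b)</code> tuples) and the name <code>active_nodes</code> are hypothetical.</p>
<pre><code class="language-python">def active_nodes(genome, n_inputs, outputs):
    """Return the sorted indices of nodes that influence any output.

    genome[i] is a (function_name, input_a, input_b) tuple describing node
    n_inputs + i. Indices below n_inputs refer to program inputs.
    """
    node_range = range(n_inputs, n_inputs + len(genome))
    active = set()
    stack = list(outputs)
    while stack:
        index = stack.pop()
        if index in active or index not in node_range:
            continue  # Already visited, or a program input, not a node.
        active.add(index)
        _, input_a, input_b = genome[index - n_inputs]
        stack.extend((input_a, input_b))
    return sorted(active)


# Hypothetical genome with two program inputs (indices 0 and 1).
genome = [
    ("add", 0, 1),  # node 2
    ("mul", 0, 0),  # node 3: never reaches the output, so it is junk DNA
    ("sin", 2, 2),  # node 4
]
print(active_nodes(genome, n_inputs=2, outputs=[4]))
</code></pre>
<p>Only nodes 2 and 4 are reachable from the output here, so a mutation confined to node 3 would leave the program&rsquo;s behavior unchanged.</p>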
<h2 id="usage">Usage</h2>
<p>The library provides a Julia API for defining CGP graphs, configuring evolutionary parameters, and running the evolutionary loop against custom environments.</p>
<h2 id="results">Results</h2>
<p>Looking back, this codebase captures a transitional moment where I was moving from scripting to library design.</p>
<ul>
<li><strong>The Ambition:</strong> Getting CGP graphs to learn Atari policies under the mixed-type regime (RGB-array inputs, scalar action outputs) was an ambitious undertaking for my software engineering skills at the time.</li>
<li><strong>The &ldquo;Legacy&rdquo; Code:</strong> The project relies on the now-deprecated Julia v0.6 and uses <code>eval(parse(...))</code> patterns for configuration (a significant performance anti-pattern in modern Julia).</li>
<li><strong>The Lesson:</strong> It taught me the difficulty of designing genetic operators that respect topological constraints, a lesson that informs my current understanding of optimization in structured spaces.</li>
</ul>
]]></content:encoded></item><item><title>FFTW Compiler in Haskell</title><link>https://hunterheidenreich.com/projects/fftw-compiler-haskell/</link><pubDate>Thu, 15 Mar 2018 00:00:00 +0000</pubDate><guid>https://hunterheidenreich.com/projects/fftw-compiler-haskell/</guid><description>Reverse-engineering the genfft logic to generate optimized C kernels for Fast Fourier Transforms using Haskell metaprogramming.</description><content:encoded><![CDATA[<p>Written during my sophomore year, this project was an attempt to look inside the &ldquo;black box&rdquo; of one of the fastest Fourier transform libraries: <strong>FFTW</strong>.</p>
<h2 id="overview">Overview</h2>
<p>I sought to replicate the logic of FFTW&rsquo;s <code>genfft</code>: a metaprogram that generates straight-line, highly optimized C code. The goal was to understand how abstract algebra (group theory) could be translated into efficient machine code through symbolic manipulation.</p>
<h2 id="features">Features</h2>
<p>This was my first deep dive into <strong>functional metaprogramming</strong> and <strong>compiler theory</strong>:</p>
<ul>
<li><strong>Symbolic AST:</strong> Modeled mathematical operations as a Directed Acyclic Graph (DAG) in Haskell (<code>data Node</code>), separating the <em>definition</em> of the math from its <em>execution</em>.</li>
<li><strong>Algebraic Simplification:</strong> Implemented a symbolic optimization pass that pruned operations at compile-time (e.g., eliminating multiplications by $1$, $0$, or $-1$) before code generation.</li>
<li><strong>Monadic State Management:</strong> Used Haskell&rsquo;s <code>State</code> Monad to manage the graph construction and memoization, ensuring common subexpressions (like reusable cosine factors) were calculated only once.</li>
<li><strong>Code Generation:</strong> The system emitted unrolled, straight-line C code (e.g., <code>fftw4.c</code>), mimicking the &ldquo;codelets&rdquo; used by the actual FFTW library.</li>
</ul>
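<p>To make the simplification and memoization steps concrete, here is a minimal sketch in Python, not the original Haskell. The <code>Graph</code> class and its tuple-based node encoding are assumptions for illustration; a mutable dictionary stands in for the state that the Haskell version threads through the <code>State</code> monad.</p>
<pre><code class="language-python">class Graph:
    """Builds expression nodes, sharing identical subexpressions."""

    def __init__(self):
        self.nodes = {}  # Maps a node tuple to itself so repeats are reused.

    def node(self, *parts):
        return self.nodes.setdefault(parts, parts)

    def const(self, value):
        return self.node("const", value)

    def var(self, name):
        return self.node("var", name)

    def neg(self, x):
        if x[0] == "neg":
            return x[1]  # -(-x) simplifies to x
        return self.node("neg", x)

    def mul(self, a, b):
        # Try each operand as the constant factor before building a node.
        for c, other in ((a, b), (b, a)):
            if c[0] == "const":
                if c[1] == 0:
                    return self.const(0)
                if c[1] == 1:
                    return other
                if c[1] == -1:
                    return self.neg(other)
        return self.node("mul", a, b)


g = Graph()
x = g.var("x")
print(g.mul(g.const(1), x) is x)    # multiplication by 1 disappears
print(g.mul(x, g.const(-1)))        # becomes a negation node
print(g.mul(x, x) is g.mul(x, x))   # the repeated product is one shared node
</code></pre>
<p>Because every node passes through <code>node</code>, a subexpression that appears twice is stored once, which is what lets a code generator emit it as a single temporary.</p>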
<h2 id="usage">Usage</h2>
<p>The compiler is run via the command line, taking the desired FFT size as input and outputting the optimized C code.</p>
<h2 id="results">Results</h2>
<p>Looking back, this project represents a pivotal moment where I moved from &ldquo;writing programs&rdquo; to &ldquo;writing tools that write programs.&rdquo;</p>
<ul>
<li><strong>The &ldquo;Magic&rdquo;:</strong> It demystified high-performance computing. I learned that speed often comes as much from unrolling recursion and managing register pressure at compile time as from writing fast loops.</li>
<li><strong>The &ldquo;Rough Edges&rdquo;:</strong> The scheduler (coloring nodes Red/Blue for register allocation) was a heuristic approximation of the optimal Aho-Johnson-Ullman algorithm.</li>
<li><strong>Legacy:</strong> The core lesson that domain-specific compilers can outperform hand-tuned generic code remains relevant to my current work in optimizing scientific computing kernels.</li>
</ul>
]]></content:encoded></item><item><title>Term Schedule Optimizer</title><link>https://hunterheidenreich.com/projects/term-schedule-optimizer/</link><pubDate>Wed, 15 Feb 2017 00:00:00 +0000</pubDate><guid>https://hunterheidenreich.com/projects/term-schedule-optimizer/</guid><description>A constraint satisfaction solver built to generate conflict-free university schedules from web-scraped course data.</description><content:encoded><![CDATA[<p>A Python-based automation tool I wrote as a freshman to solve the &ldquo;Term Master Schedule&rdquo; problem (and used throughout my undergrad from 2016 to 2020).</p>
<h2 id="overview">Overview</h2>
<p>Manually creating a university schedule involves solving a <strong>Constraint Satisfaction Problem (CSP)</strong> with multiple variables:</p>
<ul>
<li><strong>Hard Constraints:</strong> No time overlaps between classes.</li>
<li><strong>Soft Constraints:</strong> Preferences for &ldquo;no 8 AMs,&rdquo; specific lunch breaks, or maximizing free days.</li>
</ul>
<p>The naive approach (manually checking every possible combination) becomes intractable as the number of courses and sections grows.</p>
<h2 id="features">Features</h2>
<p>I built a script that:</p>
<ol>
<li><strong>Scraped Data:</strong> Parsed the Drexel WebTMS (Term Master Schedule) using <code>lxml</code> to build a localized dataset of course availability.</li>
<li><strong>Solved for X:</strong> Implemented a <strong>recursive backtracking algorithm</strong> to generate every valid combination of course sections that satisfied user-defined constraints.</li>
</ol>
<h3 id="the-algorithm">The Algorithm</h3>
<p>The core of this project is a <code>recursive_generator</code> function that implements a backtracking CSP solver. It performs a recursive depth-first search that:</p>
<ol>
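<li>Takes a set of variables (courses), each with a domain of candidate sections.</li>
<li>Assigns a section to the next course and checks constraints (time overlaps, lunch hours, max classes per day).</li>
<li>Backtracks when a branch fails and tries the next section.</li>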
</ol>
<p>It is the same backtracking pattern used in everything from Sudoku solvers to compiler register allocation.</p>
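<p>The steps above can be sketched as a short generator. This is a reconstruction of the general technique under stated assumptions, not the original <code>recursive_generator</code>: the data layout, the helper <code>overlaps</code>, and the course names are hypothetical, and only the hard no-overlap constraint is checked.</p>
<pre><code class="language-python">def overlaps(a, b):
    """Two meetings clash if they share a day and their time ranges intersect."""
    day_a, start_a, end_a = a
    day_b, start_b, end_b = b
    latest_start = max(start_a, start_b)
    earliest_end = min(end_a, end_b)
    # range() is non-empty only when the shared interval has positive length.
    return day_a == day_b and len(range(latest_start, earliest_end)) != 0


def schedules(courses, chosen=()):
    """Yield every conflict-free choice of one section per course.

    courses is a list of (course_name, sections); each section is a
    (section_id, meetings) pair and each meeting is (day, start, end)
    with times in whole minutes after midnight.
    """
    if not courses:
        yield chosen
        return
    (name, sections), rest = courses[0], courses[1:]
    for section_id, meetings in sections:
        clash = any(
            overlaps(new, old)
            for new in meetings
            for _, _, picked in chosen
            for old in picked
        )
        if clash:
            continue  # Dead branch: backtrack and try the next section.
        yield from schedules(rest, chosen + ((name, section_id, meetings),))


courses = [
    ("CS 101", [("A", [("Mon", 540, 590)]), ("B", [("Tue", 540, 590)])]),
    ("MATH 121", [("A", [("Mon", 560, 610)])]),
]
for schedule in schedules(courses):
    print([(name, section) for name, section, _ in schedule])
</code></pre>
<p>Section A of the first course collides with the only section of the second, so the search abandons that branch and yields the single schedule built from section B. Soft constraints such as &ldquo;no 8 AMs&rdquo; could be added as further checks next to the clash test.</p>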
<h2 id="usagegameplay">Usage</h2>
<p>The tool is run via the command line, taking a list of desired courses and outputting valid schedule combinations.</p>
<h2 id="results">Results</h2>
<p>This tool saved me (and several friends) hours of planning time each quarter. While the scraping logic was fragile (dependent on 2017 HTML structures), the core logic (a depth-first search through the state space of possible schedules) remains a fundamental algorithmic pattern.</p>
]]></content:encoded></item><item><title>Rubik's Cube Sonification</title><link>https://hunterheidenreich.com/projects/rubiks-cube-player/</link><pubDate>Sun, 29 Jan 2017 00:00:00 +0000</pubDate><guid>https://hunterheidenreich.com/projects/rubiks-cube-player/</guid><description>A hackathon experiment in algorithmic musicology: mapping the visual entropy of a Rubik's Cube to harmonic audio synthesis.</description><content:encoded><![CDATA[<h2 id="overview">Overview</h2>
<p>Built in under 24 hours at the Drexel 2017 Music Hackathon, this project attempts to answer a question: <em>What does order sound like?</em></p>
<p>The system uses a webcam to scan a Rubik&rsquo;s cube face and algorithmically generates audio based on the color configuration. A scrambled cube generates dissonant, complex waveforms; a solved cube resolves into a pure, harmonious chord.</p>
<h2 id="features">Features</h2>
<p>This freshman-year project was built on <strong>first principles</strong>:</p>
<ul>
<li><strong>Manual Waveform Synthesis:</strong> The audio engine generates raw 8-bit PCM audio byte-by-byte using sine functions (<code>math.sin</code>), played at a 16 kHz sample rate.</li>
<li><strong>Algorithmic Harmony:</strong> Colors are mapped to musical intervals. The &ldquo;center&rdquo; color establishes the root note (Tonic), while the surrounding &ldquo;cubies&rdquo; determine the chord structure and melody using equal temperament frequency calculations ($f = f_0 \cdot 2^{n/12}$).</li>
</ul>
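<p>A minimal sketch of the two ideas above, the equal temperament formula and byte-level sine synthesis, might look like the following. The 440 Hz root, the function names, and the major-triad example are assumptions; the project&rsquo;s actual color-to-interval mapping is not reproduced here.</p>
<pre><code class="language-python">import math

SAMPLE_RATE = 16_000  # Hz, the sample rate mentioned above


def note_frequency(semitones, root=440.0):
    """Equal temperament: f = f0 * 2^(n/12)."""
    return root * 2 ** (semitones / 12)


def sine_pcm(frequency, seconds=0.5):
    """Return unsigned 8-bit PCM samples for a sine tone."""
    count = int(SAMPLE_RATE * seconds)
    return bytes(
        # Map the sine wave from [-1, 1] onto the byte range [0, 255].
        round(127.5 + 127.5 * math.sin(2 * math.pi * frequency * i / SAMPLE_RATE))
        for i in range(count)
    )


triad = [note_frequency(n) for n in (0, 4, 7)]  # root, major third, fifth
print([round(f, 1) for f in triad])
samples = sine_pcm(triad[0])
print(len(samples))
</code></pre>
<p>Twelve semitones double the frequency, so <code>note_frequency(12)</code> lands one octave above the root, and half a second of audio at 16 kHz comes to 8,000 one-byte samples.</p>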
<h2 id="usagegameplay">Usage</h2>
<p>The application runs via a Python script, requiring a webcam to scan the Rubik&rsquo;s cube.</p>
<h2 id="results">Results</h2>
<p>Looking back at this code 8 years later, it serves as a &ldquo;time capsule&rdquo; of my early engineering mindset.</p>
<ul>
<li><strong>The &ldquo;Hack&rdquo;:</strong> The computer vision relied on hardcoded pixel coordinates and raw OS shell calls, the kind of &ldquo;glue code&rdquo; typical of hackathons.</li>
<li><strong>The Lesson:</strong> While brittle, the project successfully demonstrated how to bridge the gap between physical entropy and digital signal processing using fundamental programming concepts.</li>
</ul>
<h2 id="related-content">Related Content</h2>
<ul>
<li><a href="/videos/rubiks-cube-player-hackathon/">Video Demonstration</a></li>
</ul>
]]></content:encoded></item><item><title>Elemental Brawl</title><link>https://hunterheidenreich.com/projects/elemental-brawl/</link><pubDate>Fri, 24 Oct 2014 00:00:00 +0000</pubDate><guid>https://hunterheidenreich.com/projects/elemental-brawl/</guid><description>A high school fighting game where periodic table elements come to life. Though the Kickstarter failed, leading a creative team was formative.</description><content:encoded><![CDATA[<h2 id="overview">Overview</h2>
<p>In 2014, as a junior in high school, I had a dream: to create a fighting game where the elements of the periodic table came to life to duke it out. <em>Elemental Brawl</em> was born.</p>
<p>The vision was ambitious for a 16-year-old: 37 playable characters (each a different element), 25 stages, 30 music tracks, 40 items, 100+ achievements, and 4-player LAN support. I assembled a team of talented artists and a composer, launched a Kickstarter campaign, and built a playable demo.</p>
<figure class="post-figure center ">
    <img src="/img/elemental-brawl/elemental-brawl-victory.webp"
         alt="Elemental Brawl gameplay screenshot showing victory screen"
         title="Elemental Brawl gameplay screenshot showing victory screen"
         
         
         loading="lazy"
         class="post-image">
    
    <figcaption class="post-caption">Victory screen from the playable demo</figcaption>
    
</figure>

<h2 id="features">Features</h2>
<p>Each element had its own personality and fighting style based on its chemical properties:</p>
<p><strong>Oxygen</strong> was a gaseous fighter that used small bubbles formed into legs for amazing speed and agility. A fun-loving character, but a bit of a hothead: get it too hyped up and it would burst into flames, scorching all its foes!</p>
<figure class="post-figure center ">
    <img src="/img/elemental-brawl/elemental-brawl-oxygen-animated.gif"
         alt="Animated sprite of Oxygen character from Elemental Brawl"
         title="Animated sprite of Oxygen character from Elemental Brawl"
         
         
         loading="lazy"
         class="post-image">
    
    <figcaption class="post-caption">Oxygen&rsquo;s animated sprite by Molly Heady-Carroll</figcaption>
    
</figure>
<figure class="post-figure center ">
    <img src="/img/elemental-brawl/elemental-brawl-oxygen-concept-art.webp"
         alt="Concept art of Oxygen character"
         title="Concept art of Oxygen character"
         
         
         loading="lazy"
         class="post-image">
    
    <figcaption class="post-caption">Oxygen concept art by Noah Evans</figcaption>
    
</figure>

<p><strong>Carbon</strong> was a nonmetallic brawler with a burning passion for the art of fighting. Carbon hit hard, jumped high, and launched small fireballs at enemies. When charged with enough energy, it would pressurize itself into solid diamond and launch into the sky with atomic force.</p>
<figure class="post-figure center ">
    <img src="/img/elemental-brawl/elemental-brawl-carbon-animated.gif"
         alt="Animated sprite of Carbon character from Elemental Brawl"
         title="Animated sprite of Carbon character from Elemental Brawl"
         
         
         loading="lazy"
         class="post-image">
    
    <figcaption class="post-caption">Carbon&rsquo;s animated sprite</figcaption>
    
</figure>
<figure class="post-figure center ">
    <img src="/img/elemental-brawl/elemental-brawl-carbon-concept-art.webp"
         alt="Concept art of Carbon character"
         title="Concept art of Carbon character"
         
         
         loading="lazy"
         class="post-image">
    
    <figcaption class="post-caption">Carbon concept art</figcaption>
    
</figure>

<p>We even had concept art for more complex characters like <strong>Iron</strong>:</p>
<figure class="post-figure center ">
    <img src="/img/elemental-brawl/elemental-brawl-iron-z-28-concept-art.webp"
         alt="Concept art of Iron character"
         title="Concept art of Iron character"
         
         
         loading="lazy"
         class="post-image">
    
    <figcaption class="post-caption">Iron concept art</figcaption>
    
</figure>

<h2 id="usagegameplay">Usage/Gameplay</h2>
<p>The game was a 2D fighting game with a focus on elemental interactions and combos.</p>
<h2 id="results">Results</h2>
<p>We launched our <a href="https://www.kickstarter.com/projects/621222258/elemental-brawl">Kickstarter campaign</a> on October 24, 2014, seeking $19,000 to fund the full development.</p>
<figure class="post-figure center ">
    <img src="/img/elemental-brawl/elemental-brawl-demo-combustion-consumption.webp"
         alt="Elemental Brawl demo gameplay showing Combustion Consumption stage"
         title="Elemental Brawl demo gameplay showing Combustion Consumption stage"
         
         
         loading="lazy"
         class="post-image">
    
    <figcaption class="post-caption">The Combustion Consumption stage from the demo</figcaption>
    
</figure>

<p>The campaign raised $1,910 from 31 backers, people who believed in a high schooler&rsquo;s dream. While we didn&rsquo;t reach our funding goal, the experience was invaluable.</p>
<h2 id="the-team">The Team</h2>
<p>I was fortunate to work with talented people:</p>
<ul>
<li><strong>Noah Evans</strong> (Concept Artist/GUI Artist): A visual arts major at School of the Arts in South Carolina</li>
<li><strong>Molly Heady-Carroll</strong> (Character Animator): An Irish artist earning her MA in Game Art from HKU in the Netherlands</li>
<li><strong>Sean Pack</strong> (Composer): A versatile composer for film, TV, video games, and theater</li>
<li><strong>Bryan Moore</strong> (Stage Artist): A SCAD graduate specializing in backgrounds and environments</li>
</ul>
<h2 id="retrospective-2025">Retrospective (2025)</h2>
<p>Looking back over a decade later, the project was a massive success in terms of learning. We accomplished several major milestones:</p>
<ul>
<li>Built a playable demo from scratch</li>
<li>Coordinated a remote team across multiple countries</li>
<li>Created original music, art, and animations</li>
<li>Ran a real crowdfunding campaign</li>
<li>Learned what it takes to turn an idea into something tangible</li>
</ul>
<p>The project taught me about project management, creative collaboration, and the gap between vision and execution. Those lessons have carried forward into every project since.</p>
]]></content:encoded></item></channel></rss>