Hi, I’m Hunter.

I’m an AI Research Scientist & Engineer at Roots.ai, bridging abstract ML research and production deployment. I specialize in Large Language Models (LLMs) and Vision-Language Models (VLMs) for document processing, and conduct research in physics-informed AI for scientific simulation. I take ideas from papers to working code, building open-source tools and real-world systems. More about me →
Document Processing
GutenOCR Mascot

GutenOCR: A Grounded Vision-Language Front-End for Documents

GutenOCR is a family of vision-language models that serve as a grounded OCR front-end, pairing high-quality text transcription with explicit geometric grounding.

Time Series Forecasting
Forecasting comparison of different neural architectures on the Multiscale Lorenz-96 system

Optimizing Sequence Models for Dynamical Systems

We systematically ablate core mechanisms of Transformers and RNNs, finding that attention-augmented Recurrent Highway Networks outperform standard Transformers on forecasting high-dimensional chaotic systems.

Machine Learning Fundamentals
Three-panel diagram showing an original sequence, its time-warped version, and the gate values derived from requiring time warping invariance

Can Recurrent Neural Networks Warp Time? (ICLR 2018)

Tallec and Ollivier show that requiring invariance to time transformations in recurrent models leads to gating mechanisms, recovering key LSTM components from first principles. They propose the chrono initialization for gate biases that improves learning of long-term dependencies.

Machine Learning Fundamentals
Graph network block diagram showing input graph transformed through edge, node, and global update steps to produce an updated graph

Relational Inductive Biases in Deep Learning (2018)

Battaglia et al. argue that combinatorial generalization requires structured representations, systematically analyze the relational inductive biases in standard deep learning architectures (MLPs, CNNs, RNNs), and present the graph network as a unifying framework that generalizes and extends prior graph neural network approaches.

Machine Learning Fundamentals
Log-log plot comparing scaling laws across six architectures showing the vanilla Transformer has the steepest slope

Scaling Laws vs Model Architectures: Inductive Bias

Tay et al. systematically compare scaling laws across ten diverse architectures (Transformers, Switch Transformers, Performers, MLP-Mixers, and others), finding that the vanilla Transformer has the best scaling coefficient and that the best-performing architecture can change across compute regimes.

Machine Learning Fundamentals
SE(3)-Transformer architecture showing invariant attention weights modulating equivariant value messages on a 3D point cloud

SE(3)-Transformers: Equivariant Attention for 3D Data

Fuchs et al. introduce the SE(3)-Transformer, which combines self-attention with SE(3)-equivariance for 3D point clouds and graphs. Invariant attention weights modulate equivariant value messages from tensor field networks, relaxing the angular constraints on equivariant filters while enabling data-adaptive, anisotropic processing.

Machine Learning Fundamentals
Comparison of planar CNN (translation only) versus spherical CNN (SO(3)-equivariant) showing how filters rotate on the sphere

Spherical CNNs: Rotation-Equivariant Networks on the Sphere

Cohen et al. introduce Spherical CNNs that achieve SO(3)-equivariance by defining cross-correlation on the sphere and rotation group, computed efficiently via generalized FFT algorithms from non-commutative harmonic analysis.

Machine Learning Fundamentals
The three quarks of attention: multiplexing (additive), output gating (multiplicative output), and synaptic gating (multiplicative weight)

The Quarks of Attention: Building Blocks of Attention

Baldi and Vershynin systematically classify the fundamental building blocks of attention (activation attention, output gating, synaptic gating) by source, target, and mechanism, then prove capacity bounds showing that gating introduces quadratic terms sparsely, gaining expressiveness without the full cost of polynomial activations.

Computational Chemistry
Density plot showing training vs generated physicochemical property distribution

Molecular Sets (MOSES): A Generative Modeling Benchmark

MOSES introduces a comprehensive benchmarking platform for molecular generative models, offering standardized datasets, evaluation metrics, and baselines. By providing a unified measuring stick, it aims to resolve reproducibility challenges in chemical distribution learning.

Document Processing
Chart showing the trade-off between accuracy and throughput in document automation

The Reliability Trap: The Limits of 99% Accuracy

We explore the ‘Silent Failure’ mode of LLMs in production: why 99% accuracy is not enough for reliability, how confidence decays in long documents, and why standard calibration techniques struggle to fix it.

Document Processing
Conceptual diagram of page stream segmentation sorting pages into documents

The Evolution of Page Stream Segmentation: Rules to LLMs

We trace the history of Page Stream Segmentation (PSS) through three eras (Heuristic, Encoder, and Decoder) and explain how privacy-preserving, localized LLMs enable true semantic processing.

Document Processing
Statistics of the PubMed-OCR dataset including number of articles, pages, words, and bounding boxes.

PubMed-OCR: PMC Open Access OCR Annotations

PubMed-OCR provides 1.5M pages of scientific articles with comprehensive OCR annotations and bounding boxes to support layout-aware modeling and document analysis.