I am Hunter, an ML Research Scientist at Roots.ai based in Jersey City, NJ. I train large language and vision models at production scale, and my research interests sit at the intersection of foundation models and physical science.

At Roots.ai, I build and ship VLMs and LLMs on DGX H100 clusters, using DeepSpeed ZeRO for distributed training. I recently led the training and release of GutenOCR, an open-source Vision-Language Model family (3B and 7B parameters) with open weights, open code, and a 1.5M-page open dataset. GutenOCR achieved a composite grounded OCR score of 0.82 on in-domain evaluation and a region-level CER of 0.053 on the Fox Benchmark, surpassing prior open-source and dedicated OCR baselines. I publish at peer-reviewed venues including COLING 2025, EMNLP, and AIES.

Before moving to industry, I spent two years at Harvard researching scientific computing for molecular dynamics. I designed generative surrogate models (Transformers, GNNs, VAEs) to accelerate molecular dynamics simulations, built data pipelines for GROMACS and LAMMPS workflows, and developed probabilistic forecasting methods for chaotic physical systems. I passed my qualifying exams and advanced to PhD candidacy, then chose to leave with a Master's degree and return to applied ML.

My journey in software started much earlier, with indie game development and physics-based engines. At Drexel University, I moved into machine learning and computational linguistics, including work at SAP's Conversational AI Labs and research on the social impacts of generative text.

My research roots are in physical science, and I'm actively exploring how foundation model training techniques transfer to scientific domains: computational chemistry, materials science, and molecular generation.
What You’ll Find Here
This site hosts my research and shares practical ML knowledge:
- Research: Publications on VLMs, LLMs, scientific ML, and AI safety
- Posts: Technical writing on computational chemistry, NLP/ML, and scientific computing
- Projects: Open-source tools and experiments
- Notes: Detailed annotations on 190+ papers across ML, scientific computing, and document understanding
- Videos: Molecular dynamics simulations, indie games, and audio visualizations
Let’s Connect
I am always open to discussing new ideas, sharing insights on scaling ML systems, and exploring the intersection of AI and the physical sciences. Whether you are building production pipelines, conducting research, or just want to talk shop about VLMs, open-source datasets, or indie game dev, feel free to reach out via email or connect with me on LinkedIn.