Hunter Heidenreich | ML Research Scientist — Page 24

Optical Chemical Structure Recognition
Automatic chemical image recognition pipeline from raster image to structured file

Automatic Recognition of Chemical Images

This methodological paper presents a system for digitizing chemical images into SDF files. It utilizes a custom vectorization algorithm and chemical rule validation, achieving 94% accuracy on benchmark datasets compared to 50% for commercial tools.
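An SDF (structure-data file) stores each molecule as an MDL V2000 molfile connection table followed by a `$$$$` terminator line. The sketch below writes one such record. It is a minimal, simplified illustration: the helper name `to_sdf_record` and the ethanol heavy-atom example are hypothetical, and the record carries only 2D coordinates, element symbols, and bond orders. It omits charges, stereo flags, and data fields.

```python
def to_sdf_record(name, atoms, bonds):
    """Write one SDF record (V2000 molfile plus '$$$$' terminator).

    atoms: list of (symbol, x, y) with 2D coordinates.
    bonds: list of (i, j, order) with 1-based atom indices.
    Hypothetical helper: omits charges, stereo flags, and data fields.
    """
    lines = [name, "  sketch", ""]  # header block: name, program line, comment
    # Counts line: atom count, bond count, zero-filled fields, '999 V2000' tag.
    lines.append(f"{len(atoms):3d}{len(bonds):3d}  0  0  0  0  0  0  0  0999 V2000")
    for symbol, x, y in atoms:
        # Atom block: x, y, z in 10.4f fields, then the element symbol.
        lines.append(f"{x:10.4f}{y:10.4f}{0.0:10.4f} {symbol:<3} 0  0  0  0  0  0  0  0  0  0  0  0")
    for i, j, order in bonds:
        # Bond block: first atom, second atom, bond order, stereo flag.
        lines.append(f"{i:3d}{j:3d}{order:3d}  0")
    lines.append("M  END")
    lines.append("$$$$")  # SDF record separator
    return "\n".join(lines) + "\n"

record = to_sdf_record(
    "ethanol",
    [("C", 0.0, 0.0), ("C", 1.3, 0.75), ("O", 2.6, 0.0)],
    [(1, 2, 1), (2, 3, 1)],
)
print(record)
```

Concatenating several such records in one file gives a multi-molecule SDF. A real OCSR tool would also need to emit hydrogens, charges, and stereo information recovered from the image.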

Planetary Science
Orbital diagram showing chaotic planetary trajectories

Chaotic Evolution of the Solar System (Sussman 1992)

Sussman and Wisdom’s 1992 study used the Supercomputer Toolkit and symplectic mapping to integrate the entire Solar System for 100 million years, confirming chaotic behavior with an exponential divergence timescale of ~4 million years and demonstrating that long-term planetary motion is fundamentally unpredictable.
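The ~4-million-year figure is an e-folding (Lyapunov) time. While a separation between two nearby solutions stays small, it grows roughly as δ(t) ≈ δ₀·e^(t/τ). The back-of-envelope sketch below shows what that implies over the 100-million-year integration. The 1 cm starting error is an illustrative assumption, not a value from the study.

```python
import math

TAU_MYR = 4.0      # divergence (e-folding) timescale reported in the study, ~4 Myr
SPAN_MYR = 100.0   # length of the integration, 100 Myr

def growth_factor(t_myr, tau_myr=TAU_MYR):
    """Factor by which a small separation grows after t_myr under exponential divergence."""
    return math.exp(t_myr / tau_myr)

factor = growth_factor(SPAN_MYR)   # e**25, about 7.2e10
initial_error_m = 0.01             # hypothetical 1 cm uncertainty in a position
print(f"growth factor over {SPAN_MYR:.0f} Myr: {factor:.2e}")
print(f"1 cm grows to ~{initial_error_m * factor / 1e3:.1e} km (while the linear regime holds)")
```

Each additional 4 Myr multiplies the error by e, so 100 Myr amounts to 25 e-foldings. This is why the study concludes that long-term positions cannot be predicted, even though the equations are deterministic.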

Optical Chemical Structure Recognition
Three-phase pipeline converting scanned chemical diagrams into connection tables via primitive recognition and semantic interpretation

Chemical Literature Data Extraction: The CLiDE Project

The CLiDE project presents a foundational architecture for Optical Chemical Structure Recognition (OCSR). It details a three-phase pipeline to convert bitmapped journal pages into chemically significant connection tables, handling complex features like stereochemistry.
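A connection table is simply a list of atoms plus a list of bonds between atom indices. The sketch below is a heavily simplified version of one step toward that table. It merges line-segment endpoints that lie within a pixel tolerance into atom nodes and records each segment as a bond. The function name, the 3-pixel tolerance, and the treatment of every segment as a single bond are illustrative assumptions, not CLiDE's actual procedure. Real systems must also pair up parallel strokes as double bonds and attach OCR'd atom labels.

```python
import math

def segments_to_connection_table(segments, tol=3.0):
    """Merge nearby segment endpoints into atoms; each segment becomes a bond.

    segments: list of ((x1, y1), (x2, y2)) in pixel coordinates.
    Returns (atoms, bonds): atom coordinates and sorted 0-based (i, j) bond pairs.
    Hypothetical helper: ignores double bonds, atom labels, and wedges.
    """
    atoms = []

    def node_for(point):
        # Reuse an existing atom if this endpoint lies within tol pixels of it.
        for idx, existing in enumerate(atoms):
            if math.dist(point, existing) <= tol:
                return idx
        atoms.append(point)
        return len(atoms) - 1

    bonds = set()
    for p, q in segments:
        i, j = node_for(p), node_for(q)
        if i != j:  # drop degenerate segments whose ends collapse to one node
            bonds.add((min(i, j), max(i, j)))
    return atoms, sorted(bonds)

# Three slightly misaligned strokes forming a triangle (a cyclopropane skeleton).
segs = [((0, 0), (30, 0)), ((31, 1), (15, 26)), ((14, 25), (1, 1))]
atoms, bonds = segments_to_connection_table(segs)
print(len(atoms), bonds)  # prints: 3 [(0, 1), (0, 2), (1, 2)]
```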

Optical Chemical Structure Recognition
Visualization of Gabor wavelets and Kohonen networks for chemical image classification

Chemical Machine Vision

This 2003 paper introduces a machine vision approach for extracting chemical metadata from raster images. By using Gabor wavelets for feature extraction and Kohonen networks for classification, it distinguishes between chemical and non-chemical images, as well as ring and non-ring systems, without requiring high-resolution inputs.
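A Gabor wavelet is a sinusoidal plane wave under a Gaussian envelope. Correlating an image with a bank of them, at several orientations, yields texture-like features that do not depend on fine pixel detail. The sketch below builds one real-valued kernel in pure Python and computes a single filter response. The parameter names (`sigma`, `theta`, `wavelength`, `gamma`, `psi`) follow a common convention and are assumptions, since the paper's exact parameterization is not given here.

```python
import math

def gabor_kernel(size, sigma, theta, wavelength, gamma=0.5, psi=0.0):
    """Real part of a 2D Gabor kernel as a size x size list of lists (size odd).

    g(x, y) = exp(-(x'^2 + gamma^2 y'^2) / (2 sigma^2)) * cos(2 pi x' / wavelength + psi),
    where (x', y') are the coordinates rotated by theta.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)   # rotate into the filter frame
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr**2 + (gamma * yr) ** 2) / (2 * sigma**2))
            carrier = math.cos(2 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

def response(patch, kernel):
    """Inner product of an image patch with a same-shaped kernel (one feature value)."""
    return sum(p * k for prow, krow in zip(patch, kernel) for p, k in zip(prow, krow))

# A small filter bank: four orientations at one scale.
angles = (0.0, math.pi / 4, math.pi / 2, 3 * math.pi / 4)
bank = [gabor_kernel(9, sigma=2.0, theta=t, wavelength=4.0) for t in angles]
```

In the paper's pipeline, feature vectors of this kind feed a Kohonen network for classification. The sketch stops at feature extraction.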

Optical Chemical Structure Recognition
Overview of the ChemReader pipeline for extracting chemical structures from raster images using Hough transform and OCR

ChemReader: Automated Structure Extraction

This paper presents ChemReader, a fully automated optical structure recognition tool that converts raster images of chemical diagrams into machine-readable formats. It introduces a modified Hough transform for bond detection and a chemical spell checker that improves OCR accuracy from 66% to 87%.
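The classical Hough transform finds straight lines by letting every foreground pixel vote for all lines ρ = x·cosθ + y·sinθ passing through it, then reading off the most-voted (θ, ρ) cells. ChemReader uses a modified version for bond detection whose details are not reproduced here. The sketch below is the standard, unmodified transform in pure Python. The angular resolution `n_theta` and integer rounding of ρ are assumed choices.

```python
import math
from collections import Counter

def hough_lines(points, n_theta=180, top_k=1):
    """Standard Hough transform over (theta, rho); returns the top_k most-voted cells.

    points: iterable of (x, y) foreground pixel coordinates.
    theta is sampled uniformly in [0, pi); rho is rounded to the nearest integer.
    """
    votes = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(t, rho)] += 1  # each pixel votes once per sampled angle
    return [(math.pi * t / n_theta, rho, count) for (t, rho), count in votes.most_common(top_k)]

# Pixels of a vertical bond stroke at x = 5, plus one stray noise pixel.
pixels = [(5, y) for y in range(20)] + [(12, 3)]
theta, rho, count = hough_lines(pixels)[0]
print(theta, rho, count)  # peak near theta = 0, rho = 5, with 20 votes
```

For short strokes, several neighbouring angle bins collect the same number of votes. Practical detectors therefore usually add non-maximum suppression and endpoint recovery before treating a peak as a bond.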

Machine Learning
Diagram showing distributed representations with three pools of units (AGENT, RELATIONSHIP, PATIENT) connected via role/identity bindings

Distributed Representations: A Foundational Theory

Geoffrey Hinton’s 1984 technical report formally derives the efficiency of distributed representations (coarse coding) and demonstrates their automatic generalization, content-addressability, and robustness to damage.
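Coarse coding represents a point by the set of active units whose large, overlapping receptive fields contain it. The sketch below is a small illustration of the efficiency and generalization points summarized above. It uses 25 hypothetical units with circular receptive fields centered on a 5×5 grid; the grid, radius, and sampling step are all illustrative assumptions. Nearby points share most of their active units, and the overlapping fields carve the square into many more distinguishable regions than there are units.

```python
import math

# 25 hypothetical units with circular receptive fields centered on a 5x5 grid.
CENTERS = [(cx, cy) for cx in range(5) for cy in range(5)]
RADIUS = 1.5  # receptive fields wider than the grid spacing of 1.0

def encode(point, centers=CENTERS, radius=RADIUS):
    """Coarse code: indices of all units whose receptive field contains the point."""
    return frozenset(i for i, c in enumerate(centers) if math.dist(point, c) < radius)

def overlap(a, b):
    """Fraction of active units shared by two codes (Jaccard similarity)."""
    return len(a & b) / len(a | b)

# Sample the square [0, 4] x [0, 4] on a fine grid and count distinct codes.
step = 0.05
samples = [(i * step, j * step) for i in range(81) for j in range(81)]
distinct_codes = {encode(p) for p in samples}

print(len(CENTERS), "units ->", len(distinct_codes), "distinguishable regions")
print("overlap of nearby points:", overlap(encode((2.0, 2.0)), encode((2.2, 2.0))))
```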

Planetary Science
Abstract artistic representation of alkaline hydrothermal vents with spiraling geological formations

Drive to Life on Wet and Icy Worlds: Alkaline Vent Theory

This paper reformulates the submarine alkaline hydrothermal theory for the origin of life. It posits that life emerged as a free-energy converter driven by specific geological disequilibria, namely redox and pH gradients across inorganic precipitate membranes, with hydrogen, methane, and CO2 as primary feedstocks.
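The free energy stored in a pH gradient follows standard chemiosmotic thermodynamics. Moving one mole of protons down a gradient of ΔpH units releases about ln(10)·R·T·ΔpH, ignoring any membrane-potential contribution. The sketch below is a small calculator for that term. The 298.15 K temperature and the 3-unit example gradient are illustrative assumptions, not values taken from the paper.

```python
import math

R = 8.314462618  # gas constant, J / (mol K)

def proton_gradient_free_energy(delta_ph, temperature_k=298.15):
    """Magnitude of free energy (kJ per mol of H+) released by protons flowing down a pH gradient.

    Uses |dG| = ln(10) * R * T * dpH; the membrane-potential term is ignored.
    """
    return math.log(10) * R * temperature_k * delta_ph / 1000.0

per_unit = proton_gradient_free_energy(1.0)   # about 5.7 kJ/mol per pH unit at 25 degC
example = proton_gradient_free_energy(3.0)    # hypothetical 3-unit gradient
print(f"{per_unit:.2f} kJ/mol per pH unit; {example:.1f} kJ/mol for a 3-unit gradient")
```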

Molecular Simulation
Graph of the Lennard-Jones 12-6 potential showing the characteristic attractive and repulsive forces

Dynamical Corrections to TST for Surface Diffusion

This paper bridges Molecular Dynamics and Transition State Theory by applying a dynamical corrections formalism to surface diffusion, identifying a low-temperature bounce-back mechanism causing non-Arrhenius behavior.
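In a dynamical-corrections treatment, the exact rate is the TST rate multiplied by a transmission coefficient κ. This coefficient is estimated from short trajectories launched at the dividing surface as a flux-weighted average, κ = Σ v₀·θ(ends in product) / Σ v₀·θ(v₀ > 0). Trajectories that cross and then bounce back therefore pull κ below 1. The sketch below implements only this two-state form with hypothetical trajectory outcomes. The paper's formalism handles more than two states, and the harmonic TST prefactor form used here is an assumption.

```python
import math

def transmission_coefficient(initial_velocities, ends_in_product):
    """Two-state flux-weighted transmission coefficient.

    initial_velocities: velocity component normal to the dividing surface, per trajectory.
    ends_in_product: True if that trajectory is on the product side after the short run.
    kappa = sum(v for product-ending runs) / sum(v for v > 0)
    """
    numerator = sum(v for v, prod in zip(initial_velocities, ends_in_product) if prod)
    denominator = sum(v for v in initial_velocities if v > 0)
    return numerator / denominator

def corrected_rate(kappa, prefactor_hz, barrier_ev, temperature_k):
    """k = kappa * k_TST, with an assumed harmonic TST rate nu * exp(-Ea / kB T)."""
    k_b = 8.617333262e-5  # Boltzmann constant, eV/K
    return kappa * prefactor_hz * math.exp(-barrier_ev / (k_b * temperature_k))

# Hypothetical outcomes: one forward crosser (v = 0.5) bounces back to the reactant side.
v = [1.0, 0.5, 0.8, -0.6, -1.2]
ends = [True, False, True, False, False]
print(transmission_coefficient(v, ends))  # about 0.783
```

If κ itself varies with temperature, as a low-temperature bounce-back mechanism would cause, then ln k is no longer linear in 1/T. That is one way the non-Arrhenius behavior described above can arise.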

Molecular Simulation
Embedding energy and effective charge functions for Ni and Pd from the original EAM paper

Embedded-Atom Method User Guide: Voter's 1994 Chapter

This 1994 handbook chapter serves as a practical user guide for the Embedded-Atom Method (EAM). It details the theoretical derivation from density-functional theory, synthesizes related methods like the Glue Model, and provides a complete tutorial on fitting potentials, illustrated with a specific implementation for the Ni-Al-B system.
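At the core of every EAM variant is the energy expression E = Σ_i F(ρ_i) + ½ Σ_i Σ_{j≠i} φ(r_ij). Here ρ_i = Σ_{j≠i} f(r_ij) is the host density at atom i built from neighbour contributions f, F is the embedding function, and φ is a pair interaction. The single-element sketch below evaluates this expression. The specific forms (square-root embedding, exponential density and pair terms) and all parameters are illustrative assumptions, not the fitted Ni-Al-B functions from the chapter.

```python
import math
from itertools import combinations

# Hypothetical single-element parameters (not fitted to any real metal).
A, BETA = 1.0, 3.0    # embedding strength; decay of the density function
B, ALPHA = 0.5, 6.0   # pair-repulsion strength and decay
R0 = 1.0              # reference neighbour distance

def density(r):       # f(r): density contribution from one neighbour
    return math.exp(-BETA * (r - R0))

def embed(rho):       # F(rho): embedding energy, square-root form
    return -A * math.sqrt(rho)

def pair(r):          # phi(r): repulsive pair interaction
    return B * math.exp(-ALPHA * (r - R0))

def eam_energy(positions):
    """Total EAM energy: sum_i F(rho_i) plus each pair interaction counted once."""
    n = len(positions)
    rho = [0.0] * n
    e_pair = 0.0
    for i, j in combinations(range(n), 2):
        r = math.dist(positions[i], positions[j])
        rho[i] += density(r)
        rho[j] += density(r)
        e_pair += pair(r)  # one term per pair, equivalent to the 1/2 double sum
    return sum(embed(p) for p in rho) + e_pair

dimer = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(eam_energy(dimer))  # 2 * F(1) + phi(1) = -2 + 0.5 = -1.5
```

Fitting, as in the chapter's tutorial, means choosing the functional forms and parameters of f, F, and φ so that energies like this reproduce target properties such as lattice constants and elastic constants.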

Molecular Simulation
Embedding energy and effective charge functions for Ni and Pd from the original EAM paper

Embedded-Atom Method: Theory and Applications Review

This 1993 review systematizes the Embedded-Atom Method (EAM) as a practical semi-empirical approach for metallic systems. It synthesizes theory, applications, and connections to related methods while addressing the limitations of pair potentials.
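One limitation of pair potentials is that a bond's strength cannot depend on the atom's environment. With a nearest-neighbour pair potential, the energy per atom scales linearly with coordination number Z. With a concave embedding function such as F(ρ) ∝ -√ρ (an illustrative choice assumed here), it grows only as √Z, so each additional neighbour contributes less, closer to the behaviour of real metals. The sketch below compares the two scalings using hypothetical unit bond energies.

```python
import math

def pair_energy_per_atom(z, bond=-1.0):
    """Nearest-neighbour pair potential: each atom owns half of its z bonds."""
    return 0.5 * z * bond

def embedding_energy_per_atom(z, density_per_neighbour=1.0, a=1.0):
    """Embedding-only energy with F(rho) = -a * sqrt(rho) and rho = z * density_per_neighbour."""
    return -a * math.sqrt(z * density_per_neighbour)

for z in (1, 4, 6, 8, 12):  # dimer, diamond, simple cubic, bcc, fcc coordination numbers
    per_neighbour_pair = pair_energy_per_atom(z) / z
    per_neighbour_eam = embedding_energy_per_atom(z) / z
    print(f"Z={z:2d}  pair: {per_neighbour_pair:.3f}/neighbour   "
          f"embedding: {per_neighbour_eam:.3f}/neighbour")
```

The pair column stays constant while the embedding column shrinks in magnitude as Z grows. This environment dependence is what a pair potential alone cannot represent.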

Molecular Simulation
Graph of the Lennard-Jones 12-6 potential showing the characteristic attractive and repulsive forces

Evans 1986: Thermal Conductivity of Lennard-Jones Fluid

This paper validates the homogeneous Evans method for calculating thermal conductivity against experimental argon data. It demonstrates broad agreement across the phase diagram but identifies significant non-monotonic behavior and enhanced long-time tails near the critical point.
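The fluid in this study interacts through the Lennard-Jones 12-6 potential, V(r) = 4ε[(σ/r)^12 - (σ/r)^6]. This is a common model for argon, which is what makes the comparison with experimental argon data meaningful. The sketch below evaluates the potential in reduced units (ε = σ = 1) along with the truncated-and-shifted variant often used in simulations. The 2.5σ cutoff is a conventional, assumed choice, not necessarily the one used in the paper.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """12-6 Lennard-Jones pair energy V(r) = 4 eps [(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def lennard_jones_truncated_shifted(r, r_cut=2.5, epsilon=1.0, sigma=1.0):
    """LJ potential cut at r_cut and shifted so that V(r_cut) = 0 (no energy jump)."""
    if r >= r_cut:
        return 0.0
    return lennard_jones(r, epsilon, sigma) - lennard_jones(r_cut, epsilon, sigma)

r_min = 2 ** (1 / 6)          # location of the minimum, about 1.122 sigma
print(lennard_jones(1.0))     # 0.0: the potential crosses zero at r = sigma
print(lennard_jones(r_min))   # about -1.0: the well depth equals -epsilon
```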

Computational Biology
Four types of protein folding energy landscapes from left to right: smooth funnel, rugged funnel with kinetic traps, moat funnel, and champagne glass funnel

Funnels, Pathways, and Energy Landscapes of Protein Folding

This paper resolves Levinthal’s paradox by replacing the single-pathway view with a statistical energy landscape approach. It introduces the concepts of the folding funnel, the glass transition in proteins, and the ‘stability gap’ as a design principle for foldable sequences.
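Levinthal's paradox rests on simple counting. If each residue could adopt a few backbone conformations independently, an unbiased search over all combinations would take astronomically long. The estimate below uses commonly quoted illustrative numbers: 100 residues, 3 conformations per residue, and 10^13 conformations sampled per second. These are assumptions for the arithmetic, not values from this paper.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def exhaustive_search_years(n_residues, states_per_residue, samples_per_second):
    """Years needed to visit every conformation once in an unbiased search."""
    n_conformations = states_per_residue ** n_residues  # exact integer count
    return n_conformations / samples_per_second / SECONDS_PER_YEAR

n_conf = 3 ** 100
years = exhaustive_search_years(100, 3, 1e13)
print(f"{n_conf:.2e} conformations, ~{years:.1e} years")  # ~5.15e+47 conformations, ~1.6e+27 years
```

The funnel picture removes the paradox because the landscape is biased toward the native state. The chain therefore never needs to enumerate this space, so the exhaustive-search estimate is not the relevant timescale.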