Hunter Heidenreich | ML Research Scientist — Page 11

Computational Chemistry

Imago: Open-Source Chemical Structure Recognition (2011)

Imago is an open-source, cross-platform C++ toolkit designed to recognize 2D chemical structure images from scientific papers and convert them into machine-readable molecule formats using a rule-based pipeline.

Computational Chemistry

Kekulé-1 System for Chemical Structure Recognition

This paper introduces Kekulé-1, one of the first successful Optical Chemical Structure Recognition (OCSR) systems. It details a hybrid approach using neural networks for character recognition and heuristic vectorization for bond detection, achieving 98.9% accuracy on a test set of 524 structures.

Computational Chemistry

OSRA at TREC-CHEM 2011: Optical Structure Recognition

This paper details the algorithmic pipeline of OSRA, an open-source tool that converts raster images of chemical diagrams into connection tables (SMILES/SDF). It outlines specific heuristics for page segmentation, vectorization, and atom recognition used in the TREC-CHEM Image2Structure task.

Computational Chemistry

Structural Analysis of Handwritten Chemical Formulas

This paper proposes a strategy for interpreting handwritten chemical formulas by converting bitmap images into a dynamic structural graph of quadrilaterals. It achieves ~97% recognition on graphical elements by using recursive ‘specialists’ to identify chemical bonds and rings.

Computational Social Science
NOMINATE spatial plot showing Senate vote on Balanced Budget Amendment (1995) with legislators positioned on liberal-conservative dimension

A Spatial Model for Legislative Roll Call Analysis

This paper introduces NOMINATE, a probabilistic spatial model that recovers metric coordinates for legislators and roll calls from nominal voting data, demonstrating that a single liberal-conservative dimension explains the vast majority of Congressional voting behavior.
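
As a rough illustration of how a probabilistic spatial model turns positions into vote probabilities, the sketch below assumes a Gaussian-shaped utility over the distance between a legislator's ideal point and each roll-call outcome, combined with a logistic choice rule. The function names and the values of `beta` and `w` are illustrative assumptions, not estimates or notation taken from the paper.

```python
import math

def utility(ideal_point, outcome, beta=15.0, w=0.5):
    """Assumed Gaussian-shaped utility: highest when the outcome sits at the ideal point."""
    d = ideal_point - outcome
    return beta * math.exp(-((w * d) ** 2) / 2)

def prob_yea(ideal_point, yea_outcome, nay_outcome, beta=15.0, w=0.5):
    """Logistic probability of voting yea, given the two outcome locations."""
    u_yea = utility(ideal_point, yea_outcome, beta, w)
    u_nay = utility(ideal_point, nay_outcome, beta, w)
    return 1.0 / (1.0 + math.exp(u_nay - u_yea))

# Hypothetical one-dimensional roll call: yea outcome on the left, nay on the right.
for x in (-0.8, 0.0, 0.8):
    print(x, round(prob_yea(x, yea_outcome=-0.5, nay_outcome=0.5), 3))
```

A legislator equidistant from both outcomes votes yea with probability 0.5, and the probability rises as the ideal point moves toward the yea outcome. Fitting the model means choosing the coordinates that make the observed votes most likely.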

Computational Chemistry
Automatic chemical image recognition pipeline from raster image to structured file

Automatic Recognition of Chemical Images

This methodological paper presents a system for digitizing chemical images into SDF files. It utilizes a custom vectorization algorithm and chemical rule validation, achieving 94% accuracy on benchmark datasets compared to 50% for commercial tools.

Planetary Science
Orbital diagram showing chaotic planetary trajectories

Chaotic Evolution of the Solar System (Sussman 1992)

Sussman and Wisdom’s 1992 study used the Supercomputer Toolkit and symplectic mapping to integrate the entire Solar System for 100 million years, confirming chaotic behavior with an exponential divergence timescale of ~4 million years and demonstrating that long-term planetary motion is fundamentally unpredictable.
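
To see what a ~4 million year divergence timescale implies, the sketch below assumes purely exponential growth of a small initial separation, d(t) = d0 * exp(t / tau). The function names are hypothetical, and real trajectories only follow this law on average.

```python
import math

def divergence_factor(elapsed_myr, efold_myr=4.0):
    """Growth factor of a small initial separation after elapsed_myr million years."""
    return math.exp(elapsed_myr / efold_myr)

def horizon_myr(initial_error, tolerated_error, efold_myr=4.0):
    """Time (in million years) for initial_error to grow to tolerated_error."""
    return efold_myr * math.log(tolerated_error / initial_error)

# Over the full 100-million-year integration: exp(25), roughly 7.2e10.
print(f"{divergence_factor(100.0):.2e}")

# A relative error of 1e-10 reaches order 1 after about 92 million years.
print(round(horizon_myr(1e-10, 1.0), 1))
```

Under this assumption, improving the initial measurement by a factor of ten extends the prediction horizon by only about 9 million years (4 × ln 10).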

Computational Chemistry

Chemical Literature Data Extraction: The CLiDE Project

The CLiDE project presents a foundational architecture for Optical Chemical Structure Recognition (OCSR). It details a three-phase pipeline to convert bitmapped journal pages into chemically significant connection tables, handling complex features like stereochemistry.

Computational Chemistry
Visualization of Gabor wavelets and Kohonen networks for chemical image classification

Chemical Machine Vision

This 2003 paper introduces a machine vision approach for extracting chemical metadata from raster images. By using Gabor wavelets for feature extraction and Kohonen networks for classification, it distinguishes between chemical and non-chemical images, as well as ring and non-ring systems, without requiring high-resolution inputs.

Computational Chemistry

ChemReader: Automated Structure Extraction

This paper presents ChemReader, a fully automated optical structure recognition tool that converts raster images of chemical diagrams into machine-readable formats. It introduces a modified Hough transform for bond detection and a chemical spell checker that improves OCR accuracy from 66% to 87%.
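
For readers unfamiliar with Hough-based line detection, here is a minimal sketch of the standard Hough transform. It is not ChemReader's modified variant, whose details are not given here. Each foreground pixel votes for every line that could pass through it, and the strongest bin marks a candidate bond. The function name and the sample pixels are hypothetical.

```python
import math
from collections import Counter

def hough_peak(points):
    """Return (theta_deg, rho, votes) for the strongest straight line.

    Lines are parameterized as rho = x*cos(theta) + y*sin(theta),
    with theta sampled in 1-degree steps and rho rounded to whole pixels.
    """
    votes = Counter()
    for x, y in points:
        for theta_deg in range(180):
            theta = math.radians(theta_deg)
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(theta_deg, rho)] += 1
    (theta_deg, rho), count = votes.most_common(1)[0]
    return theta_deg, rho, count

# Hypothetical pixels sampled along a horizontal bond at y = 3.
horizontal_bond = [(x, 3) for x in range(0, 200, 10)]
print(hough_peak(horizontal_bond))  # (90, 3, 20)
```

A practical detector would keep every bin above a vote threshold rather than only the top one, then trim each detected line to the segment actually covered by pixels.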

Machine Learning Fundamentals
Diagram showing distributed representations with three pools of units (AGENT, RELATIONSHIP, PATIENT) connected via role/identity bindings

Distributed Representations: A Foundational Theory

Geoffrey Hinton’s 1984 technical report formally derives the efficiency of distributed representations (coarse coding) and demonstrates their properties of automatic generalization, content-addressability, and robustness to damage.

Astrobiology
Abstract artistic representation of alkaline hydrothermal vents with spiraling geological formations

Drive to Life on Wet and Icy Worlds: Alkaline Vent Theory

This paper reformulates the submarine alkaline hydrothermal theory for the origin of life, positing that life emerged as a free energy converter driven by geological disequilibria (specifically redox and pH gradients across inorganic precipitate membranes) and fed primarily by hydrogen, methane, and CO2.