Computational Biology
Four types of protein folding energy landscapes from left to right: smooth funnel, rugged funnel with kinetic traps, moat funnel, and champagne glass funnel

Funnels, Pathways, and Energy Landscapes of Protein Folding

This seminal work resolves Levinthal’s paradox by replacing the single-pathway view with a statistical energy landscape approach. It introduces the concepts of the folding funnel, the glass transition in proteins, and the ‘stability gap’ as a design principle for foldable sequences.

Computational Chemistry
Graph Perception for Chemical Structure OCR

This 1990 paper presents an early OCR pipeline for converting hand-drawn or printed chemical structures into connectivity tables. It introduces novel sweeping algorithms for graph perception and a matrix-based feature extraction method for character recognition.

Machine Learning Fundamentals
Visualization of inverse problem showing one input mapping to multiple valid outputs

Mixture Density Networks: Modeling Multimodal Distributions

This foundational 1994 paper shows why standard least-squares networks fail on inverse problems with multi-valued mappings: they predict the conditional mean, which can fall between the valid solutions. It introduces the Mixture Density Network (MDN), which predicts the parameters of a Gaussian Mixture Model to capture the full conditional probability density.
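
To make the idea concrete, here is a minimal sketch of the loss an MDN minimizes: the negative log-likelihood of the targets under the predicted mixture. It assumes scalar targets and that some network (not shown) has already produced the mixture parameters; the function name `mdn_nll` and the array shapes are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def mdn_nll(pi_logits, mu, log_sigma, t):
    """Mean negative log-likelihood of scalar targets under a 1-D Gaussian mixture.

    pi_logits, mu, log_sigma: arrays of shape (N, K), one row of K components
    per example (hypothetical network outputs). t: targets of shape (N,).
    """
    # Log-softmax turns unconstrained logits into log mixing coefficients.
    log_pi = pi_logits - np.logaddexp.reduce(pi_logits, axis=1, keepdims=True)
    # Log density of each Gaussian component evaluated at the target.
    z = (t[:, None] - mu) / np.exp(log_sigma)
    log_gauss = -0.5 * z**2 - log_sigma - 0.5 * np.log(2.0 * np.pi)
    # Log-sum-exp over components gives the log mixture density.
    log_lik = np.logaddexp.reduce(log_pi + log_gauss, axis=1)
    return -log_lik.mean()

# A target at +1 when the valid answers are -1 and +1:
t = np.array([1.0])
bimodal = mdn_nll(np.zeros((1, 2)), np.array([[-1.0, 1.0]]),
                  np.log(np.full((1, 2), 0.1)), t)
unimodal = mdn_nll(np.zeros((1, 1)), np.zeros((1, 1)), np.zeros((1, 1)), t)
```

In this toy case the two-component mixture assigns a much higher likelihood to the target than a single unit-variance Gaussian centred on the mean of the two solutions, so `bimodal` is smaller than `unimodal`.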

Computational Chemistry
Early optical recognition system converts scanned chemical diagrams to connection tables

Optical Recognition of Chemical Graphics

This paper describes an early prototype system that digitizes chemical structure diagrams from scanned documents. It employs a multi-stage pipeline involving convex bounding polygon extraction, vectorization, and rule-based heuristics to generate MDL Molfiles.

Computational Chemistry
Five-stage pipeline for reconstructing chemical molecules from raster images

Reconstruction of Chemical Molecules from Images

This methodological paper proposes a comprehensive pipeline to digitize chemical structure images. It achieves 97% reconstruction accuracy on benchmarks by combining a topology-preserving vectorizer with a chemical knowledge validation module.

Scientific Computing
Three-dimensional Brownian motion trajectory showing random walk behavior

Second-Order Langevin Equation for Field Simulations

This paper proposes the Hyperbolic Algorithm for Euclidean field theory simulations. By adding a second-order derivative in fictitious time to the Langevin equation, the method reduces systematic errors from O(ε) to O(ε²).
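
The O(ε) error of the first-order baseline can be seen in a toy case. The sketch below is not the paper's Hyperbolic Algorithm; it assumes a single Gaussian degree of freedom with action S(φ) = φ²/2 (exact variance 1) and the plain Euler update φ ← φ − εφ + √(2ε)·η, and it iterates the resulting variance recursion instead of sampling. The function name is hypothetical.

```python
def euler_langevin_variance(eps, n_steps=10_000):
    """Stationary variance of first-order (Euler) Langevin for S = phi**2 / 2.

    The update phi <- (1 - eps) * phi + sqrt(2 * eps) * eta, with unit-variance
    noise eta, maps the variance v to (1 - eps)**2 * v + 2 * eps.
    """
    var = 0.0
    for _ in range(n_steps):
        var = (1.0 - eps) ** 2 * var + 2.0 * eps
    return var

# The exact answer is 1; the bias shrinks roughly in proportion to eps.
bias_coarse = euler_langevin_variance(0.1) - 1.0
bias_fine = euler_langevin_variance(0.05) - 1.0
```

The recursion converges to 1/(1 − ε/2), so halving the step size roughly halves the bias. That linear dependence on ε is the systematic error a second-order scheme aims to push to O(ε²).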

Computational Social Science
Hierarchical Ideal Point Topic Model visualization showing political polarization

Tea Party in the House: Legislative Ideology via HIPTM

This paper introduces the Hierarchical Ideal Point Topic Model (HIPTM) to analyze the 112th U.S. Congress. By jointly modeling votes and text, it uncovers how Tea Party Republicans and establishment Republicans differ in both voting records and how they frame specific policy issues.

Computational Chemistry
A cobalt sulfate and ethylenediamine mixture being prepared

Mixfile & MInChI: Machine-Readable Mixture Formats

A 2019 format specification introducing two complementary standards for chemical mixtures. Mixfile provides comprehensive mixture descriptions and MInChI provides compact canonical identifiers. This addresses the long-standing lack of standardized machine-readable formats for multi-component chemical systems.

Computational Chemistry
Optical chemical structure recognition example

MolRec: Rule-Based OCSR System

This paper details the MolRec system, which converts chemical diagram images into MOL files using vectorization, geometric rules, and graph construction. It achieved 95% accuracy on 1000 TREC 2011 benchmark images and includes a comprehensive failure analysis.

Computational Chemistry
ChemInfty: Chemical Structure Recognition in Patent Images

A 2011 rule-based OCSR system designed for the low-quality images found in Japanese patent applications. It uses segment-based methods to handle pervasive problems such as touching characters, atom labels merged with bonds, and broken lines.

Machine Learning Fundamentals
Sphere packing illustration showing Shannon's geometric interpretation of channel capacity

Communication in the Presence of Noise: Shannon's 1949 Paper

Shannon’s foundational 1949 paper establishes the mathematical framework for modern information theory. It defines channel capacity as the fundamental limit for reliable communication over noisy channels and introduces the (Nyquist-Shannon) sampling theorem that underpins digital signal processing.
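
For a band-limited channel with additive white Gaussian noise, the capacity takes the form C = W log₂(1 + P/N). The sketch below evaluates that formula; the function name and the example numbers (a 3 kHz channel at a signal-to-noise ratio of 1000) are illustrative assumptions, not figures from the paper.

```python
import math

def channel_capacity(bandwidth_hz, signal_power, noise_power):
    """Capacity in bits per second: C = W * log2(1 + P / N)."""
    return bandwidth_hz * math.log2(1.0 + signal_power / noise_power)

# Hypothetical 3 kHz channel with P/N = 1000 (30 dB).
capacity = channel_capacity(3000.0, 1000.0, 1.0)
```

With these assumed values the capacity comes to roughly 29.9 kbit/s. When signal and noise power are equal, the formula reduces to one bit per second per hertz of bandwidth.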

Computational Biology
Protein folding energy landscape funnel showing high-energy unfolded states converging to the native state

How to Fold Graciously: The Levinthal Paradox

Levinthal’s 1969 perspective paper defined the protein folding paradox by demonstrating the impossibility of random search, establishing the need for kinetic pathways that guide folding faster than thermodynamic equilibration allows.
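
The scale of the problem is easy to reproduce with back-of-the-envelope arithmetic. The numbers below are illustrative assumptions rather than figures quoted from the paper: a 100-residue chain, 3 conformations per residue, and 10¹³ conformations sampled per second.

```python
# Illustrative assumptions, not values taken from Levinthal's paper.
RESIDUES = 100
STATES_PER_RESIDUE = 3
SAMPLES_PER_SECOND = 1e13
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Size of the conformational space if residues are independent.
conformations = STATES_PER_RESIDUE ** RESIDUES

# Time for an exhaustive random search at the assumed sampling rate.
search_years = conformations / SAMPLES_PER_SECOND / SECONDS_PER_YEAR
```

Under these assumptions the search space holds about 5 × 10⁴⁷ conformations and an exhaustive search would take on the order of 10²⁷ years, while real proteins fold in seconds or less.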