Document Processing
Conceptual diagram of page stream segmentation sorting pages into documents

The Evolution of Page Stream Segmentation: Rules to LLMs

We trace the history of Page Stream Segmentation (PSS) through three eras (Heuristic, Encoder, and Decoder) and explain how privacy-preserving, localized LLMs enable true semantic processing.

Computational Social Science
NOMINATE spatial plot showing Senate vote on Balanced Budget Amendment (1995) with legislators positioned on liberal-conservative dimension

A Spatial Model for Legislative Roll Call Analysis

This paper introduces NOMINATE, a probabilistic spatial model that recovers metric coordinates for legislators and roll calls from nominal voting data, demonstrating that a single liberal-conservative dimension explains the vast majority of Congressional voting behavior.

Planetary Science
Orbital diagram showing chaotic planetary trajectories

Chaotic Evolution of the Solar System (Sussman 1992)

Sussman and Wisdom’s 1992 study used the Supercomputer Toolkit and symplectic mapping to integrate the entire Solar System for 100 million years, confirming chaotic behavior with an exponential divergence timescale of ~4 million years and demonstrating that long-term planetary motion is fundamentally unpredictable.

Machine Learning Fundamentals
Diagram showing distributed representations with three pools of units (AGENT, RELATIONSHIP, PATIENT) connected via role/identity bindings

Distributed Representations: A Foundational Theory

Geoffrey Hinton’s 1984 technical report formally derives the efficiency of distributed representations (coarse coding) and demonstrates their properties of automatic generalization, content-addressability, and robustness to damage.

Computational Chemistry
Graph Perception for Chemical Structure OCR

This 1990 paper presents an early OCR pipeline for converting hand-drawn or printed chemical structures into connectivity tables. It introduces novel sweeping algorithms for graph perception and a matrix-based feature extraction method for character recognition.

Scientific Computing
Three-dimensional Brownian motion trajectory showing random walk behavior

Second-Order Langevin Equation for Field Simulations

This paper proposes the Hyperbolic Algorithm for Euclidean field theory simulations. By adding a second-order derivative in fictitious time to the Langevin equation, the method reduces systematic errors from O(ε) to O(ε²).

Evolutionary Biology
Electron microscope image of Pyrolobus fumarii showing irregular coccoid cell structure

Three Domains of Life: Woese's Phylogenetic Revolution

This paper established the three-domain classification system (Bacteria, Archaea, Eucarya) based on molecular evidence from ribosomal RNA sequences, arguing that the prokaryote-eukaryote dichotomy obscures the deep evolutionary divergence of Archaea from Bacteria.

Computational Chemistry
Delayed convolution approximation for distinct Van Hove function showing comparison between simulated data and theoretical model

Correlations in the Motion of Atoms in Liquid Argon

This work validated classical Molecular Dynamics for simulating liquids, revealing the ‘cage effect’ in velocity autocorrelation and establishing predictor-corrector integration algorithms for N-body problems.

Planetary Science
Venus as seen by Mariner 10, showing swirling cloud patterns in the dense atmosphere

Life on Venus? Astrobiology and the Habitability Limits

A deep dive into the physical limits of life on Venus, reviewing Charles Cockell’s foundational 1999 analysis and connecting it to later developments such as the 2020 phosphine detection and the upcoming DAVINCI+ mission.

Computational Chemistry
SELFIES molecular representation overview

SELFIES: The Original Paper on Robust Molecular Strings

The 2020 paper by Mario Krenn and colleagues introduced SELFIES, a molecular string representation that addresses the validity problems of SMILES: every SELFIES string corresponds to a valid chemical structure.

Computational Chemistry
Benzene molecular structure diagram

SMILES Notation: The Original Paper by Weininger (1988)

David Weininger introduced SMILES notation in 1988, establishing encoding rules for representing chemical structures as compact, human-readable strings.

Machine Learning Fundamentals
Sphere packing illustration showing Shannon's geometric interpretation of channel capacity

Communication in the Presence of Noise: Shannon's 1949 Paper

Shannon’s foundational 1949 paper establishes the mathematical framework for modern information theory, defines channel capacity as the fundamental limit for reliable communication over noisy channels, and introduces the sampling theorem (Nyquist-Shannon) that underpins digital signal processing.
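
As a rough illustration of the two results named in this summary, the sketch below evaluates the capacity formula for a band-limited channel with white Gaussian noise, C = W log2(1 + P/N), and the minimum sampling rate 2W given by the sampling theorem. The function and parameter names are hypothetical and chosen only for this example; they do not come from the paper or the post.

```python
import math


def channel_capacity(bandwidth_hz: float, signal_power: float, noise_power: float) -> float:
    """Capacity in bits per second of a band-limited channel with white Gaussian noise.

    Implements C = W * log2(1 + P / N). The argument names are illustrative.
    """
    if bandwidth_hz <= 0 or noise_power <= 0 or signal_power < 0:
        raise ValueError("bandwidth and noise power must be positive, signal power non-negative")
    return bandwidth_hz * math.log2(1 + signal_power / noise_power)


def minimum_sampling_rate(bandwidth_hz: float) -> float:
    """Samples per second needed to represent a signal band-limited to W Hz (2W)."""
    return 2 * bandwidth_hz


if __name__ == "__main__":
    # Hypothetical channel: 3 kHz of bandwidth, signal-to-noise power ratio of 1000 (30 dB).
    capacity = channel_capacity(3000.0, 1000.0, 1.0)
    print(f"capacity: {capacity:.0f} bits/s")
    print(f"minimum sampling rate: {minimum_sampling_rate(3000.0):.0f} samples/s")
```

Because capacity grows linearly with bandwidth but only logarithmically with the signal-to-noise ratio, doubling W doubles C, while doubling P/N adds at most W bits per second.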