Theory Papers: Formal Analysis, Proofs, and Derivations

Pipeline showing atoms converted to smooth density, symmetrized via Haar integration, and projected to invariant features

Atom-Density Representations for Machine Learning

Introduces a Dirac notation formalism for atomic environments that unifies SOAP power spectra, Behler-Parrinello symmetry functions, and other density-based structural representations under a single theoretical framework.

Machine Learning

Three-panel diagram showing symmetry group decomposition, equivariant mapping from world states to representations, and block-diagonal disentangled decomposition

Defining Disentangled Representations via Group Theory

Proposes the first principled mathematical definition of disentangled representations by connecting symmetry group decompositions to independent subspaces in a representation’s vector space.

Machine Learning

Three-panel diagram showing an original sequence, its time-warped version, and the gate values derived from requiring time warping invariance

Can Recurrent Neural Networks Warp Time? (ICLR 2018)

Tallec and Ollivier show that requiring invariance to time transformations in recurrent models leads to gating mechanisms, recovering key LSTM components from first principles. They propose the chrono initialization for gate biases that improves learning of long-term dependencies.
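As a rough illustration of the chrono initialization, the sketch below draws forget-gate biases as the log of a uniform sample on [1, T_max - 1] and sets the input-gate biases to their negatives, following the rule stated in the paper. The function names, the `seed` argument, and the use of plain Python lists are assumptions made for this example and are not taken from the authors' code.

```python
import math
import random


def chrono_init(hidden_size, t_max, seed=0):
    """Sketch of chrono initialization for LSTM gate biases.

    forget bias b_f = log(u) with u ~ Uniform(1, t_max - 1)
    input bias  b_i = -b_f
    t_max is the longest dependency range expected in the data (assumed known).
    """
    if t_max < 2:
        raise ValueError("t_max must be at least 2")
    rng = random.Random(seed)
    forget_bias = [math.log(rng.uniform(1.0, t_max - 1.0)) for _ in range(hidden_size)]
    input_bias = [-b for b in forget_bias]
    return forget_bias, input_bias


def characteristic_time(forget_bias_value):
    """Memory timescale 1 / (1 - f) implied by a forget gate f = sigmoid(bias)."""
    f = 1.0 / (1.0 + math.exp(-forget_bias_value))
    return 1.0 / (1.0 - f)


forget_bias, input_bias = chrono_init(hidden_size=8, t_max=100)
timescales = [characteristic_time(b) for b in forget_bias]
```

With this choice, each unit starts with a memory timescale between 2 and `t_max` steps, so the units cover a spread of short and long dependency ranges from the first update.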

Machine Learning

The three quarks of attention: multiplexing (additive), output gating (multiplicative output), and synaptic gating (multiplicative weight)

The Quarks of Attention: Building Blocks of Attention

Baldi and Vershynin systematically classify the fundamental building blocks of attention (activation attention, output gating, synaptic gating) by source, target, and mechanism, then prove capacity bounds showing that gating introduces quadratic terms sparsely, gaining expressiveness without the full cost of polynomial activations.

Machine Learning

Comparison of linear interpolation (teleportation) showing double peaks versus displacement interpolation (transportation) showing smooth single peak

A Convexity Principle for Interacting Gases (McCann 1997)

A theoretical paper that introduces displacement interpolation (optimal transport) to establish a new convexity principle for energy functionals. It proves the uniqueness of ground states for interacting gases and generalizes the Brunn-Minkowski inequality, providing mathematical tools later used in flow matching and optimal transport-based generative models.
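The difference between the two interpolations can be sketched in one dimension, where the optimal transport map for a convex cost is the monotone rearrangement, so sorting both samples pairs them optimally. This is a toy special case, not the general setting of the paper; the Gaussian test data, sample sizes, and function names below are assumptions chosen for illustration.

```python
import random
import statistics


def displacement_interpolate(xs, ys, t):
    """Samples from the displacement interpolant between two equal-size 1D samples.

    Sorting pairs the i-th smallest x with the i-th smallest y (monotone
    rearrangement); each mass point then travels along a straight line.
    """
    if len(xs) != len(ys):
        raise ValueError("samples must have the same size")
    return [(1 - t) * x + t * y for x, y in zip(sorted(xs), sorted(ys))]


def mixture_interpolate(xs, ys, t, rng):
    """Samples from the linear interpolant (1 - t) * mu + t * nu of the two densities."""
    return [rng.choice(ys) if rng.random() < t else rng.choice(xs) for _ in xs]


rng = random.Random(0)
mu = [rng.gauss(-3.0, 0.5) for _ in range(2000)]  # source gas, centered at -3
nu = [rng.gauss(3.0, 0.5) for _ in range(2000)]   # target gas, centered at +3

mid_transport = displacement_interpolate(mu, nu, 0.5)
mid_mixture = mixture_interpolate(mu, nu, 0.5, rng)

# The transported midpoint stays narrow and single-peaked near 0;
# the mixture midpoint keeps two peaks near -3 and +3, hence a large spread.
print("transport spread:", statistics.pstdev(mid_transport))
print("mixture spread:  ", statistics.pstdev(mid_mixture))
```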

Generative Modeling

Denoising Score Matching Intuition - Vectors point from corrupted samples back to clean data, approximating the score

Score Matching and Denoising Autoencoders: A Connection

This paper provides a rigorous probabilistic foundation for Denoising Autoencoders by proving that training a particular form of DAE is equivalent to Score Matching on a kernel-smoothed (Parzen window) data distribution. It derives a specific energy function for DAEs and justifies the use of tied weights.
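One way to see the connection, sketched under the assumption of Gaussian corruption with standard deviation `sigma` in one dimension: the denoising target for a corrupted point is (x_clean - x_noisy) / sigma^2, and the score of the Gaussian-kernel-smoothed data density is a weighted average of those targets over the data points. The function names and the tiny dataset below are hypothetical and serve only as a numerical check.

```python
import math


def dsm_target(x_clean, x_noisy, sigma):
    """Score of the Gaussian corruption kernel: points from the noisy sample back to the clean one."""
    return (x_clean - x_noisy) / sigma ** 2


def smoothed_log_density(x, data, sigma):
    """Log of the Gaussian-kernel-smoothed density (1/n) * sum_i N(x; x_i, sigma^2)."""
    logs = [-0.5 * ((x - xi) / sigma) ** 2 for xi in data]
    m = max(logs)  # log-sum-exp shift for numerical stability
    log_norm = math.log(len(data) * sigma * math.sqrt(2.0 * math.pi))
    return m + math.log(sum(math.exp(v - m) for v in logs)) - log_norm


def smoothed_score(x, data, sigma):
    """d/dx of the smoothed log density, written as a weighted mean of denoising targets."""
    logs = [-0.5 * ((x - xi) / sigma) ** 2 for xi in data]
    m = max(logs)
    weights = [math.exp(v - m) for v in logs]
    total = sum(weights)
    return sum(w * dsm_target(xi, x, sigma) for w, xi in zip(weights, data)) / total


data = [-1.0, 0.5, 2.0]
x, sigma, h = 0.3, 0.7, 1e-5

# Central finite difference of the log density, for comparison with the analytic score.
numeric = (smoothed_log_density(x + h, data, sigma)
           - smoothed_log_density(x - h, data, sigma)) / (2 * h)
analytic = smoothed_score(x, data, sigma)
```

Because the smoothed score is the average of the denoising targets given the noisy point, a model regressed onto those targets with squared error is pulled toward that score, which is the intuition behind the equivalence.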

Machine Learning

Diagram showing distributed representations with three pools of units (AGENT, RELATIONSHIP, PATIENT) connected via role/identity bindings

Distributed Representations: A Foundational Theory

Geoffrey Hinton’s 1984 technical report that formally derives the efficiency of distributed representations (coarse coding) and demonstrates their properties of automatic generalization, content-addressability, and robustness to damage.

Planetary Science

Abstract artistic representation of alkaline hydrothermal vents with spiraling geological formations

Drive to Life on Wet and Icy Worlds: Alkaline Vent Theory

This paper reformulates the submarine alkaline hydrothermal theory for the origin of life, positing that life emerged as a free energy converter driven by geological disequilibria, specifically redox and pH gradients across inorganic precipitate membranes, with hydrogen, methane, and CO2 as primary feedstocks.

Computational Biology

Four types of protein folding energy landscapes from left to right: smooth funnel, rugged funnel with kinetic traps, moat funnel, and champagne glass funnel

Funnels, Pathways, and Energy Landscapes of Protein Folding

This paper resolves Levinthal’s paradox by replacing the single-pathway view with a statistical energy landscape approach. It introduces the concepts of the folding funnel, the glass transition in proteins, and the ‘stability gap’ as a design principle for foldable sequences.

Molecular Simulation

Carbon monoxide molecule adsorbed on Pt(100) FCC surface in hollow site configuration

Kinetic Oscillations in CO Oxidation on Pt(100): Theory

Imbihl et al. establish the first detailed microscopic model for CO oxidation oscillations on Pt(100), identifying the adsorbate-induced hex-to-1×1 surface phase transition as the driving force. The study combines linear stability analysis with numerical reaction-diffusion simulations.

Planetary Science

Conceptual cross-section of the Cloud Continent proposal showing three layers: the CO2 atmosphere below, the nitrogen-filled honeycomb structure at 50 km altitude, and the habitable atmosphere above

Terraforming Venus With the Cloud Continent Proposal

A speculative 2022 engineering proposal for terraforming Venus by constructing a nitrogen-filled honeycomb structure floating at 50 km altitude, where temperature and pressure are Earth-like. The approach avoids the need to remove Venus’s massive atmosphere and uses CO2 electrolysis to produce breathable oxygen and carbon nanostructures for construction.

Machine Learning

Sphere packing illustration showing Shannon's geometric interpretation of channel capacity

Communication in the Presence of Noise: Shannon's 1949 Paper

Shannon’s foundational 1949 paper establishing the mathematical framework for modern information theory, defining channel capacity as the fundamental limit for reliable communication over noisy channels and presenting the (Nyquist-Shannon) sampling theorem that underpins digital signal processing.
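For a band-limited channel with additive white Gaussian noise, the capacity formula from the paper is C = W log2(1 + P/N), and the sampling theorem says a signal limited to bandwidth W is determined by 2W samples per second. The short sketch below evaluates both; the function names and the example numbers are assumptions chosen for illustration.

```python
import math


def awgn_capacity(bandwidth_hz, signal_power, noise_power):
    """Capacity in bits per second: C = W * log2(1 + P / N)."""
    if bandwidth_hz <= 0 or signal_power < 0 or noise_power <= 0:
        raise ValueError("bandwidth and noise power must be positive, signal power non-negative")
    return bandwidth_hz * math.log2(1.0 + signal_power / noise_power)


def nyquist_rate(bandwidth_hz):
    """Minimum sampling rate (samples per second) for a signal band-limited to W Hz."""
    return 2.0 * bandwidth_hz


# Hypothetical channel: 3 kHz bandwidth, signal power equal to noise power (0 dB SNR).
capacity = awgn_capacity(3000.0, 1.0, 1.0)
rate = nyquist_rate(3000.0)
```

Doubling the bandwidth doubles the capacity at a fixed signal-to-noise ratio, while doubling the signal power adds at most one bit per second per hertz of bandwidth.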