Generative Models

This section covers the core families of generative models used in modern machine learning. Notes begin with the foundational variational autoencoder (VAE) and its extensions (importance-weighted objectives, contrastive priors), then move through continuous normalizing flows, neural ODEs, score-based and diffusion models, and flow matching. The thread connecting these works is the shared goal of learning to sample from complex distributions, and each set of notes tries to make the mathematical connections between approaches explicit rather than treating them as isolated methods.

Machine Learning Fundamentals

Comparison of linear interpolation (teleportation) showing double peaks versus displacement interpolation (transportation) showing smooth single peak

A Convexity Principle for Interacting Gases (McCann 1997)

A foundational theoretical paper that introduces displacement interpolation, an optimal-transport way of interpolating between probability measures, and uses it to establish a new convexity principle for energy functionals. It proves the uniqueness of ground states for interacting gases and generalizes the Brunn-Minkowski inequality, providing mathematical foundations used in modern generative models.
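To make the contrast between the two interpolations concrete, here is a minimal one-dimensional sketch using only the Python standard library. It assumes two equal-size sample sets and relies on the fact that, in 1D with quadratic cost, pairing sorted samples gives the optimal transport map. The function names, the Gaussian test data, and the `frac_near_zero` helper are illustrative assumptions, not taken from the paper.

```python
import random

def displacement_interpolate(xs, ys, t):
    """Move each point a fraction t along its transport pair (1D, equal sizes)."""
    # In 1D the sorted-to-sorted pairing is optimal for quadratic cost.
    return [(1 - t) * x + t * y for x, y in zip(sorted(xs), sorted(ys))]

def mixture_interpolate(xs, ys, t, rng):
    """Linear interpolation of densities: draw from ys with probability t, else xs."""
    return [rng.choice(ys) if rng.random() < t else rng.choice(xs) for _ in xs]

def frac_near_zero(samples, radius=1.0):
    """Fraction of samples within `radius` of the midpoint 0 (illustrative helper)."""
    return sum(abs(s) < radius for s in samples) / len(samples)

rng = random.Random(0)
xs = [rng.gauss(-3.0, 0.5) for _ in range(2000)]  # stand-in for rho_0
ys = [rng.gauss(3.0, 0.5) for _ in range(2000)]   # stand-in for rho_1

mid_transport = displacement_interpolate(xs, ys, 0.5)
mid_mixture = mixture_interpolate(xs, ys, 0.5, rng)

print("mass near 0, displacement:", frac_near_zero(mid_transport))
print("mass near 0, mixture:     ", frac_near_zero(mid_mixture))
```

At the halfway point the displacement interpolant is a single bump centered between the two endpoints, while the mixture keeps two separate peaks and leaves almost no mass in the middle.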

Generative Modeling

Visualization of probability density flow from initial distribution ρ₀ to target distribution ρ₁ over time through space

Building Normalizing Flows with Stochastic Interpolants

Proposes ‘InterFlow’, a method for learning continuous normalizing flows between arbitrary densities using stochastic interpolants. It avoids backpropagating through an ODE solver by minimizing a quadratic objective on the velocity field, which makes training simulation-free and scalable. On CIFAR-10, its NLL matches ScoreSDE (2.99 bits/dim), though its FID (10.27) trails dedicated image models (ScoreSDE: 2.92); the primary strength is tractable likelihood at low training cost.

Generative Modeling

Visualization comparing Optimal Transport (straight paths) vs Diffusion (curved paths) for Flow Matching

Flow Matching for Generative Modeling: Scalable CNFs

Introduces Flow Matching, a scalable method for training CNFs by regressing vector fields of conditional probability paths. It generalizes diffusion and enables Optimal Transport paths for straighter, more efficient sampling.
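As a rough illustration of "regressing vector fields of conditional probability paths", the sketch below builds one training example for the optimal-transport conditional path: a point x_t = (1 - (1 - sigma_min) t) x0 + t x1 and its regression target x1 - (1 - sigma_min) x0. The value of `SIGMA_MIN`, the function names, and the `zero_velocity` placeholder model are assumptions made for this example; a real setup would use a neural network and batched tensors.

```python
import random

SIGMA_MIN = 1e-4  # assumed small terminal noise scale (illustrative value)

def ot_conditional_path(x0, x1, t, sigma_min=SIGMA_MIN):
    """Return (x_t, target velocity) on the straight-line conditional path."""
    scale = 1.0 - (1.0 - sigma_min) * t
    x_t = [scale * a + t * b for a, b in zip(x0, x1)]
    # The target is d x_t / d t, which is constant in t for this path.
    target = [b - (1.0 - sigma_min) * a for a, b in zip(x0, x1)]
    return x_t, target

def flow_matching_loss(velocity_fn, x0, x1, t):
    """Squared error between the model's velocity and the conditional target."""
    x_t, target = ot_conditional_path(x0, x1, t)
    pred = velocity_fn(x_t, t)
    return sum((p - u) ** 2 for p, u in zip(pred, target))

def zero_velocity(x_t, t):
    """Placeholder model that always predicts zero velocity (hypothetical)."""
    return [0.0 for _ in x_t]

rng = random.Random(0)
x0 = [rng.gauss(0.0, 1.0) for _ in range(4)]  # noise sample
x1 = [2.0, -1.0, 0.5, 3.0]                    # stand-in data sample
t = rng.random()                              # t ~ Uniform(0, 1)
loss = flow_matching_loss(zero_velocity, x0, x1, t)
print("loss for the placeholder model:", loss)
```

No ODE is simulated during training: each step only needs a noise sample, a data sample, and a random time.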

Machine Learning Fundamentals

Comparison of Residual Network vs ODE Network architectures showing discrete layers versus continuous transformations

Neural ODEs: Continuous-Depth Deep Learning

This paper replaces discrete network layers with continuous ordinary differential equations (ODEs), allowing for adaptive computation depth and constant memory cost during training via the adjoint sensitivity method. It introduces Continuous Normalizing Flows and latent ODEs for time-series.
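The link between residual layers and ODEs can be seen in a few lines: a residual update h + dt * f(h, t) is one explicit Euler step of dh/dt = f(h, t). The sketch below is a deliberately simplified stand-in, assuming a fixed-step Euler solver and a scalar toy dynamics function `decay`; it does not implement the adaptive solvers or the adjoint method the paper relies on.

```python
import math

def odeint_euler(f, h0, t0, t1, num_steps):
    """Integrate dh/dt = f(h, t) from t0 to t1 with fixed-step Euler."""
    h, t = h0, t0
    dt = (t1 - t0) / num_steps
    for _ in range(num_steps):
        h = h + dt * f(h, t)  # same form as a residual block: h + F(h)
        t += dt
    return h

def decay(h, t):
    """Toy dynamics dh/dt = -h, whose exact solution is h0 * exp(-t)."""
    return -h

coarse = odeint_euler(decay, 1.0, 0.0, 1.0, 10)    # like a 10-layer ResNet
fine = odeint_euler(decay, 1.0, 0.0, 1.0, 1000)    # many small steps
exact = math.exp(-1.0)
print(coarse, fine, exact)
```

Increasing the number of steps plays the role of adding depth, and the result approaches the exact solution of the ODE.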

Generative Modeling

Visualization showing linear interpolation, learned ODE trajectories, and the reflow straightening process for rectified flow

Rectified Flow: Learning to Generate and Transfer Data

Introduces ‘Rectified Flow,’ a method that transports one distribution to another via ODEs whose trajectories are as straight as possible. It uses a ‘reflow’ procedure to iteratively straighten trajectories, enabling high-quality 1-step generation without complex distillation pipelines.

Generative Modeling

Denoising Score Matching Intuition - Vectors point from corrupted samples back to clean data, approximating the score

Score Matching and Denoising Autoencoders

This paper provides a rigorous probabilistic foundation for Denoising Autoencoders by proving they are mathematically equivalent to Score Matching on a kernel-smoothed data distribution. It derives a specific energy function for DAEs and justifies the use of tied weights.
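The connection can be checked numerically in a toy case. With Gaussian corruption x_noisy = x + sigma * eps, the denoising target (x - x_noisy) / sigma^2 points from the corrupted sample back to the clean one, and regressing onto it recovers the score of the smoothed density. The sketch below assumes 1D standard-normal data, for which the smoothed density is N(0, 1 + sigma^2) and its score is -x / (1 + sigma^2); the linear least-squares fit and all names are illustrative choices.

```python
import random

def dsm_target(x_clean, x_noisy, sigma):
    """Denoising score matching target for Gaussian corruption."""
    return (x_clean - x_noisy) / sigma ** 2

rng = random.Random(0)
sigma = 0.5
num, den = 0.0, 0.0
for _ in range(20000):
    x = rng.gauss(0.0, 1.0)                    # clean sample from N(0, 1)
    x_noisy = x + sigma * rng.gauss(0.0, 1.0)  # corrupted sample
    target = dsm_target(x, x_noisy, sigma)
    # Accumulate the least-squares fit of a linear score model s(x) = a * x.
    num += x_noisy * target
    den += x_noisy * x_noisy

fitted_slope = num / den
true_slope = -1.0 / (1.0 + sigma ** 2)  # score slope of N(0, 1 + sigma^2)
print(fitted_slope, true_slope)
```

Individual targets are noisy, but their regression fit lands close to the score of the kernel-smoothed distribution.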

Generative Modeling

Forward and Reverse SDE trajectories showing the diffusion process from data to noise and back

Score-Based Generative Modeling with SDEs

This paper unifies previous score-based methods (SMLD and DDPM) under a continuous-time SDE framework. It introduces Predictor-Corrector samplers for improved generation and Probability Flow ODEs for near-exact likelihood computation, setting new records for unconditional image generation on CIFAR-10 at the time of publication.
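The "data to noise" half of the picture can be simulated directly. The sketch below runs Euler-Maruyama on a variance-preserving forward SDE, dx = -0.5 * beta * x dt + sqrt(beta) dW, and checks that samples started at a fixed data point end up close to a standard normal. The constant `BETA`, the step count, and the starting point are simplifying assumptions for illustration; the paper uses a time-dependent noise schedule, and no score model or reverse-time sampler is included here.

```python
import math
import random

BETA = 1.0  # assumed constant noise rate (the paper uses a schedule beta(t))

def forward_vp_sde(x0, t_end, num_steps, rng, beta=BETA):
    """Simulate one forward trajectory with Euler-Maruyama and return x(t_end)."""
    dt = t_end / num_steps
    x = x0
    for _ in range(num_steps):
        drift = -0.5 * beta * x
        x += drift * dt + math.sqrt(beta * dt) * rng.gauss(0.0, 1.0)
    return x

rng = random.Random(0)
samples = [forward_vp_sde(3.0, 10.0, 200, rng) for _ in range(2000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print("terminal mean:", mean, "terminal variance:", var)
```

The starting value 3.0 is forgotten by the end of the trajectory, which is what lets generation begin from pure noise.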

Machine Learning Fundamentals

Visualization of inverse problem showing one input mapping to multiple valid outputs

Mixture Density Networks: Modeling Multimodal Distributions

A foundational 1994 paper identifying why standard least-squares networks fail at inverse problems (multi-valued mappings). It introduces the Mixture Density Network (MDN), which predicts the parameters of a Gaussian Mixture Model to capture the full conditional probability density.
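A small sketch of the MDN training loss may help here. It assumes the network emits, for a scalar target, unnormalized mixing logits, component means, and log standard deviations, and it evaluates the negative log-likelihood of a Gaussian mixture with those parameters. The parameterization (softmax weights, log-sigma outputs) and the example numbers are assumptions for illustration.

```python
import math

def mdn_nll(logits, means, log_sigmas, y):
    """Negative log-likelihood of scalar y under a Gaussian mixture."""
    # Log-softmax over mixing logits, computed stably.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    log_terms = []
    for logit, mu, log_sigma in zip(logits, means, log_sigmas):
        log_pi = logit - log_z
        z = (y - mu) / math.exp(log_sigma)
        log_normal = -0.5 * z * z - log_sigma - 0.5 * math.log(2 * math.pi)
        log_terms.append(log_pi + log_normal)
    # Log-sum-exp over components gives the log mixture density.
    top = max(log_terms)
    return -(top + math.log(sum(math.exp(v - top) for v in log_terms)))

# Inverse-problem toy case: for this input the valid outputs are -1 and +1.
two_modes = mdn_nll([0.0, 0.0], [-1.0, 1.0], [math.log(0.1)] * 2, y=1.0)
# A single Gaussian centered on the average of the two answers.
one_mode = mdn_nll([0.0], [0.0], [0.0], y=1.0)
print(two_modes, one_mode)
```

The single Gaussian centered at 0 mimics what a least-squares network predicts for a two-valued mapping, and it scores worse than the mixture that places a narrow component on each valid answer.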

Generative Modeling

Diagram comparing standard stochastic sampling (gradient blocked) vs the reparameterization trick (gradient flows)

Auto-Encoding Variational Bayes: VAE Paper Summary

Kingma and Welling’s foundational 2013 paper introducing Variational Autoencoders and the reparameterization trick, which enables end-to-end gradient-based training of generative models with continuous latent variables. The trick expresses each latent sample as a deterministic function of the distribution parameters and an independent noise variable, so the randomness enters as an input and gradients can flow through a deterministic path.
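A scalar example shows the trick at work. Writing z = mu + sigma * eps with eps ~ N(0, 1), the gradient of E[f(z)] with respect to mu and sigma can be estimated by differentiating through the sample. The sketch below assumes the toy objective f(z) = z^2, for which E[z^2] = mu^2 + sigma^2 and the exact gradients are 2 * mu and 2 * sigma; the function name and constants are illustrative.

```python
import random

def reparam_gradients(mu, sigma, num_samples, rng):
    """Pathwise Monte Carlo gradients of E[z^2] with z = mu + sigma * eps."""
    grad_mu, grad_sigma = 0.0, 0.0
    for _ in range(num_samples):
        eps = rng.gauss(0.0, 1.0)  # noise is sampled independently of mu, sigma
        z = mu + sigma * eps       # deterministic given (mu, sigma, eps)
        df_dz = 2.0 * z            # derivative of f(z) = z^2
        grad_mu += df_dz * 1.0     # dz/dmu = 1
        grad_sigma += df_dz * eps  # dz/dsigma = eps
    return grad_mu / num_samples, grad_sigma / num_samples

rng = random.Random(0)
grad_mu, grad_sigma = reparam_gradients(1.5, 0.8, 50000, rng)
print(grad_mu, grad_sigma)  # exact values are 2 * mu = 3.0 and 2 * sigma = 1.6
```

Automatic differentiation frameworks perform the same chain-rule step when a sample is written in this form.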

Generative Modeling

Flowchart comparing VAE and IWAE computation showing the key difference in where averaging occurs relative to the log operation

IWAE: Importance Weighted Autoencoders

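Burda et al.’s ICLR 2016 paper introducing Importance Weighted Autoencoders, which average importance weights over multiple latent samples to obtain a log-likelihood lower bound that is never looser than the standard VAE bound and tightens as the number of samples grows. This reduces the underuse of latent dimensions seen in standard VAEs and improves generative quality. The model architecture remains the same; only the training objective changes.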
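The effect of moving the average inside the log can be checked on a model with a known marginal likelihood. The sketch below assumes a toy model p(z) = N(0, 1), p(x | z) = N(z, 1), so that p(x) = N(0, 2), and a deliberately poor proposal q(z | x) = N(0, 1). With this choice the prior and proposal cancel in the importance weight, leaving log w = log p(x | z). All names and constants are illustrative.

```python
import math
import random

def log_normal(x, mean, var):
    """Log density of N(mean, var) at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def log_mean_exp(values):
    """Numerically stable log of the average of exp(values)."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values) / len(values))

def iwae_bound(x, k, num_repeats, rng):
    """Monte Carlo estimate of the k-sample bound; k = 1 gives the ELBO."""
    total = 0.0
    for _ in range(num_repeats):
        log_w = []
        for _ in range(k):
            z = rng.gauss(0.0, 1.0)  # z ~ q(z | x) = N(0, 1)
            # log w = log p(x, z) - log q(z | x) = log p(x | z) here.
            log_w.append(log_normal(x, z, 1.0))
        total += log_mean_exp(log_w)  # average inside the log
    return total / num_repeats

rng = random.Random(0)
x = 2.0
elbo = iwae_bound(x, 1, 2000, rng)
iwae = iwae_bound(x, 50, 2000, rng)
log_px = log_normal(x, 0.0, 2.0)  # exact log marginal likelihood
print(elbo, iwae, log_px)
```

With one sample the estimate sits well below the true log-likelihood, while with 50 samples it comes close, even though the model and proposal are unchanged.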

Generative Modeling

Visualization of the VAE prior hole problem showing a ring-shaped aggregate posterior with an empty center where the Gaussian prior has highest density

Contrastive Learning for Variational Autoencoder Priors

A NeurIPS 2021 method paper introducing Noise Contrastive Priors to address the VAE ‘prior hole’ problem, where standard Gaussian priors assign high density to regions of latent space that don’t correspond to realistic data. It uses energy-based models trained with contrastive learning to bring the prior in line with the aggregate posterior.