Rectified Flow: Learning to Generate and Transfer Data

What kind of paper is this?

This is primarily a Method paper, with a significant Theory component.

Method: It proposes “Rectified Flow,” a novel generative framework that learns ordinary differential equations (ODEs) to transport distributions via straight paths. It introduces the “Reflow” algorithm to iteratively straighten these paths.
Theory: It provides rigorous proofs connecting the method to Optimal Transport, showing that the rectification process yields a coupling with non-increasing convex transport costs and that recursive reflow reduces the curvature of trajectories.

What is the motivation?

The work addresses two main challenges in unsupervised learning: generative modeling (generating data from noise) and domain transfer (mapping between two observed distributions).

Inefficiency of ODE/SDE Models: Continuous-time models (like Score-based Generative Models and DDPMs) require simulating diffusions over many steps, resulting in high computational costs during inference.
Complexity of GANs: GANs provide fast (one-step) generation alongside challenges with training instability and mode collapse.
Disconnection: Generative modeling and domain transfer are often treated as separate tasks requiring different techniques.

The authors aim to unify these tasks into a single “transport mapping” problem while bridging the gap between high-quality continuous models and fast one-step models.

What is the novelty here?

The core novelty is the Rectified Flow framework and the Reflow procedure.

Straight-Line ODEs: Rectified Flow learns an ODE drift $v$ to follow the straight line connecting data pairs $(X_0, X_1)$, providing an alternative to diffusion models that rely on stochastic paths or specific forward processes. This is achieved via a simple least-squares optimization problem.
Reflow (Iterative Straightening): The authors introduce a recursive training procedure where a new flow is trained on the data pairs $(Z_0, Z_1)$ generated by the previous flow. Theoretical analysis shows this reduces the “transport cost” and straightens the trajectories, allowing for accurate 1-step simulation (effectively converting the ODE into a one-step model).
Unified Framework: The method uses the exact same algorithm for generation ($\pi_0$ is Gaussian) and domain transfer ($\pi_0$ is a source dataset), removing the need for adversarial losses or cycle-consistency constraints.

What experiments were performed?

The authors validated the method across image generation, translation, and domain adaptation tasks.

Unconditioned Image Generation:
- Dataset: CIFAR-10 ($32\times32$).
- Baselines: Compared against GANs (StyleGAN2), Diffusion Models (DDPM, NCSN++), and distilled methods (DPM-Solver, Progressive Distillation).
- High-Res: Validated on LSUN Bedroom/Church, CelebA-HQ, and AFHQ ($256\times256$).
Image-to-Image Translation:
- Datasets: AFHQ (Cat $\leftrightarrow$ Dog/Wild), MetFace $\leftrightarrow$ CelebA-HQ.
- Setup: Transferring styles while preserving semantic identity (using a classifier-based feature mapping metric).
Domain Adaptation:
- Datasets: DomainNet, Office-Home.
- Metric: Classification accuracy on the transferred testing data.

What outcomes/conclusions?

Superior 1-Step Generation: On CIFAR-10 with a single Euler step (as of ICLR 2023), the “2-Rectified Flow” (after one reflow step) achieved an FID of 4.85, beating the diffusion distillation baseline TDPM (FID 8.91). The “3-Rectified Flow” reached a Recall of 0.51, beating the GAN baseline StyleGAN2+ADA (Recall 0.49).
Straightening Effect: The “Reflow” procedure was empirically shown to reduce the “straightness” error and transport costs, validating the theoretical claims. “Straightness” is measured as $S(Z) = \mathbb{E}[\int_0^1 |\dot{Z}_t - (Z_1 - Z_0)|^2, dt]$ (zero means perfectly straight); “transport cost” is $\mathbb{E}[c(Z_1 - Z_0)]$ for a convex cost $c$, and Reflow reduces this for all convex costs.
High-Quality Transfer: The model successfully performed image translation (e.g., Cat to Wild Animal) without paired data or cycle-consistency losses.
Fast Simulation: The method allows for extremely coarse time discretization (e.g., $N=1$) without significant quality loss after reflow, effectively solving the slow inference speed of standard ODE models.

Reproducibility Details

Data

The paper utilizes several standard computer vision benchmarks.

Purpose	Dataset	Size/Resolution	Notes
Generation	CIFAR-10	32x32	Standard split
Generation	LSUN (Bedroom, Church)	256x256	High-res evaluation
Generation	CelebA-HQ	256x256	High-res evaluation
Gen/Transfer	AFHQ (Cat, Dog, Wild)	512x512	Resized to 512x512
Transfer	MetFace	1024x1024	Resized
Adaptation	DomainNet	Mixed	345 categories, 6 domains
Adaptation	Office-Home	Mixed	65 categories, 4 domains

Algorithms

Objective Function: The drift $v(Z_t, t)$ is trained by minimizing a least-squares regression objective: $$\min_{v} \int_{0}^{1} \mathbb{E}[|(X_1 - X_0) - v(X_t, t)|^2] dt$$ where $X_t = tX_1 + (1-t)X_0$ is the linear interpolation.
Reflow Procedure: Iteratively updates the flow. Let $Z^k$ be the $k$-th rectified flow.
1. Generate data pairs $(Z_0^k, Z_1^k)$ by simulating the current flow.
2. Train $Z^{k+1}$ using these pairs as the new $(X_0, X_1)$ in the objective above.
ODE Solver:
- Training: Analytical linear interpolation.
- Inference: Euler method (constant step size $1/N$) or RK45 (adaptive).

Models

Architecture:
- CIFAR-10 / Transfer: Uses the DDPM++ U-Net architecture (based on Song et al., 2020b).
- High-Res Generation: Uses the NCSN++ architecture.
Optimization:
- Optimizer: Adam (CIFAR-10) or AdamW (Transfer/Adaptation).
- Hyperparameters:
  - LR: $2 \times 10^{-4}$ (CIFAR), Grid search for transfer.
  - EMA: 0.999999 (CIFAR), 0.9999 (Transfer).
  - Batch Size: 4 (Transfer).

Evaluation

Metric	Value (CIFAR-10, N=1)	Baseline (Best 1-step)	Notes
FID	4.85 (2-Rectified)	8.91 (TDPM)	Lower is better
Recall	0.51 (3-Rectified)	0.49 (StyleGAN2+ADA)	Higher is better

Hardware

The text omits explicit hardware specifics (e.g., GPU models). However, the architectures (NCSN++, DDPM++) typically require high-end GPU clusters (e.g., V100/A100s) for training on high-resolution datasets.

Paper Information

Citation: Liu, X., Gong, C., & Liu, Q. (2023). Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow. International Conference on Learning Representations (ICLR). https://openreview.net/forum?id=XVjTT1nw5z

Publication: ICLR 2023

@inproceedings{liuFlowStraightFast2023,
  title = {Flow {{Straight}} and {{Fast}}: {{Learning}} to {{Generate}} and {{Transfer Data}} with {{Rectified Flow}}},
  booktitle = {International Conference on Learning Representations},
  author = {Liu, Xingchao and Gong, Chengyue and Liu, Qiang},
  year = 2023,
  url = {https://openreview.net/forum?id=XVjTT1nw5z}
}

Additional Resources:

What kind of paper is this?#

What is the motivation?#

What is the novelty here?#

What experiments were performed?#

What outcomes/conclusions?#

Reproducibility Details#

Data#

Algorithms#

Models#

Evaluation#

Hardware#

Paper Information#