Paper Information

Citation: Liu, X., Gong, C., & Liu, Q. (2023). Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow. International Conference on Learning Representations (ICLR). https://openreview.net/forum?id=XVjTT1nw5z

Publication: ICLR 2023

What kind of paper is this?

This is primarily a Method paper, with a significant Theory component.

  • Method: It proposes “Rectified Flow,” a novel generative framework that learns ordinary differential equations (ODEs) to transport distributions via straight paths. It introduces the “Reflow” algorithm to iteratively straighten these paths.
  • Theory: It provides rigorous proofs connecting the method to Optimal Transport, showing that the rectification process yields a coupling with non-increasing convex transport costs and that recursive reflow reduces the curvature of trajectories.
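The transport-cost result can be stated compactly. This is my paraphrase of the paper's claim (the precise regularity conditions on $c$ are given in the paper): if $(X_0, X_1)$ is the input coupling and $(Z_0, Z_1)$ the coupling induced by the rectified flow, then

$$\mathbb{E}\big[c(Z_1 - Z_0)\big] \;\le\; \mathbb{E}\big[c(X_1 - X_0)\big] \quad \text{for all convex } c,$$

so rectification never increases any convex transport cost, and repeated rectification (reflow) yields a monotone sequence of couplings.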

What is the motivation?

The work addresses two main challenges in unsupervised learning: generative modeling (generating data from noise) and domain transfer (mapping between two observed distributions).

  • Inefficiency of ODE/SDE Models: Continuous-time models (like Score-based Generative Models and DDPMs) require simulating diffusions over many steps, resulting in high computational costs during inference.
  • Complexity of GANs: While fast (one-step), GANs suffer from training instability and mode collapse.
  • Disconnection: Generative modeling and domain transfer are often treated as separate tasks requiring different techniques.

The authors aim to unify these tasks into a single “transport mapping” problem while bridging the gap between high-quality continuous models and fast one-step models.

What is the novelty here?

The core novelty is the Rectified Flow framework and the Reflow procedure.

  • Straight-Line ODEs: Unlike diffusion models that rely on stochastic paths or specific forward processes, Rectified Flow learns an ODE drift $v$ to follow the straight line connecting data pairs $(X_0, X_1)$. This is achieved via a simple least-squares optimization problem.
  • Reflow (Iterative Straightening): The authors introduce a recursive training procedure where a new flow is trained on the data pairs $(Z_0, Z_1)$ generated by the previous flow. Theoretical analysis shows this reduces the “transport cost” and straightens the trajectories, allowing for accurate 1-step simulation (effectively converting the ODE into a one-step model).
  • Unified Framework: The method uses the exact same algorithm for generation ($\pi_0$ is Gaussian) and domain transfer ($\pi_0$ is a source dataset), removing the need for adversarial losses or cycle-consistency constraints.

What experiments were performed?

The authors validated the method across image generation, translation, and domain adaptation tasks.

  • Unconditioned Image Generation:
    • Dataset: CIFAR-10 ($32\times32$).
    • Baselines: Compared against GANs (StyleGAN2), Diffusion Models (DDPM, NCSN++), and distilled methods (DPM-Solver, Progressive Distillation).
    • High-Res: Validated on LSUN Bedroom/Church, CelebA-HQ, and AFHQ ($256\times256$).
  • Image-to-Image Translation:
    • Datasets: AFHQ (Cat $\leftrightarrow$ Dog/Wild), MetFace $\leftrightarrow$ CelebA-HQ.
    • Setup: Transferring styles while preserving semantic identity (using a classifier-based feature mapping metric).
  • Domain Adaptation:
    • Datasets: DomainNet, Office-Home.
    • Metric: Classification accuracy on the transferred testing data.

What were the outcomes and conclusions drawn?

  • Superior 1-Step Generation: On CIFAR-10, the “2-Rectified Flow” (after one reflow step) achieved an FID of 4.85 and Recall of 0.51 with a single Euler step, outperforming previous state-of-the-art one-step diffusion distillation methods.
  • Straightening Effect: The “Reflow” procedure was empirically shown to reduce the deviation of trajectories from straight lines (the paper’s straightness measure) as well as the transport costs, validating the theoretical claims.
  • High-Quality Transfer: The model successfully performed image translation (e.g., Cat to Wild Animal) without paired data or cycle-consistency losses.
  • Fast Simulation: The method allows for extremely coarse time discretization (e.g., $N=1$) without significant quality loss after reflow, effectively solving the slow inference speed of standard ODE models.
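As a sketch of what “straightness” measures (following the paper’s notation; treat the exact normalization as an assumption), the deviation of a flow $Z$ from straight paths can be written as

$$S(Z) = \int_0^1 \mathbb{E}\Big[\big\|(Z_1 - Z_0) - \dot{Z}_t\big\|^2\Big]\, dt,$$

with $S(Z) = 0$ exactly when every trajectory is a straight line traversed at constant speed, in which case a single Euler step ($N=1$) reproduces the ODE endpoint exactly.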

Reproducibility Details

Data

The paper utilizes several standard computer vision benchmarks.

| Purpose | Dataset | Size/Resolution | Notes |
|---|---|---|---|
| Generation | CIFAR-10 | 32x32 | Standard split |
| Generation | LSUN (Bedroom, Church) | 256x256 | High-res evaluation |
| Generation | CelebA-HQ | 256x256 | High-res evaluation |
| Gen/Transfer | AFHQ (Cat, Dog, Wild) | 512x512 | Resized to 512x512 |
| Transfer | MetFace | 1024x1024 | Resized |
| Adaptation | DomainNet | Mixed | 345 categories, 6 domains |
| Adaptation | Office-Home | Mixed | 65 categories, 4 domains |

Algorithms

  • Objective Function: The drift $v(Z_t, t)$ is trained by minimizing a least-squares regression objective: $$\min_{v} \int_{0}^{1} \mathbb{E}[||(X_1 - X_0) - v(X_t, t)||^2] dt$$ where $X_t = tX_1 + (1-t)X_0$ is the linear interpolation.

  • Reflow Procedure: Iteratively updates the flow. Let $Z^k$ be the $k$-th rectified flow.

    1. Generate data pairs $(Z_0^k, Z_1^k)$ by simulating the current flow.
    2. Train $Z^{k+1}$ using these pairs as the new $(X_0, X_1)$ in the objective above.
  • ODE Solver:

    • Training: Analytical linear interpolation.
    • Inference: Euler method (constant step size $1/N$) or RK45 (adaptive).
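The pieces above fit together in a few lines. Below is a minimal NumPy sketch, not the paper's implementation: toy 2-D data and a closed-form drift stand in for the U-Net, and names like `rectified_flow_loss` and `reflow_pairs` are my own.

```python
import numpy as np

rng = np.random.default_rng(0)

def rectified_flow_loss(v, x0, x1, t):
    """Monte-Carlo estimate of the objective
    E[ || (X1 - X0) - v(X_t, t) ||^2 ],  with X_t = t*X1 + (1-t)*X0."""
    xt = t[:, None] * x1 + (1.0 - t[:, None]) * x0  # linear interpolation X_t
    target = x1 - x0                                # straight-line velocity
    return float(np.mean(np.sum((target - v(xt, t)) ** 2, axis=1)))

def euler_sample(v, z0, n_steps):
    """Constant-step-size Euler integration of dZ_t = v(Z_t, t) dt on [0, 1]."""
    z, dt = z0.copy(), 1.0 / n_steps
    for i in range(n_steps):
        z = z + dt * v(z, np.full(len(z), i * dt))
    return z

def reflow_pairs(v, z0, n_steps=100):
    """One reflow round: push Z_0 through the current flow to get Z_1;
    the deterministic pairs (Z_0, Z_1) become the next round's (X_0, X_1)."""
    return z0, euler_sample(v, z0, n_steps)

# Toy setup: pi_1 is pi_0 shifted by a constant c. The drift v(z, t) = c is
# already straight, so the loss on the pairing (x0, x0 + c) vanishes and a
# single Euler step (N = 1) lands exactly on the endpoint.
c = np.array([3.0, -1.0])
v_straight = lambda z, t: np.broadcast_to(c, z.shape)
x0 = rng.normal(size=(16, 2))
loss = rectified_flow_loss(v_straight, x0, x0 + c, rng.uniform(size=16))
z0, z1 = reflow_pairs(v_straight, rng.normal(size=(16, 2)), n_steps=1)
```

In a real run, `v` would be the DDPM++/NCSN++ network trained by gradient descent on this loss, and `reflow_pairs` would be called once per reflow round to regenerate the training couples.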

Models

  • Architecture:

    • CIFAR-10 / Transfer: Uses the DDPM++ U-Net architecture (based on Song et al., 2020b).
    • High-Res Generation: Uses the NCSN++ architecture.
  • Optimization:

    • Optimizer: Adam (CIFAR-10) or AdamW (Transfer/Adaptation).
    • Hyperparameters:
      • LR: $2 \times 10^{-4}$ (CIFAR), Grid search for transfer.
      • EMA: 0.999999 (CIFAR), 0.9999 (Transfer).
      • Batch Size: 4 (Transfer).

Evaluation

| Metric | Value (CIFAR-10, N=1) | Baseline (Best 1-step) | Notes |
|---|---|---|---|
| FID | 4.85 (2-Rectified) | 8.91 (TDPM) | Lower is better |
| Recall | 0.51 (3-Rectified) | 0.49 (StyleGAN2+ADA) | Higher is better |

Hardware

Hardware specifics (e.g., GPU models) are not explicitly reported in the paper; however, training the DDPM++/NCSN++ architectures on high-resolution datasets typically requires high-end GPUs (e.g., V100s/A100s).


Citation

@inproceedings{liuFlowStraightFast2023,
  title = {Flow {{Straight}} and {{Fast}}: {{Learning}} to {{Generate}} and {{Transfer Data}} with {{Rectified Flow}}},
  booktitle = {International Conference on Learning Representations},
  author = {Liu, Xingchao and Gong, Chengyue and Liu, Qiang},
  year = 2023,
  url = {https://openreview.net/forum?id=XVjTT1nw5z}
}