Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion

Dongjun Kim❋,♨1, Chieh-Hsin Lai❋,1,
Wei-Hsiang Liao1, Naoki Murata1, Yuhta Takida1, Toshimitsu Uesaka1, Yutong He♨1,3,
Yuki Mitsufuji1,2, Stefano Ermon4
1Sony AI, 2Sony Group Corporation, 3Carnegie Mellon University, 4Stanford University
ICLR 2024

Equal Contribution (✉ Dongjun Kim; ✉ Chieh-Hsin Lai)

Internship at Sony AI

TL;DR

Our new model, the Consistency Trajectory Model (CTM), achieves state-of-the-art single-step diffusion sampling on CIFAR-10 (FID 1.73) and ImageNet 64x64 (FID 1.92). CTM also offers diverse sampling options and effectively trades computational budget for sample fidelity.




Randomly generated samples from CTM trained on ImageNet 64x64.

Abstract

Consistency Models (CM) (Song et al., 2023) accelerate score-based diffusion model sampling at the cost of sample quality but lack a natural way to trade off quality for speed. To address this limitation, we propose Consistency Trajectory Model (CTM), a generalization encompassing CM and score-based models as special cases. CTM trains a single neural network that can -- in a single forward pass -- output scores (i.e., gradients of log-density) and enables unrestricted traversal between any initial and final time along the Probability Flow Ordinary Differential Equation (ODE) in a diffusion process. CTM enables the efficient combination of adversarial training and denoising score matching loss to enhance performance and achieves new state-of-the-art FIDs for single-step diffusion model sampling on CIFAR-10 (FID 1.73) and ImageNet at 64x64 resolution (FID 1.92). CTM also enables a new family of sampling schemes, both deterministic and stochastic, involving long jumps along the ODE solution trajectories. It consistently improves sample quality as computational budgets increase, avoiding the degradation seen in CM. Furthermore, CTM's access to the score accommodates all diffusion model inference techniques, including exact likelihood computation.

Training and Sampling Comparison with CTM


Score-based models exhibit discretization errors during SDE/ODE solving, while distillation models can accumulate errors in multistep sampling. CTM mitigates these issues with γ-sampling (γ=0).

Training Stage

Sampling and Inference Stage

CTM's Novel γ-Sampling

Effect of γ-Sampling

CTM enables long "jumps" along the solution trajectory, which gives rise to our new sampling method, γ-sampling.
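The idea behind γ-sampling is to alternate long deterministic jumps along the PF ODE with partial re-noising. Below is a minimal sketch; the jump network `G` and the noise-level parameterization are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def gamma_sampling(G, x_T, times, gamma, rng=None):
    """Sketch of gamma-sampling with a trained jump network G.

    G(x, t, s): maps x at noise level t to level s (t >= s) along the PF ODE.
    times: decreasing noise levels [t_0 = T, ..., t_N = 0].
    gamma = 0 -> fully deterministic long jumps;
    gamma = 1 -> CM-style multistep sampling (jump to data, re-noise).
    """
    rng = rng or np.random.default_rng()
    x = x_T
    for t, t_next in zip(times[:-1], times[1:]):
        s = np.sqrt(1.0 - gamma**2) * t_next  # intermediate deterministic target
        x = G(x, t, s)                        # long jump along the trajectory
        if gamma > 0 and t_next > 0:
            # re-noise from level s back to t_next: s^2 + (gamma*t_next)^2 = t_next^2
            x = x + gamma * t_next * rng.standard_normal(x.shape)
    return x
```

With γ = 0 the sampler is fully deterministic, so adding steps simply refines the ODE solution; with γ > 0 fresh noise is injected at each step, interpolating toward CM-style stochastic multistep sampling at γ = 1.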

BibTeX

@article{kim2023consistency,
  title={Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion},
  author={Kim, Dongjun and Lai, Chieh-Hsin and Liao, Wei-Hsiang and Murata, Naoki and Takida, Yuhta and Uesaka, Toshimitsu and He, Yutong and Mitsufuji, Yuki and Ermon, Stefano},
  journal={arXiv preprint arXiv:2310.02279},
  year={2023}
}