Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion

Dongjun Kim❋,♨1, Chieh-Hsin Lai❋,1,
Wei-Hsiang Liao1, Naoki Murata1, Yuhta Takida1, Toshimitsu Uesaka1, Yutong He♨1,3,
Yuki Mitsufuji1,2, Stefano Ermon4
1Sony AI, 2Sony Group Corporation, 3Carnegie Mellon University, 4Stanford University
ICLR 2024

Equal Contribution (✉ Dongjun Kim; ✉ Chieh-Hsin Lai)

Internship at Sony AI

TL;DR

Our new model, the Consistency Trajectory Model (CTM), achieves state-of-the-art single-step diffusion sampling on CIFAR-10 (FID 1.73) and ImageNet 64x64 (FID 1.92). CTM also offers diverse sampling options and effectively trades computational budget for sample fidelity.




Randomly generated samples from CTM trained on ImageNet 64x64.

Abstract

Consistency Models (CM) (Song et al., 2023) accelerate score-based diffusion model sampling at the cost of sample quality but lack a natural way to trade off quality for speed. To address this limitation, we propose Consistency Trajectory Model (CTM), a generalization encompassing CM and score-based models as special cases. CTM trains a single neural network that can -- in a single forward pass -- output scores (i.e., gradients of log-density) and enables unrestricted traversal between any initial and final time along the Probability Flow Ordinary Differential Equation (ODE) in a diffusion process. CTM enables the efficient combination of adversarial training and denoising score matching loss to enhance performance and achieves new state-of-the-art FIDs for single-step diffusion model sampling on CIFAR-10 (FID 1.73) and ImageNet at 64x64 resolution (FID 1.92). CTM also enables a new family of sampling schemes, both deterministic and stochastic, involving long jumps along the ODE solution trajectories. It consistently improves sample quality as computational budgets increase, avoiding the degradation seen in CM. Furthermore, CTM's access to the score accommodates all diffusion model inference techniques, including exact likelihood computation.

Training and Sampling Comparison with CTM


Score-based models exhibit discretization errors during SDE/ODE solving, while distillation models can accumulate errors in multistep sampling. CTM mitigates these issues with γ-sampling (γ=0).

Training Stage

Sampling and Inference Stage

CTM's Novel γ-Sampling

Effect of γ-Sampling

CTM enables long "jumps" along the solution trajectory, which gives rise to our new sampling method, γ-sampling.
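The idea behind γ-sampling is to alternate long deterministic jumps along the PF ODE with partial re-noising. Below is a minimal sketch; the jump network `G` and the noise-level parameterization are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def gamma_sampling(G, x_T, times, gamma, rng=None):
    """Sketch of gamma-sampling with a trained jump network G.

    G(x, t, s): maps x at noise level t to level s (t >= s) along the PF ODE.
    times: decreasing noise levels [t_0 = T, ..., t_N = 0].
    gamma = 0 -> fully deterministic long jumps;
    gamma = 1 -> CM-style multistep sampling (jump to data, re-noise).
    """
    rng = rng or np.random.default_rng()
    x = x_T
    for t, t_next in zip(times[:-1], times[1:]):
        s = np.sqrt(1.0 - gamma**2) * t_next  # intermediate deterministic target
        x = G(x, t, s)                        # long jump along the trajectory
        if gamma > 0 and t_next > 0:
            # re-noise from level s back to t_next: s^2 + (gamma*t_next)^2 = t_next^2
            x = x + gamma * t_next * rng.standard_normal(x.shape)
    return x
```

With γ = 0 the sampler is fully deterministic, so adding steps simply refines the ODE solution; with γ > 0 fresh noise is injected at each step, interpolating toward CM-style stochastic multistep sampling at γ = 1.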

BibTeX

@article{kim2023consistency,
  title={Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion},
  author={Kim, Dongjun and Lai, Chieh-Hsin and Liao, Wei-Hsiang and Murata, Naoki and Takida, Yuhta and Uesaka, Toshimitsu and He, Yutong and Mitsufuji, Yuki and Ermon, Stefano},
  journal={arXiv preprint arXiv:2310.02279},
  year={2023}
}