NeurIPS 2024		
Technion – Israel Institute of Technology		
A prominent family of methods for learning data distributions relies on density ratio estimation (DRE), where a model is trained to classify between data samples and samples from some reference distribution. DRE-based models can directly output the likelihood for any given input, a highly desired property that is lacking in most generative techniques. Nevertheless, to date, DRE methods have failed to accurately capture the distributions of complex high-dimensional data, like images, and have thus drawn diminishing research attention in recent years. In this work, we present classification diffusion models (CDMs), a DRE-based generative method that adopts the formalism of denoising diffusion models (DDMs) while making use of a classifier that predicts the level of noise added to a clean signal. Our method is based on an analytical connection that we derive between the MSE-optimal denoiser for removing white Gaussian noise and the cross-entropy-optimal classifier for predicting the noise level. Our method is the first DRE-based technique that can successfully generate images beyond the MNIST dataset. Furthermore, it can output the likelihood of any input in a single forward pass, achieving state-of-the-art negative log-likelihood (NLL) among methods with this property.
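For intuition, the connection builds on two standard identities, shown here as a rough sketch in generic DDPM notation (the symbols $\bar{\alpha}_t$, $c^*$, and the uniform-prior assumption are ours for illustration; the paper's exact theorem may differ). For the forward process $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$ and a uniform prior over noise levels $t \in \{0,\dots,T\}$, the cross-entropy-optimal noise-level classifier $c^*$ recovers density ratios, and Tweedie's formula turns the score of the noisy-data distribution $p_t$ into the MMSE denoiser:

$$\log p_t(x) - \log p_s(x) = \log c^*_t(x) - \log c^*_s(x), \qquad \mathbb{E}[x_0 \mid x_t] = \frac{x_t + (1-\bar{\alpha}_t)\,\nabla_{x_t}\log p_t(x_t)}{\sqrt{\bar{\alpha}_t}}.$$

Choosing a reference level whose score is known in closed form (e.g., $p_T \approx \mathcal{N}(0, I)$) then expresses the MMSE denoiser through gradients of the classifier's log-probabilities.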
DDMs are based on minimum-MSE (MMSE) denoising, while DRE methods hinge on optimal classification. In this work, we develop a connection between the optimal classifier for predicting the level of white Gaussian noise added to a data sample and the MMSE denoiser for cleaning such noise. Specifically, we show that the latter can be obtained from the gradient of the former. Utilizing this connection, we propose the classification diffusion model (CDM), a generative method that adopts the formalism of DDMs but, instead of a denoiser, employs a noise-level classifier. CDM is the first DRE-based method that can successfully generate images beyond MNIST. In addition, as a DRE method, CDM is inherently capable of outputting the exact log-likelihood of any input in a single neural function evaluation (NFE). In fact, it achieves state-of-the-art negative log-likelihood (NLL) results among methods that use a single NFE, and results comparable to computationally expensive ODE-based methods.
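To illustrate what single-NFE likelihood evaluation can look like in practice, here is a minimal PyTorch-style sketch of the generic DRE estimator. The `classifier` interface, the class convention (class 0 = clean data, last class = pure Gaussian noise), and the uniform prior over noise levels are assumptions made for illustration; the paper's actual estimator may differ in its discretization and scaling.

```python
import math

import torch


def nll_single_forward(x, classifier):
    """Sketch only (not the paper's exact estimator): per-sample NLL in one forward pass.

    Assumes a hypothetical `classifier(x) -> (B, T+1)` tensor of log-softmax scores over
    noise levels, with class 0 = clean data and the last class = pure Gaussian noise,
    trained under a uniform prior over levels. The DRE identity then gives
        log p_data(x) ~= log N(x; 0, I) + log c_0(x) - log c_T(x).
    """
    log_c = classifier(x)                      # (B, T+1) log-probabilities per noise level
    x_flat = x.flatten(start_dim=1)            # (B, D)
    d = x_flat.shape[1]
    # Log-density of the standard Gaussian reference, evaluated at x.
    log_ref = -0.5 * (x_flat ** 2).sum(dim=1) - 0.5 * d * math.log(2 * math.pi)
    log_px = log_ref + log_c[:, 0] - log_c[:, -1]
    return -log_px                             # (B,) negative log-likelihood in nats


if __name__ == "__main__":
    # Stand-in classifier for demonstration: uniform log-probabilities over 1001 levels.
    dummy_classifier = lambda x: torch.full((x.shape[0], 1001), -math.log(1001.0))
    x = torch.randn(4, 3, 32, 32)
    print(nll_single_forward(x, dummy_classifier))  # (4,) NLL values in nats
```

Dividing the returned values by $d\ln 2$ converts them from nats to the bits/dim convention commonly used for the NLL figures reported below.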
Negative Log Likelihood

| Model | NLL ↓ | NFE |
|---|---|---|
| iResNet | 3.45 | 100 |
| FFJORD | 3.40 | ~3K |
| MintNet | 3.32 | 120 |
| FlowMatching | 2.99 | 142 |
| VDM | 2.65 | 10K |
| DDPM ($L$) | ≤ 3.70 | 1K |
| DDPM ($L_{simple}$) | ≤ 3.75 | 1K |
| DDPM (SDE) | 3.28 | ~200 |
| DDPM++ cont. | 2.99 | ~200 |
| RealNVP | 3.49 | 1 |
| Glow | 3.35 | 1 |
| Residual Flow | 3.28 | 1 |
| CDM | 3.38 | 1 |
| CDM (unif.) | 2.98 | 1 |
| CDM (OT) | 2.89 | 1 |
Fréchet Inception Distance (FID ↓)

CelebA $64\times64$

| Sampling Method | DDM | CDM |
|---|---|---|
| DDIM Sampler, 50 steps | 8.47 | 4.78 |
| DDPM Sampler, 1000 steps | 4.13 | 2.51 |
| 2nd order DPM Solver, 25 steps | 6.16 | 4.45 |

Unconditional CIFAR-10 $32\times32$

| Sampling Method | DDM | CDM |
|---|---|---|
| DDIM Sampler, 50 steps | 7.19 | 7.56 |
| DDPM Sampler, 1000 steps | 4.77 | 4.74 |
| 2nd order DPM Solver, 25 steps | 6.91 | 7.29 |

Conditional CIFAR-10 $32\times32$

| Sampling Method | DDM | CDM |
|---|---|---|
| DDIM Sampler, 50 steps | 5.92 | 5.08 |
| DDPM Sampler, 1000 steps | 4.70 | 3.66 |
| 2nd order DPM Solver, 25 steps | 5.87 | 4.87 |
Bibtex
This webpage was originally made by Matan Kleiner with the help of Hila Manor for SinDDM and can be used as a template. It is inspired by the template that was originally made by Phillip Isola and Richard Zhang for a colorful ECCV project; the code for the original template can be found here.