Classifier-Free Guidance
We show that guidance can indeed be performed by a pure generative model without such a classifier: in what we call classifier-free guidance, we jointly train a conditional and an unconditional diffusion model, and we combine the resulting conditional and unconditional score estimates to attain a trade-off between sample quality and diversity similar to that obtained using classifier guidance.
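The "jointly train" part is typically implemented with condition dropout: a single network is trained as an ordinary conditional diffusion model, but the condition $y$ is replaced by a null token $\varnothing$ with a small probability $p_{\text{uncond}}$ (typically 10-20%), so the same weights also learn the unconditional score. Below is a minimal PyTorch-style sketch; `model`, `null_token`, and `alpha_bar` are hypothetical placeholders, not an API from the paper:

```python
import torch
import torch.nn.functional as F

def cfg_training_step(model, x0, y, null_token, alpha_bar, p_uncond=0.1):
    """One eps-prediction training step with condition dropout.

    Assumed (hypothetical) API: model(x_t, t, y) -> predicted noise.
    alpha_bar:  cumulative noise schedule (abar_t), shape (T,).
    null_token: embedding/label standing in for "no condition".
    """
    B, T = x0.shape[0], alpha_bar.shape[0]
    t = torch.randint(0, T, (B,), device=x0.device)

    # Drop the condition with probability p_uncond so the same network
    # learns both eps(x_t, y) and eps(x_t) = eps(x_t, null).
    drop = torch.rand(B, device=x0.device) < p_uncond
    y = torch.where(drop.view(-1, *([1] * (y.dim() - 1))), null_token, y)

    # DDPM forward process: x_t = sqrt(abar_t) x0 + sqrt(1 - abar_t) eps.
    eps = torch.randn_like(x0)
    abar = alpha_bar[t].view(-1, 1, 1, 1)
    x_t = abar.sqrt() * x0 + (1.0 - abar).sqrt() * eps

    # Standard simple loss on the predicted noise.
    return F.mse_loss(model(x_t, t, y), eps)
```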
Classifier guidance (CG) drawbacks:
- requires a separate classifier trained on noisy images $x_t$ (an off-the-shelf clean-image classifier won't do)
- Furthermore, because classifier guidance mixes a score estimate with a classifier gradient during sampling, classifier-guided diffusion sampling can be interpreted as attempting to confuse an image classifier with a gradient-based adversarial attack. This raises the question of whether classifier guidance is successful at boosting classifier-based metrics such as FID and Inception score (IS) simply because it is adversarial against such classifiers.
- steering a generative model with classifier gradients makes the sampler resemble a GAN, where a generator is played off against a classifier
Solution: discard the classifier entirely.
Classifier-free guidance instead mixes the score estimates of a conditional diffusion model and a jointly trained unconditional diffusion model.

Recall the classifier-guided sampling distribution:
\[ p_{\theta, \phi}\left(x_t \mid x_{t+1}, y\right)=Z \cdot p_\theta\left(x_t \mid x_{t+1}\right) \cdot p_\phi\left(y \mid x_t\right) \]

For DDIM-style sampling, use the score-noise correspondence $\nabla_{x_t} \log p_\theta(x_t) = -\frac{1}{\sqrt{1-\bar{\alpha}_t}} \epsilon_\theta(x_t)$:
\[ \begin{aligned} \nabla_{x_t} \log{p_{\theta, \phi}\left(x_t \mid x_{t+1}, y\right)} &\approx \nabla_{x_t} \log p_\theta(x_t \mid x_{t+1}) + \nabla_{x_t} \log p_{\phi}(y \mid x_t)\\ &= -\frac{1}{\sqrt{1-\bar{\alpha}_t}} \epsilon_\theta(x_t) + \nabla_{x_t} \log p_{\phi}(y \mid x_t)\\ &= -\frac{1}{\sqrt{1-\bar{\alpha}_t}} \left[ \epsilon_\theta(x_t) - \sqrt{1-\bar{\alpha}_t}\,\nabla_{x_t} \log p_{\phi}(y \mid x_t) \right], \end{aligned} \]
i.e. guidance amounts to replacing $\epsilon_\theta(x_t)$ with the modified noise prediction $\hat{\epsilon} = \epsilon_\theta(x_t) - \sqrt{1-\bar{\alpha}_t}\,\nabla_{x_t} \log p_{\phi}(y \mid x_t)$.

But now we don't use a classifier, so we rewrite the classifier gradient with Bayes' rule:
\[ \begin{aligned} \nabla_{\mathbf{x}_t} \log p \left(y \mid \mathbf{x}_t\right) & =\nabla_{\mathbf{x}_t} \log \frac{p(\mathbf{x}_t \mid y)\, p(y)}{p(\mathbf{x}_t) } \\ & =\nabla_{\mathbf{x}_t} \log p\left(\mathbf{x}_t \mid y\right)-\nabla_{\mathbf{x}_t} \log p\left(\mathbf{x}_t\right) \\ & =-\frac{1}{\sqrt{1-\bar{\alpha}_t}}\left(\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, y)-\boldsymbol{\epsilon}_\theta(\mathbf{x}_t)\right), \end{aligned} \]
where $\epsilon_\theta(x_t, y)$ and $\epsilon_\theta(x_t)$ are the conditional and unconditional predictions of the single jointly trained model ($\epsilon_\theta(x_t) = \epsilon_\theta(x_t, \varnothing)$). Substituting into $\hat{\epsilon}$ and scaling the guidance term by a guidance scale $w$ (the CFG scale) gives, for classifier-free guidance:
\[ \sqrt{1-\bar{\alpha}_t}\, \nabla_{x_t} \log{p_{\theta}\left(x_t \mid x_{t+1}, y\right)} \approx -\epsilon_\theta(x_t) - w \big[ \epsilon_{\theta}(x_t,y) - \epsilon_{\theta}(x_t) \big], \]
i.e. the guided noise prediction is $\hat{\epsilon} = \epsilon_\theta(x_t) + w \big[ \epsilon_\theta(x_t, y) - \epsilon_\theta(x_t) \big]$. This is exactly the form used in Stable Diffusion:
\[ x_{cfg} = x_{neg} + w \left(x_{pos} - x_{neg}\right), \]
where $x_{pos}$ is the noise prediction under the prompt, $x_{neg}$ the prediction under the negative (by default empty) prompt, and $w$ the CFG scale.
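At sampling time both predictions come from the same network (often as one batched forward pass) and are mixed with the CFG scale. A minimal sketch of that mixing step, reusing the hypothetical `model` / `null_token` names from the training sketch above:

```python
import torch

@torch.no_grad()
def guided_eps(model, x_t, t, y, null_token, w):
    """Classifier-free guided noise prediction:

        eps_hat = eps_uncond + w * (eps_cond - eps_uncond)

    w = 1 recovers the plain conditional model; larger w trades sample
    diversity for fidelity to the condition.
    """
    eps_cond = model(x_t, t, y)             # eps_theta(x_t, y)
    eps_uncond = model(x_t, t, null_token)  # eps_theta(x_t) = eps_theta(x_t, null)
    return eps_uncond + w * (eps_cond - eps_uncond)
```

In the diffusers `StableDiffusionPipeline`, this mixing is what the `guidance_scale` argument (default 7.5) controls, with $x_{neg}$ computed from the negative-prompt embedding in place of `null_token`.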