A step-by-step, visual explainer of forward noise and reverse denoising in modern diffusion models.
Diffusion Models
Diffusion models were introduced as probabilistic generative models that gradually corrupt data with noise and then learn to undo that corruption. By learning to reverse the diffusion, they synthesize new samples that follow the data distribution.
t = 499 → clean sample
Forward Process
The forward process applies Gaussian noise over a schedule of discrete time steps:
$$q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\right).$$
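To make this update concrete, here is a minimal NumPy sketch of the forward process. The linear β schedule, the number of steps T, and the toy 8×8 sample are illustrative assumptions; the cumulative-product ᾱ bookkeeping follows the usual DDPM convention and lets us jump straight to any step t.

```python
# Minimal sketch of the forward (noising) process; schedule values are illustrative.
import numpy as np

T = 500                               # number of diffusion steps (t = 0 ... 499)
betas = np.linspace(1e-4, 0.02, T)    # assumed linear beta schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)       # cumulative product of the alphas

def q_step(x_prev, t):
    """One forward step: x_t ~ N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I)."""
    noise = np.random.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - betas[t]) * x_prev + np.sqrt(betas[t]) * noise

def q_sample(x0, t):
    """Jump straight to step t: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    eps = np.random.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

x0 = np.random.standard_normal((8, 8))   # stand-in for a clean sample
x_noisy = q_sample(x0, t=499)            # by t = 499 this is almost pure noise
```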
Under the hood, a neural network is trained to predict the noise that was added at each timestep, which is what allows the process to be reversed. The animation below shows a dense network with signals pulsing from input to output.
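To make that objective concrete, here is a hedged PyTorch sketch of the standard noise-prediction (ε-prediction) loss. The tiny MLP, the crude timestep conditioning, and the batch of flattened 8×8 images are placeholders for illustration, not the network shown in the animation.

```python
# Sketch of the noise-prediction objective: the network tries to recover the
# noise that was mixed into x_t. Model and data here are stand-ins.
import torch
import torch.nn as nn

T = 500
betas = torch.linspace(1e-4, 0.02, T)
alpha_bars = torch.cumprod(1.0 - betas, dim=0)

# Stand-in "network": a small MLP over flattened 8x8 images plus a timestep feature.
model = nn.Sequential(nn.Linear(64 + 1, 128), nn.ReLU(), nn.Linear(128, 64))

def denoising_loss(x0):
    t = torch.randint(0, T, (x0.shape[0],))                   # random timestep per sample
    eps = torch.randn_like(x0)                                 # noise the net must predict
    abar = alpha_bars[t].unsqueeze(1)
    x_t = abar.sqrt() * x0 + (1 - abar).sqrt() * eps           # noised input
    inp = torch.cat([x_t, t.float().unsqueeze(1) / T], dim=1)  # crude timestep conditioning
    return ((model(inp) - eps) ** 2).mean()                    # simple MSE on the noise

loss = denoising_loss(torch.randn(16, 64))   # toy batch of flattened 8x8 "images"
loss.backward()
```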
Self-attention over pixels
Modern diffusion U-Nets often mix convolutions with self-attention so distant pixels can influence each other. Click any pixel below to make it the query; the white lines show attention weights fading with distance, controlled by the τ slider.
Click any pixel to make it the query. Higher τ → flatter attention.
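As a rough illustration of what the widget computes, the NumPy snippet below derives attention weights from a single query pixel and shows how a temperature τ flattens the softmax. The feature map and the τ values are made up for the example.

```python
# Single-query self-attention over pixels with a temperature tau.
# Dividing the scores by a larger tau makes the softmax flatter (as in the slider).
import numpy as np

def attention_weights(feats, query_idx, tau=1.0):
    """feats: (N, d) pixel features; returns softmax attention from one query pixel."""
    q = feats[query_idx]                                   # query vector for the clicked pixel
    scores = feats @ q / (np.sqrt(feats.shape[1]) * tau)   # scaled dot-product scores
    scores -= scores.max()                                 # numerical stability
    w = np.exp(scores)
    return w / w.sum()

pixels = np.random.standard_normal((16 * 16, 8))           # toy 16x16 feature map
w_sharp = attention_weights(pixels, query_idx=0, tau=0.5)  # peaked on nearby matches
w_flat = attention_weights(pixels, query_idx=0, tau=5.0)   # flatter, more uniform
```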
U-Net backbone
Diffusion models commonly use a U-Net: an encoder that downsamples to a bottleneck, then a decoder that upsamples while fusing skip connections. Slide to see the downsampling (left) and upsampling (right) halves light up.
Left: downsampling encoder. Right: upsampling decoder with skip links.
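Here is a minimal PyTorch sketch of the same idea, assuming a toy two-level U-Net with one downsampling step, one upsampling step, and one skip connection. Channel counts and layer choices are illustrative, not a production architecture.

```python
# Tiny U-Net sketch: downsample to a bottleneck, upsample back,
# and concatenate the matching encoder feature map (the skip connection).
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    def __init__(self, ch=16):
        super().__init__()
        self.enc = nn.Conv2d(1, ch, 3, padding=1)                            # encoder conv
        self.down = nn.Conv2d(ch, ch * 2, 4, stride=2, padding=1)            # downsample
        self.mid = nn.Conv2d(ch * 2, ch * 2, 3, padding=1)                   # bottleneck
        self.up = nn.ConvTranspose2d(ch * 2, ch, 4, stride=2, padding=1)     # upsample
        self.dec = nn.Conv2d(ch * 2, 1, 3, padding=1)                        # conv after skip concat

    def forward(self, x):
        e = torch.relu(self.enc(x))                   # left half: encoder
        b = torch.relu(self.mid(self.down(e)))        # bottleneck
        u = torch.relu(self.up(b))                    # right half: decoder
        return self.dec(torch.cat([u, e], dim=1))     # skip link fuses encoder features

out = TinyUNet()(torch.randn(1, 1, 32, 32))   # output keeps the input resolution
```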
Training dynamics
Diffusion models are optimized with gradient descent. The plot below shows steps on a simple quadratic: small learning rates creep toward the minimum; larger rates move faster but can overshoot.
Small lr → slow but stable. Large lr → overshoots past the minimum.
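The same behaviour can be reproduced in a few lines. The quadratic f(x) = x², the starting point, and the two step sizes below are illustrative choices mirroring the plot.

```python
# Gradient descent on f(x) = x^2: a small step size creeps toward the minimum,
# a large one jumps back and forth across it.
def descend(lr, steps=10, x=2.0):
    path = [x]
    for _ in range(steps):
        x = x - lr * 2 * x        # the gradient of x^2 is 2x
        path.append(x)
    return path

print(descend(lr=0.05))   # slow, monotone approach toward 0
print(descend(lr=0.95))   # overshoots 0 each step, oscillating as it damps
```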
Single gate intuition
A neural network is built from simple gates. Here is one ReLU-style gate with two inputs and one output; signals flow left to right in monochrome.
A single ReLU gate: signals enter from the left, combine in the gate, and produce one output on the right.
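In code, such a gate is just a weighted sum followed by a rectification. The weights and bias below are arbitrary example values.

```python
# One ReLU gate with two inputs and one output: weighted sum, bias, then max(0, .).
def relu_gate(x1, x2, w1=0.7, w2=-0.3, b=0.1):
    s = w1 * x1 + w2 * x2 + b    # combine the two incoming signals
    return max(0.0, s)           # rectify: negative sums are clipped to zero

print(relu_gate(1.0, 2.0))   # 0.7 - 0.6 + 0.1 ≈ 0.2, passes through
print(relu_gate(0.0, 2.0))   # -0.6 + 0.1 = -0.5, clipped to 0.0
```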