Diffusion Models: Why Are They Important?
Diffusion models represent the zenith of generative capabilities today. However, these models stand on the shoulders of giants, owing their success to over a decade of advancements in machine learning techniques, the widespread availability of massive amounts of image data, and improved hardware.
For some context, below is a brief outline of significant machine learning developments.
In 2009, the seminal ImageNet paper and dataset were released at CVPR; the dataset, which has since grown to over 14 million hand-annotated images, was massive at the time and remains relevant to researchers and businesses building models today.
In 2014, GANs (Generative Adversarial Networks) were introduced by Ian Goodfellow and colleagues, establishing powerful generative capabilities for machine learning models.
In 2018, LLMs hit the scene with the original GPT release, followed shortly by its successors GPT-2 and the current GPT-3, which offer increasingly capable text generation.
In 2020, NeRFs (Neural Radiance Fields) made it possible to reconstruct 3D scenes from a series of images with known camera poses.
Over the past few years, diffusion models have continued this evolution, delivering even more powerful generative capabilities.
What makes diffusion models so strikingly different from their predecessors? The most apparent answer is their ability to generate highly realistic imagery and to match the distribution of real images more closely than GANs do. Diffusion models are also more stable to train than GANs, which are prone to mode collapse: after training, the generator captures only a few modes of the true data distribution. In the extreme case, mode collapse means a single image would be returned for every prompt, though in practice the problem is rarely that severe. Diffusion models largely avoid this issue because the diffusion process smooths out the distribution, giving them greater diversity of imagery than GANs.
Diffusion models can also be conditioned on a wide variety of inputs: text for text-to-image generation, bounding boxes for layout-to-image generation, masked images for inpainting, and lower-resolution images for super-resolution.
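To see why one architecture handles so many conditioning types, it helps to note that the conditioning signal is simply an extra input to the noise predictor; the sampling loop itself is unchanged. Below is a deliberately minimal toy sketch of this idea in NumPy. The linear "noise predictor," its weights, and the function names (`predict_noise`, `denoise_step`) are all hypothetical stand-ins for a real U-Net and sampler, chosen only to illustrate the interface.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy weights for a linear "noise predictor" (a stand-in for a U-Net).
W_x = rng.standard_normal((8, 8)) * 0.1
W_c = rng.standard_normal((8, 4)) * 0.1

def predict_noise(x_t, t, cond):
    """Predict the noise in x_t at step t, given a conditioning vector.

    In a real model, `cond` could be a text embedding, a flattened mask,
    or features from a low-resolution image; the sampler below is the
    same regardless of which one it is.
    """
    return W_x @ x_t + W_c @ cond + 0.01 * t

def denoise_step(x_t, t, cond, step_size=0.1):
    # One simplified reverse-diffusion update: move against the
    # predicted noise. Real samplers use a noise schedule here.
    return x_t - step_size * predict_noise(x_t, t, cond)

x = rng.standard_normal(8)          # a "noisy image" (toy vector)
text_cond = rng.standard_normal(4)  # e.g. a text embedding
x_next = denoise_step(x, t=50, cond=text_cond)
```

Swapping `text_cond` for a mask embedding or low-resolution-image features changes the output without touching the sampler, which is why the same framework supports text-to-image, inpainting, and super-resolution.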
The applications for diffusion models are vast, and their practical uses are still evolving. These models will greatly impact retail and e-commerce, entertainment, social media, AR/VR, marketing, and more.