
Tutorial on Creating Multimedia with AI Tools: How does the technology work?

How does the technology work?



You might wonder, how can text be turned into an image?

Image generators do this with something called a diffusion model.

Figure: The forward diffusion phase.


How does a diffusion model work?

Imagine you have a clear, detailed image, like a photograph of a dog. In the first phase, called "forward diffusion," static (random noise) is added to this image a little at a time. It's similar to a malfunctioning video display, where the picture becomes increasingly covered in static. Step by step, the original image becomes less recognizable until it's completely obscured.
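To make the idea concrete, here is a minimal sketch of forward diffusion in Python. It is only an illustration: the eight-number "image", the function names, and the noise setting (beta=0.1) are all assumptions chosen for this example, and real image generators work on millions of pixels with carefully tuned settings.

```python
import math
import random


def forward_diffusion(pixels, num_steps=50, beta=0.1, seed=0):
    """Return a list of progressively noisier versions of `pixels`.

    Each step keeps most of the previous image and mixes in a little
    random static: new = sqrt(1 - beta) * old + sqrt(beta) * noise.
    """
    rng = random.Random(seed)
    keep = math.sqrt(1.0 - beta)
    mix = math.sqrt(beta)
    history = [list(pixels)]
    for _ in range(num_steps):
        previous = history[-1]
        history.append([keep * p + mix * rng.gauss(0.0, 1.0) for p in previous])
    return history


def signal_fraction(num_steps, beta=0.1):
    """Fraction of the original pixel values still present after num_steps."""
    return math.sqrt(1.0 - beta) ** num_steps


# A toy one-row "image": eight brightness values centered around zero.
image = [-1.0, -0.5, 0.0, 0.5, 1.0, 0.5, 0.0, -0.5]
history = forward_diffusion(image)

# Show how much of the original picture is left at a few stages.
for step in (0, 10, 50):
    print(step, round(signal_fraction(step), 3))
```

In this sketch, more than half of the original picture is still present after 10 steps, and less than a tenth after 50. That is the "completely obscured by static" stage described above.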

During its training, the model is shown images at many different stages of this process and learns to estimate the static that was added at each stage. This learning is critical, because later the model will reverse the process in order to generate an image.
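Why does learning about the static matter? The sketch below, written under simplifying assumptions, builds one training example (a noisy image plus the exact static that was mixed in) and shows that if the static is known exactly, the original image can be recovered. A real model never sees the exact static at generation time, so it is trained to estimate it. The function names and the beta value are hypothetical and chosen only for this illustration.

```python
import math
import random


def make_training_example(pixels, step, beta=0.1, rng=None):
    """Jump straight to `step` of forward diffusion; return (noisy, noise)."""
    if rng is None:
        rng = random.Random(0)
    surviving = (1.0 - beta) ** step
    signal = math.sqrt(surviving)        # weight of the original image
    static = math.sqrt(1.0 - surviving)  # weight of the random static
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [signal * p + static * n for p, n in zip(pixels, noise)]
    return noisy, noise


def remove_known_noise(noisy, noise, step, beta=0.1):
    """Undo the mixing when the static is known exactly."""
    surviving = (1.0 - beta) ** step
    signal = math.sqrt(surviving)
    static = math.sqrt(1.0 - surviving)
    return [(x - static * n) / signal for x, n in zip(noisy, noise)]


image = [-1.0, -0.5, 0.0, 0.5, 1.0, 0.5, 0.0, -0.5]
noisy, noise = make_training_example(image, step=20)
recovered = remove_known_noise(noisy, noise, step=20)
print([round(value, 3) for value in recovered])
```

During training, the model is shown `noisy` and asked to guess `noise`. The better its guesses, the better it can later clean up static it has never seen before.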

Figure: The reverse diffusion phase. Notice that it's a different dog than in the first image above.


Now, starting with a canvas of random static, the model uses what it learned during training. It removes the static a little at a time, in a very targeted way that is guided by your text input. Instead of returning to an original image, it creates a new one based on your description.

It can do this because it has been trained on a vast array of images with text descriptions, so it knows what dogs look like and how they are typically shown in images.
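The loop below is a toy sketch of reverse diffusion, not how a real image generator is built. A real model uses a large neural network to estimate the static. Here, as a loudly stated assumption, that network is replaced by a hypothetical lookup table (LEARNED_PATTERNS) standing in for "what the model learned about each description." The point is only to show the shape of the process: start from random static, then repeatedly remove a bit of the estimated static, guided by the text.

```python
import random

# Hypothetical stand-in for a trained model's knowledge. A real model has no
# such table; it stores learned network weights, not pictures.
LEARNED_PATTERNS = {
    "bright stripe": [0.0, 0.0, 1.0, 1.0, 0.0, 0.0],
    "dark stripe": [1.0, 1.0, 0.0, 0.0, 1.0, 1.0],
}


def predict_static(canvas, prompt):
    """Toy denoiser: treat anything that differs from the pattern as static."""
    target = LEARNED_PATTERNS[prompt]
    return [c - t for c, t in zip(canvas, target)]


def generate(prompt, num_steps=30, step_size=0.2, seed=0):
    """Start from random static and remove a little of it at every step."""
    rng = random.Random(seed)
    canvas = [rng.gauss(0.0, 1.0) for _ in LEARNED_PATTERNS[prompt]]
    for _ in range(num_steps):
        static = predict_static(canvas, prompt)
        canvas = [c - step_size * s for c, s in zip(canvas, static)]
    return canvas


result = generate("bright stripe")
print([round(value, 2) for value in result])
```

One important difference from the real thing: this toy always settles on the same pattern for a given description, while a real diffusion model produces a different image each time it starts from different static.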

Remember: models don't contain copies of images

These models don't contain copies of images. Instead, they are trained on images that have had static added to them, and they learn to remove that static, so that they can use that knowledge to generate new images.

Where do models get the images they are trained on?

There are many different sets of images with text descriptions that can be used to train image generation models.

Examples: LAION, Visual Genome dataset, Flickr30k dataset, Microsoft COCO

Learn more about diffusion models

If you're interested in learning more about how this works, read "How A.I. Creates Art - A Gentle Introduction to Diffusion Models" by Zain Hasan.

This tutorial is licensed under a Creative Commons Attribution 4.0 International License.

Vincennes University
