An overview and list of free resources to help you get started.

The easiest way to play with these models is to sign up for a Google Colab account.

There are a couple of ways to generate an image from text:

  1. Inference from a model trained for this task
  2. Guiding an image representation with CLIP

All the generated images below are from the prompt "Winds of Winter".

Model Inference

Model inference is a very quick and easy way to generate lots of images. Because generation is quite cheap, it's recommended to generate 100 or so images with the model and then rank them using CLIP to keep a good dozen.
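The generate-then-rank workflow above can be sketched roughly as follows. This is a minimal illustration, not the code any of the notebooks below use: it assumes the text and image embeddings have already been computed by CLIP, and it stands in tiny hand-written vectors for them. The names `cosine_similarity` and `rank_by_similarity` are hypothetical helpers made up for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(text_embedding, image_embeddings, top_k=12):
    """Return the indices of the top_k images most similar to the text."""
    scored = [
        (cosine_similarity(text_embedding, emb), idx)
        for idx, emb in enumerate(image_embeddings)
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:top_k]]


# Toy stand-ins for CLIP embeddings (assumption: real ones are much longer
# and come from CLIP's text and image encoders).
text_embedding = [1.0, 0.0, 0.0]
image_embeddings = [
    [0.0, 1.0, 0.0],   # unrelated to the prompt
    [0.9, 0.1, 0.0],   # close to the prompt
    [0.5, 0.5, 0.0],   # somewhat related
    [-1.0, 0.0, 0.0],  # opposite of the prompt
]

best = rank_by_similarity(text_embedding, image_embeddings, top_k=2)
print(best)  # indices of the two best-matching images
```

In practice you would generate the 100 or so candidates first, embed them all in one batch, and keep the dozen indices this kind of ranking returns.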

minDALL-E

One of the best text-to-image models available right now.

Code: https://github.com/kakaobrain/minDALL-E

CogView

A pretty good text-to-image model. Only the older CogView model has been released publicly. The newer model, which is faster and better, can only be tried through the online demo.

Code: https://github.com/THUDM/CogView

Online Demo: https://agc.platform.baai.ac.cn/CogView/index.html

Ru-dalle

They have trained two models, XL and XXL. Only XL, which has 1.3 billion parameters, has been released so far. The results are pretty amazing, similar to CogView 2.

They also released Ru-CLIP.

Code: https://github.com/sberbank-ai/ru-dalle

Colab: https://colab.research.google.com/drive/1wGE-046et27oHvNlBNPH07qrEQNE04PQ?usp=sharing

DALL-E Mini

Code: https://github.com/borisdayma/dalle-mini

Colab: https://colab.research.google.com/github/borisdayma/dalle-mini/blob/main/dev/inference/inference_pipeline.ipynb

CLIP Conditioned Decision Transformer

Colab by Rivers Have Wings: https://colab.research.google.com/drive/1dFV3GCR5kasYiAl8Bl4fBlLOCdCfjufI

Guiding Image Representation with CLIP

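This is more like model training than model inference: the image representation itself is optimized step by step until CLIP scores it as a good match for the prompt. As expected, it takes a while to generate an image.

The image representation can be as simple as an RGB array or a set of Bézier curves, or it can be a latent representation from various autoencoders and GANs such as VQGAN, BigGAN, StyleGAN, or OpenAI dVAE.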
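To make the "training rather than inference" point concrete, here is a toy sketch of the optimization loop. Everything in it is an assumption for illustration: a small fixed linear map stands in for CLIP's image encoder, a squared distance stands in for the CLIP similarity loss, and the gradient is written by hand instead of coming from an autograd framework. The function names are hypothetical. Only the shape of the loop (score the current representation, take a gradient step, repeat) carries over to the real notebooks.

```python
def encode(weights, pixels):
    """Toy stand-in for an image encoder: a fixed linear map."""
    return [sum(w * p for w, p in zip(row, pixels)) for row in weights]


def loss_and_grad(weights, pixels, target):
    """Squared distance to the target embedding, plus its gradient
    with respect to the image representation (the pixels)."""
    embedding = encode(weights, pixels)
    residual = [e - t for e, t in zip(embedding, target)]
    loss = sum(r * r for r in residual)
    grad = [
        2.0 * sum(weights[i][j] * residual[i] for i in range(len(weights)))
        for j in range(len(pixels))
    ]
    return loss, grad


def optimize(weights, target, pixels, steps=200, lr=0.1):
    """Plain gradient descent on the image representation."""
    pixels = list(pixels)
    history = []
    for _ in range(steps):
        loss, grad = loss_and_grad(weights, pixels, target)
        history.append(loss)
        # The encoder stays frozen; only the image representation moves.
        pixels = [p - lr * g for p, g in zip(pixels, grad)]
    return pixels, history


weights = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]  # frozen toy "encoder"
target = [0.8, 0.2]                             # toy "text embedding"
start = [0.0, 0.0, 0.0]                         # initial "image"

pixels, history = optimize(weights, target, start)
print(history[0], history[-1])  # loss before and after optimization
```

Swapping the `pixels` list for Bézier curve parameters or a GAN latent vector changes what is being optimized, but not the loop itself, which is why each image takes many steps to produce.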

RGB Optimization

This doesn't need much GPU VRAM and works really well with just 4 GB. It's good for getting high-resolution outputs.

PyramidVisions Colab

Vector Strokes Optimization

CLIPDraw by Kevin Frans

Code: https://github.com/kvfrans/clipdraw

Colab: https://colab.research.google.com/github/kvfrans/clipdraw/blob/main/clipdraw.ipynb

OpenAI dVAE Optimization

Colab of Implementation by Rivers Have Wings: https://colab.research.google.com/drive/10DzGECHlEnL4oeqsN-FWCkIe_sq3wVqt

VQGAN CLIP

Rivers Have Wings coded up the implementation that connected VQGAN with CLIP, and almost every other VQGAN CLIP notebook is derived from it. Most derivatives use different learning rate strategies, image augmentations, and optimizers, and their results are generally much better.

Colabs

Implementation by Rivers Have Wings: https://colab.research.google.com/drive/15UwYDsnNeldJFHJ9NdgYBYeo6xPmSelP

Implementation by crimeacs#8222: https://colab.research.google.com/drive/1ZAus_gn2RhTZWzOWUpPERNC0Q8OhZRTZ

Diffusion Models

Diffusion models give much better global coherence than CLIP-guided GANs, although the generated image may not follow the prompt as closely as CLIP-guided GANs do.

Colab Implementation by nshepperd: https://colab.research.google.com/drive/10dvDxcS4e4anlwpJE2yLBjq0O1vKqxdn

We have an app that provides an easy way to play with text-to-image algorithms. If you would like to give it a go, sign up at https://snowpixel.app/ and send your sign-up email address to hello@snowpixel.app. We'll add a couple of free credits to your account so you can try it out.