Author: Arun Nanda
Last Updated: Thu, Nov 10, 2022
The advent of machine learning, in particular, deep learning, has opened up new possibilities in computing. This technology has been making its way from researchers to practitioners in recent years. Deep learning models can summarize articles, answer open and closed-ended questions, generate text, and even generate images and videos.
In mid-late 2022, OpenAI released DALL-E 2 to the general public. DALL-E 2 is a deep learning model for generating images based on text prompts entered by the user. As of October 2022, this model is closed source. There were restrictions on the nature of images that went into the training set. There are also restrictions on the nature of prompts that a user can enter. Google's Imagen is another proprietary text-to-image model that is not available for the general public to use (as of October 2022).
Stable Diffusion is an open-source deep learning model that generates images from text prompts. Stability AI, the CompVis group at LMU Munich, and Runway are its main developers. Stable Diffusion models were trained using millions of publicly available (online) high-resolution images and their associated text descriptions (in the English language). The training process used a cluster of 4000 NVIDIA A100 GPUs.
The model weights, the model card, and the code are all open source. Anyone can train and customize the model or download and run a pre-trained model.
This guide shows how to implement (deploy) a pre-trained Stable Diffusion model using Keras and TensorFlow. The starting point is a vanilla Debian installation. The end goal is to use a pre-trained model to deploy your own system to generate images from text prompts. This guide does not cover how to train models from data.
In deep learning, input data is represented as tensors and transformed into the output by many sequential layers of tensor operations. TensorFlow and PyTorch are open-source frameworks of computational tools for building deep learning models. Keras is an easier-to-use Python frontend API at a higher level of abstraction, which makes it better suited to implementing models. Keras uses TensorFlow as the backend for performing the actual computations. Earlier versions could also use other backends like Theano; Keras is now integrated with TensorFlow. KerasCV is a "horizontal extension" of the core Keras API. It is a collection of building blocks specifically for computer vision problems.
Stable Diffusion was originally built on PyTorch. In September 2022, it became available directly on KerasCV.
As of October 2022, the Stable Diffusion model uses about 7 GB of Video-RAM. It is recommended to run the model on a system with at least 10 GB Video-RAM. Stability AI recommends using NVIDIA chips and plans to release optimized versions for other chipsets (AMD, Apple silicon, etc.). NVIDIA's A100 GPU is oriented toward applications like AI, analytics, and deep learning; the A40 GPU is better suited for visual computing, graphics, simulations, etc.
Vultr Cloud GPU VMs virtualize the GPU and allocate a fraction of the GPU chip's total capacity to each VM. In principle, it is possible to run the models on CPU-only machines. In practice, it can take orders of magnitude more time to generate images in the absence of a GPU.
This guide has been thoroughly tested on VMs featuring a 10 GB GPU RAM share of the NVIDIA A100 GPU. Trying to run the models on a machine with insufficient GPU memory leads to errors.
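Before installing anything, it can help to confirm how much GPU memory the VM actually exposes. The following is a minimal sketch, assuming the NVIDIA driver's `nvidia-smi` tool is on the PATH; the helper names `gpu_memory_mib` and `meets_recommendation` are hypothetical, and the threshold is the 10 GB recommendation from this guide.

```python
import shutil
import subprocess

# The guide's recommended minimum of 10 GB, expressed in MiB (assumption: 1 GB = 1024 MiB).
RECOMMENDED_MIB = 10 * 1024


def gpu_memory_mib():
    """Return the total memory (MiB) of each visible NVIDIA GPU.

    Returns an empty list if nvidia-smi is missing or fails.
    """
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    # One line per GPU, each holding a plain integer number of MiB.
    return [int(line) for line in result.stdout.splitlines() if line.strip().isdigit()]


def meets_recommendation(memory_mib, minimum_mib=RECOMMENDED_MIB):
    """Return True if a GPU has at least the recommended amount of memory."""
    return memory_mib >= minimum_mib


if __name__ == "__main__":
    for index, mib in enumerate(gpu_memory_mib()):
        verdict = "ok" if meets_recommendation(mib) else "below the recommended 10 GB"
        print(f"GPU {index}: {mib} MiB ({verdict})")
```

An empty result means only that `nvidia-smi` was not found or failed; it does not by itself prove that no GPU is attached.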
Note: The user does not need to do anything special to take advantage of the GPU. As long as the drivers and CUDA libraries are installed, it happens automatically.
Cloud GPU Virtual Machines are expensive. It can be cost-effective to set up the model on a Virtual Machine with a small amount of GPU RAM. This fetches the right drivers and dependencies. A snapshot of this VM can be restored onto a machine with more GPU RAM for running the model and generating images. Note that both source and target VMs must use the same GPU chipset.
Basic familiarity with Python 3 is necessary to follow the commands in the subsequent sections. This guide uses Conda to manage Python environments and PyPI pip for package management.
The base Operating System used is the Vultr Debian 11 image. The commands should be 1-to-1 compatible with recent Ubuntu releases. To use another Operating System, adapt the package installation commands. Python commands are the same across Operating Systems.
It is also possible to use the Vultr Anaconda and Miniconda images. They come with Conda preinstalled. Skip the step (Section 3.2.1) about installing conda if you use these images.
Deep Learning is a rapidly evolving field. The models and the software packages are frequently updated. This sometimes leads to compatibility issues. The current version of a package can depend on an older version of another package. The latest versions of the two packages might be incompatible with each other. Two required packages can each depend on different versions of a third package.
Wherever applicable, the installation commands mention the version numbers of the software used in the test setup. Using newer or older versions of the same software may lead to incompatibilities between the dependencies.
From the Vultr dashboard, deploy a new Cloud GPU instance. It is recommended to add an SSH key during deployment.
By default, you log in as the root user. It is strongly recommended to create and use a regular user account.
Log in to the new server as root and create an unprivileged user:
# adduser username
Follow the on-screen instructions and complete the user creation process.
Give the new user sudo rights so it can do privileged actions like installing new programs:
# usermod -aG sudo username
Copy the .ssh directory (containing the keys) of root into the home directory of the new user. This allows you to SSH into the server as an unprivileged user.
# cp -R /root/.ssh /home/username/
Change the owner of the copied directory from root to the new user:
# chown -R username /home/username/.ssh
The default Debian installation on Vultr uses UFW for managing the firewall. Confirm that this is the case by checking the UFW settings:
# ufw status verbose
The output should be equivalent to:
Status: active
Logging: on (low)
Default: deny (incoming), allow (outgoing), disabled (routed)
New profiles: skip

To                         Action      From
--                         ------      ----
22                         ALLOW IN    Anywhere
22 (v6)                    ALLOW IN    Anywhere (v6)
If the firewall is not configured, set it up before continuing further.
Open a new terminal window and log in over SSH as the unprivileged user. After confirming that the new user has SSH and sudo access, go back to the old terminal session where you are logged in as root. Exit that session and continue as the regular user.
This guide uses Conda for managing Python environments. Conda also makes it easier to install the required CUDA and cuDNN software. CUDA is a parallel computation platform for NVIDIA GPUs. cuDNN is a library for Deep Neural Networks optimized for CUDA.
Miniconda provides a minimal installation tool for Conda. Copy the download link for the latest Miniconda release from the Conda website. Download the file:
$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
Execute the downloaded script to install Miniconda:
$ bash Miniconda3-latest-Linux-x86_64.sh
Read and follow the on-screen instructions to complete the installation process and to initialize Conda.
Check if Conda is installed:
$ conda --version
Note: Skip this subsection if you use the Vultr Anaconda or Miniconda installation images.
Ensure that conda is upgraded to the latest version:
$ conda upgrade conda
Create a new Conda environment:
$ conda create --name env1 python=3.9
This creates a new environment env1 and pre-installs Python 3.9 into it. Read and follow the on-screen instructions during the installation process.
View the list of Conda environments with conda env list. Activate the newly created Conda environment:
$ conda activate env1
Enter the command conda deactivate at the terminal to deactivate the currently active environment.
The rest of this guide assumes that the Conda environment env1 is active.
Upgrade pip to the latest version:
$ pip install --upgrade pip
Install the CUDA and cuDNN packages (into the current Conda environment):
$ conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
Update the system path to reflect the Conda installation paths:
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/
Create the Conda activation directory and add the system paths to it.
$ mkdir -p $CONDA_PREFIX/etc/conda/activate.d
$ echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/' > $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
This automatically loads the paths at the start of every Conda session.
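To confirm that the path setting took effect in a fresh session, the environment can be inspected from Python. This is a minimal sketch; the function name `conda_lib_on_library_path` is hypothetical, and it assumes a Linux-style, colon-separated `LD_LIBRARY_PATH` as used in the commands above.

```python
import os


def conda_lib_on_library_path(environ=None):
    """Return True if $CONDA_PREFIX/lib is one of the LD_LIBRARY_PATH entries."""
    environ = os.environ if environ is None else environ
    prefix = environ.get("CONDA_PREFIX")
    if not prefix:
        # No Conda environment is active, so there is nothing to look for.
        return False
    expected = os.path.normpath(os.path.join(prefix, "lib"))
    entries = environ.get("LD_LIBRARY_PATH", "").split(":")
    # normpath removes the trailing slash, so "lib/" and "lib" compare equal.
    return any(os.path.normpath(entry) == expected for entry in entries if entry)


if __name__ == "__main__":
    print("Conda lib directory on LD_LIBRARY_PATH:", conda_lib_on_library_path())
```

If this prints False after reactivating env1, the activation script was probably not written to the expected directory.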
The version of KerasCV on the Conda repository is outdated (as of October 2022) and doesn't include Stable Diffusion. Therefore, PyPI pip is used to install KerasCV and its dependencies (into the current Conda environment).
Install TensorFlow with pip:
$ pip install tensorflow==2.10.0
Note: The PyPI repositories also contain separate packages such as tensorflow-cpu. It is sufficient to install just tensorflow; since TensorFlow v2.1, this package supports both CPU and GPU computations.
Install TensorFlow Datasets:
$ pip install tensorflow-datasets==4.7.0
These datasets are used by the Deep Learning libraries.
Install KerasCV:
$ pip install keras-cv==0.3.4
Install Matplotlib to process images:
$ pip install matplotlib==3.6.1
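Because version mismatches are a common source of trouble, it may be worth checking that the installed packages match the versions pinned in the commands above. The sketch below uses only the standard library; the helper name `check_pins` is hypothetical, and the version numbers are the ones from this guide's test setup.

```python
from importlib import metadata

# Versions pinned in the installation commands of this guide.
PINNED = {
    "tensorflow": "2.10.0",
    "tensorflow-datasets": "4.7.0",
    "keras-cv": "0.3.4",
    "matplotlib": "3.6.1",
}


def check_pins(pins=PINNED):
    """Map each package name to (expected, installed); installed is None if missing."""
    report = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (expected, installed)
    return report


if __name__ == "__main__":
    for name, (expected, installed) in check_pins().items():
        status = "ok" if installed == expected else "MISMATCH"
        print(f"{name}: expected {expected}, found {installed} [{status}]")
```

A mismatch is not necessarily an error, but it is the first thing to rule out if a later step fails.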
Verify that the software has been installed correctly and is working as expected. Open a Python terminal and import TensorFlow:
>>> import tensorflow as tf
Run a CPU-based TensorFlow operation:
>>> print(tf.reduce_sum(tf.random.normal([1000, 1000])))
If the above command ran properly, it should return a tensor. The output should look like:
tf.Tensor(-1798.1721, shape=(), dtype=float32)
Next, run a GPU-based command that lists the GPU devices available to TensorFlow. The output should show at least one GPU device.
Exit the Python command line.
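The GPU check described above can also be run as a short script. This is a minimal sketch, assuming TensorFlow's standard `tf.config.list_physical_devices` call; it is not necessarily the exact command used in the original test setup, and the helper name `visible_gpus` is hypothetical.

```python
import importlib.util


def visible_gpus():
    """Return TensorFlow's list of GPU devices, or None if TensorFlow is not installed."""
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf

    return tf.config.list_physical_devices("GPU")


gpus = visible_gpus()
if gpus is None:
    print("TensorFlow is not installed in this environment")
elif gpus:
    print(f"TensorFlow sees {len(gpus)} GPU(s): {gpus}")
else:
    print("TensorFlow is installed but sees no GPU")
```

An empty list usually points to missing drivers or CUDA libraries, or to a library path that was not loaded in the current session.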
Importing TensorFlow in Python may have shown a message like this:
>>> import tensorflow as tf
...
2022-10-23 05:01:28.981482: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2022-10-23 05:01:28.981561: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2022-10-23 05:01:28.981576: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
...
This message is a warning that the installation is missing TensorRT, NVIDIA's SDK for optimizing deep learning inference. TensorRT is optional. Stable Diffusion does not need it to run, so the warning can be ignored.
Viewing images on a remote server over SSH needs a bit of effort. As the last step before generating images, create the setup to view the generated images. Use sshfs to mount and browse a remote directory on the local filesystem.
Check if sshfs is installed on the local machine:
$ sshfs --version
Install it if it isn't already, using apt on Debian or Ubuntu, or Homebrew on macOS:
# apt install sshfs
$ brew install sshfs
Create an empty directory on the local machine:
$ mkdir ~/sshfs_mountpoint
In the next section, you will save the generated images in the home directory on the server. Connect sshfs to the server and mount the home directory, replacing the user and host in the command with your own:
$ sshfs -o follow_symlinks firstname.lastname@example.org:/home/username ~/sshfs_mountpoint
This allows you to browse (either via command-line or the local GUI) the remote directory as if it were a local directory. View the images using a viewing program on the local machine, or copy the images to a local directory.
Note: Depending on the server's sshd_config settings, the connection may close if left unused for a period of time. When this happens, reconnect (using the above command) and refresh the GUI window.
Start a new Python command line and import the necessary modules.
Matplotlib is used to save the generated images.
>>> import matplotlib.pyplot as plt
>>> from tensorflow import keras
Set Keras to use half-precision floating point numbers (16 bits instead of the usual 32 bits). This helps to speed up GPU computations.
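One way to request 16-bit computation in Keras is its mixed precision policy. The sketch below assumes the `mixed_float16` policy, which computes in 16-bit floats while keeping variables in 32-bit; this is an assumption about which setting is intended here, and the wrapper name `enable_mixed_precision` is hypothetical.

```python
import importlib.util


def enable_mixed_precision():
    """Set the global Keras policy to mixed_float16 and return the active policy name.

    Returns None if TensorFlow is not installed.
    """
    if importlib.util.find_spec("tensorflow") is None:
        return None
    from tensorflow import keras

    # Compute in float16 where possible; variables stay in float32.
    keras.mixed_precision.set_global_policy("mixed_float16")
    return keras.mixed_precision.global_policy().name


print("Active precision policy:", enable_mixed_precision())
```

The policy applies to layers created after it is set, so it needs to be set before the model is constructed.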
Import the KerasCV module:
>>> import keras_cv
Stable Diffusion is packaged as a part of KerasCV.
Construct a new Stable Diffusion model with the default settings:
>>> model = keras_cv.models.StableDiffusion()
This downloads the current version of the text encoder, the diffusion model, the decoder, and also constructs the model.
Call the model with a text prompt to generate an image:
>>> image = model.text_to_image(prompt='oil painting of a vulture wearing a christmas hat')
This generates an array consisting of a single image. Save this image to the home directory:
>>> plt.imsave('/home/username/vulture_pic.png', image[0])
Use sshfs (as discussed in Section 3.4) to view the saved image.
Restart the Python command line to free up memory for a new model. Import keras_cv as before.
keras_cv.models.StableDiffusion() constructs a new Stable Diffusion model that generates 512x512 images. It is possible to change this with the parameters img_height and img_width:
>>> model = keras_cv.models.StableDiffusion(img_height=128, img_width=128)
Note that the resolution directly affects the GPU memory required by the model to function.
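As a rough illustration of why resolution matters, the number of pixels grows with the product of height and width. The sketch below compares a few square resolutions against the 512x512 default. Pixel count is only a proxy: actual GPU memory use depends on the model internals and was not measured here. The function name `relative_pixel_count` is hypothetical.

```python
# Default output size of the model, as stated in this guide.
BASE_HEIGHT, BASE_WIDTH = 512, 512


def relative_pixel_count(img_height, img_width):
    """Return the pixel count of an image relative to the 512x512 default."""
    return (img_height * img_width) / (BASE_HEIGHT * BASE_WIDTH)


for size in (128, 256, 512, 1024):
    ratio = relative_pixel_count(size, size)
    print(f"{size}x{size}: {ratio:.4f}x the pixels of 512x512")
```

Doubling both dimensions quadruples the pixel count, which is consistent with the out-of-memory errors this guide reports for 1024x1024 generation on a 10 GB share.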
model.text_to_image accepts the following parameters:
prompt: the text prompt based on which images are generated
batch_size: the number of images to generate at a time. The default value is 1.
num_steps: the number of iterations - the higher the number of iterations, the better the image quality and detail. The default value is 25. The higher this value, the longer it takes to generate the image(s). This parameter mainly affects the CPU usage time.
Generate a single image of low quality:
images = model.text_to_image(
    prompt='oil painting of a vulture wearing a christmas hat',
    batch_size=1,
    num_steps=5,
)
Generate a set of 3 images of high quality:
images = model.text_to_image(
    prompt='oil painting of a vulture wearing a christmas hat',
    batch_size=3,
    num_steps=50,
)
The output in the above examples is an array of one or more images.
Write a simple function to save multiple generated images in an array:
def save_images(images, file_name_with_path):
    for i in range(len(images)):
        plt.imsave(file_name_with_path + str(i+1) + '.png', images[i])
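Call this function with the generated images and a filename prefix that includes the path, for example '/home/username/christmas_vultr_'.
This saves the generated images with the filenames christmas_vultr_1.png, christmas_vultr_2.png, and so on, in the home directory. Use sshfs to view the saved images.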
Stable Diffusion runs on the GPU, so heavy models can cause the system to run out of GPU memory.
An out-of-memory error message looks like this:
2022-10-28 04:33:01.109327: I tensorflow/core/common_runtime/bfc_allocator.cc:1097] 1 Chunks of size 122713088 totalling 117.03MiB
2022-10-28 04:33:01.109335: I tensorflow/core/common_runtime/bfc_allocator.cc:1097] 2 Chunks of size 151781376 totalling 289.50MiB
2022-10-28 04:33:01.109344: I tensorflow/core/common_runtime/bfc_allocator.cc:1097] 2 Chunks of size 176947200 totalling 337.50MiB
... Memory allocation details ...
2022-10-28 04:33:01.109362: I tensorflow/core/common_runtime/bfc_allocator.cc:1101] Sum Total of in-use chunks: 6.66GiB
2022-10-28 04:33:01.109370: I tensorflow/core/common_runtime/bfc_allocator.cc:1103] total_region_allocated_bytes_: 7365066752 memory_limit_: 7365066752 available bytes: 0 curr_region_allocation_bytes_: 14730133504
2022-10-28 04:33:01.109384: I tensorflow/core/common_runtime/bfc_allocator.cc:1109] Stats:
Limit: 7365066752
InUse: 7151588096
MaxInUse: 7151588352
NumAllocs: 5650
MaxAllocSize: 176947200
Reserved: 0
PeakReserved: 0
LargestFreeBlock: 0
2022-10-28 04:33:01.109467: W tensorflow/core/common_runtime/bfc_allocator.cc:491] ****************************************************************************************************
2022-10-28 04:33:01.113312: W tensorflow/core/framework/op_kernel.cc:1768] RESOURCE_EXHAUSTED: failed to allocate memory
Traceback (most recent call last):
... Detailed stack trace ...
It is difficult to accurately predict all the parameters that cause the system to run out of memory. These are some example scenarios where out-of-memory errors were observed as of October 2022 on a VM with 10 GB GPU memory share of an NVIDIA A100 GPU:
Constructing two models (by calling the keras_cv.models.StableDiffusion function twice) in the same Python session
Attempting to generate an image with a high-resolution (1024x1024) model; constructing the model itself was not a problem
Attempting to generate a large number of images in a single pass (using a large batch size)
If you run into out-of-memory errors, either reduce the memory requirement of the model (for instance, by constructing the model at a lower resolution), or use a VM with more GPU RAM.
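Since large batch sizes were one of the observed triggers, another option is to generate many images in several smaller passes. The following is a minimal sketch, not a tested recipe: the helper `generate_in_chunks` is hypothetical, and it assumes only that the model object exposes `text_to_image(prompt=..., batch_size=..., num_steps=...)` as described in this guide and returns one image per batch entry.

```python
def generate_in_chunks(model, prompt, total_images, chunk_size=1, num_steps=25):
    """Generate total_images images using at most chunk_size images per call.

    Smaller chunks keep the batch size of each pass low, at the cost of more passes.
    """
    if total_images < 0 or chunk_size < 1:
        raise ValueError("total_images must be >= 0 and chunk_size must be >= 1")
    images = []
    remaining = total_images
    while remaining > 0:
        batch = min(chunk_size, remaining)
        result = model.text_to_image(prompt=prompt, batch_size=batch, num_steps=num_steps)
        # Iterating over the result yields one image per batch entry.
        images.extend(result)
        remaining -= batch
    return images
```

The returned list can be passed to the save_images function defined earlier. Whether a given chunk size fits in memory still has to be found by trial on the target VM.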
Stability AI has released the model to the general public under the CreativeML Open RAIL-M license. It permits commercial and non-commercial usage. The license is focused on ensuring ethical and legal usage of the model. It also mandates that the license itself be included with any distribution of the model, including to end users.