Stable Diffusion v2 Text-to-Image and Image-to-Image

This tutorial walks through how to prepare and utilize the Stable Diffusion 2 text-to-image and image-to-image functionality on the proxiML platform.

Prerequisites

Before beginning this example, ensure that you have satisfied the following prerequisites.

tip

There are additional methods to create models or receive job outputs that do not require these prerequisites, but the instructions would need to be modified to utilize them. Contact us if you need any help adapting them.

Prepare the Model

The first step is to create a proxiML Model with all the code, model weights, and checkpoints pre-downloaded. This way, the model only has to be uploaded a single time and can be reused for all subsequent jobs. Not only will this make jobs start much faster, but there are no compute charges for creating a model.

Start by cloning the repository:

git clone https://github.com/proxiML/stable-diffusion-2.git
cd stable-diffusion-2

The proxiML version of the repository has some additional packages in the requirements file and some helper scripts to make launching jobs easier.

Inside the project directory, create and activate a new Python virtual environment and install the required packages:

python3 -m venv env
source env/bin/activate
pip install -r requirements.txt

Next, download the CLIP model that the Stable Diffusion model depends on. Use the HUGGINGFACE_HUB_CACHE environment variable to place the model file inside the project directory (instead of the default cache folder). Paste the following into the terminal as a single block:

HUGGINGFACE_HUB_CACHE=$(pwd) python - << EOF
import open_clip
import torch
open_clip.create_model_and_transforms("ViT-H-14", device=torch.device('cpu'), pretrained="laion2b_s32b_b79k")
EOF

This should create a folder called models--laion--CLIP-ViT-H-14-laion2B-s32B-b79K in the root of the project directory.

Next, download the specific checkpoints you need for the application you plan to use in later steps. Create the folders to store the checkpoints:

mkdir -p checkpoints midas_models
cd checkpoints

The following checkpoints are available from Stability AI on Hugging Face. Although you can create a model with all available checkpoints, we recommend downloading only the ones you need for a given model to save on storage costs, upload time, and job start time.

  • v2.1 - wget https://huggingface.co/stabilityai/stable-diffusion-2-1/resolve/main/v2-1_768-ema-pruned.ckpt
  • v2.1 Base - wget https://huggingface.co/stabilityai/stable-diffusion-2-1-base/resolve/main/v2-1_512-ema-pruned.ckpt
  • v2.0 - wget https://huggingface.co/stabilityai/stable-diffusion-2/resolve/main/768-v-ema.ckpt
  • v2.0 Base - wget https://huggingface.co/stabilityai/stable-diffusion-2-base/resolve/main/512-base-ema.ckpt
  • Depth - wget https://huggingface.co/stabilityai/stable-diffusion-2-depth/resolve/main/512-depth-ema.ckpt
  • Upscaling - wget https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler/resolve/main/x4-upscaler-ema.ckpt
  • Inpainting - wget https://huggingface.co/stabilityai/stable-diffusion-2-inpainting/resolve/main/512-inpainting-ema.ckpt

If using the Depth model, pre-download the MiDaS model as well:

cd ../midas_models
wget https://github.com/intel-isl/DPT/releases/download/1_0/dpt_hybrid-midas-501f0c75.pt
info

All the code examples assume you are using the v2-1_768-ema-pruned checkpoint. This is the only checkpoint you need to complete the Notebook and Inference Job sections. If you want to deploy the image modification endpoints, you will need their associated checkpoints as well (512-depth-ema.ckpt, x4-upscaler-ema.ckpt, and/or 512-inpainting-ema.ckpt).

Once the weights are downloaded, create a proxiML model from the root directory of the project:

cd ..
proximl model create "stable-diffusion-2" $(pwd)
caution

You can change the name of the model, but if you do, you will need to update the job creation commands with the new model name.

You will see console output indicating the progress of the model upload. The total duration will depend on the upload speed of your internet connection. Depending on how many checkpoints you downloaded, the model will be anywhere from 9-28 GB. Once the upload is complete, the command will return. You can also view the model on the proxiML Models Dashboard.

Interactive Prompts with Notebooks

Once the model is ready, you can create a proxiML Notebook from the Notebook Dashboard or using the proxiML CLI. The key settings to select are the following:

  • GPU: RTX 3090
  • Disk Size: 30GB or greater
  • Model Source Type: proxiML, Model: "stable-diffusion-2"
  • Add the environment variable HUGGINGFACE_HUB_CACHE with the value /opt/ml/models

To create the notebook with these settings from the CLI with a single command, enter the following:

proximl job create notebook "Stable Diffusion 2" \
--model-id $(proximl model list | grep stable-diffusion-2 | awk '{print $1}') \
--disk-size 30 \
--gpu-type rtx3090 \
--env HUGGINGFACE_HUB_CACHE=/opt/ml/models
tip

The proximl model list | grep stable-diffusion-2 | awk '{print $1}' part of the command simply returns the ID of the model named stable-diffusion-2. If you already know your model's ID, you can substitute it directly into the command.

After a few minutes, the notebook window should open in your browser. If you created the notebook from the Notebook Dashboard, click the Open button once the notebook changes status to running.

Once the JupyterLab environment opens, you can inspect the models folder in the file browser to verify that the model copied successfully or open a terminal window to begin running scripts.

caution

The following sections assume that your model was created with the v2.1 checkpoint. Ensure that the file named v2-1_768-ema-pruned.ckpt is inside the checkpoints directory before proceeding. If you use a different checkpoint file, you will need to manually update the checkpoint and config paths in the examples.

Text-to-Image Prompts

To generate a new image, enter the following command into the terminal window of the notebook.

python scripts/txt2img.py \
--ckpt $ML_MODEL_PATH/checkpoints/v2-1_768-ema-pruned.ckpt \
--config $ML_MODEL_PATH/configs/stable-diffusion/v2-inference-v.yaml \
--outdir $ML_OUTPUT_PATH \
--H 768 --W 768 \
--n_iter 5 --n_samples 5 \
--prompt "a professional photograph of an astronaut riding a horse"

Modify the prompt to achieve the desired image results. Each new prompt creates a new grid file in the output folder and new images in the samples subfolder without overwriting previous files. The total number of images generated per command is n_iter multiplied by n_samples: n_iter determines how many times sampling runs for the prompt, and n_samples is how many images are generated in each iteration. For example, the command above produces 25 images arranged in a 5x5 grid. In the resulting grid file, each iteration is a row, and each sample from that iteration is a column.

Warning

n_samples is also the batch size; increasing this number above 6 can cause CUDA out-of-memory errors on an RTX 3090.

For additional options when running the command, type python scripts/txt2img.py --help.

Image-to-Image Upscaling

To modify an image, first upload the desired source image to the notebook using the upload button. Then run the following command in the terminal window, replacing the prompt and init-img arguments with your desired inputs.

python $ML_MODEL_PATH/scripts/img2img.py \
--config $ML_MODEL_PATH/configs/stable-diffusion/v2-inference-v.yaml \
--ckpt $ML_MODEL_PATH/checkpoints/v2-1_768-ema-pruned.ckpt \
--outdir $ML_OUTPUT_PATH \
--prompt "description of image you want" \
--init-img /path/to/image.jpg \
--n_iter 5 --n_samples 5

To use the example image provided with the repository, run:

python $ML_MODEL_PATH/scripts/img2img.py \
--config $ML_MODEL_PATH/configs/stable-diffusion/v2-inference-v.yaml \
--ckpt $ML_MODEL_PATH/checkpoints/v2-1_768-ema-pruned.ckpt \
--outdir $ML_OUTPUT_PATH \
--prompt "A photorealistic fantasy landscape" \
--init-img $ML_MODEL_PATH/assets/stable-samples/img2img/sketch-mountains-input.jpg \
--n_iter 5 --n_samples 5

Go to the /opt/ml/output folder in the notebook to see the results.
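
If you would rather preview the results without leaving JupyterLab, a short notebook cell like the sketch below can display the generated samples inline. It assumes outputs were written to the default /opt/ml/output location used in the commands above, with individual images in the samples subfolder.

# Minimal sketch: preview generated images inline in a notebook cell.
from glob import glob
from IPython.display import Image, display

# /opt/ml/output is the default $ML_OUTPUT_PATH used by the commands above.
for path in sorted(glob("/opt/ml/output/samples/*.png"))[:5]:
    print(path)
    display(Image(filename=path, width=384))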

Serverless Prompts with Inference Jobs

Notebook Jobs are the easiest way to get started or do minor experimentation. However, they are inadequate for large scale generation or automation. Inference Jobs enable you to run multiple generations or transformations in parallel without having to keep track of starting or stopping instances. Inference jobs can be launched from the Inference Job Dashboard, using the Python SDK, or from the proxiML CLI. They automatically download any specified input data, load the model, install requirements, execute the inference code, upload the results, and stop when finished.

Text-To-Image

To run the same text-to-image prompt as in the notebook example as an inference job, use the following command:

proximl job create inference "Stable Diffusion 2 - Inference" \
--model-id $(proximl model list | grep stable-diffusion-2 | awk '{print $1}') \
--disk-size 30 \
--gpu-type rtx3090 \
--env HUGGINGFACE_HUB_CACHE=/opt/ml/models \
--output-dir $(pwd)/outputs \
'python scripts/txt2img.py --prompt "a professional photograph of an astronaut riding a horse" --ckpt $ML_MODEL_PATH/checkpoints/v2-1_768-ema-pruned.ckpt --config $ML_MODEL_PATH/configs/stable-diffusion/v2-inference-v.yaml --H 768 --W 768 --outdir $ML_OUTPUT_PATH --n_iter 5 --n_samples 2'

By default, the CLI automatically attaches to the job and streams the job's log output. After a few minutes, the job will complete and the command will return. The generated images will be in the outputs folder of the current directory in a zip file named Stable_Diffusion_2_-_Inference.zip. Unzip the file to see the results. Depending on the settings used for the prompt, each job should cost around $0.05-0.06.

Although the CLI and web interface are an easy way to get started with inference jobs, they are most useful when launched programmatically. To launch this same inference job from a Python script, use the following code inside an async function, replacing the values for model["source_uri"] and data["output_uri"] with the correct values for your application.

job = await proximl.jobs.create(
    "Stable Diffusion 2 - Inference",
    type="inference",
    gpu_types=["rtx3090"],
    gpu_count=1,
    disk_size=30,
    model=dict(
        source_type="proximl",
        source_uri="my-stable-diffusion-model-id",
    ),
    environment=dict(
        type="DEEPLEARNING_PY39",
        env=[
            dict(key="HUGGINGFACE_HUB_CACHE", value="/opt/ml/models"),
        ],
    ),
    data=dict(
        output_type="local",
        output_uri="/path/to/upload/outputs/to",
    ),
    workers=[
        "python scripts/txt2img.py "
        "--ckpt $ML_MODEL_PATH/checkpoints/v2-1_768-ema-pruned.ckpt "
        "--config $ML_MODEL_PATH/configs/stable-diffusion/v2-inference-v.yaml "
        "--outdir $ML_OUTPUT_PATH "
        "--H 768 --W 768 "
        "--n_iter 5 --n_samples 5 "
        '--prompt "a professional photograph of an astronaut riding a horse"',
    ],
)
await job.wait_for("waiting for data/model download")
await asyncio.gather(
    job.connect(), job.attach()
)  # this returns when the job is finished
await job.disconnect()
await job.remove()

You can obtain the proxiML model ID for the model you uploaded by running the command:

proximl model list

and copying the first column value for the model you want.
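
Because each inference job runs independently, you can also launch several prompts at once and let the platform run them in parallel. The following is only an illustrative sketch, not part of the repository: it assumes proximl is an already-initialized proxiML Python SDK client, uses a hypothetical prompt list and job names, and reuses the same placeholder model ID and output path as the example above.

import asyncio

# Hypothetical list of prompts to generate in parallel.
PROMPTS = [
    "a professional photograph of an astronaut riding a horse",
    "a watercolor painting of a lighthouse at sunset",
]

async def run_prompt(proximl, prompt):
    # Create one inference job for a single prompt, using the same settings
    # as the single-job example above.
    job = await proximl.jobs.create(
        f"Stable Diffusion 2 - {prompt[:40]}",
        type="inference",
        gpu_types=["rtx3090"],
        gpu_count=1,
        disk_size=30,
        model=dict(
            source_type="proximl",
            source_uri="my-stable-diffusion-model-id",  # replace with your model ID
        ),
        environment=dict(
            type="DEEPLEARNING_PY39",
            env=[dict(key="HUGGINGFACE_HUB_CACHE", value="/opt/ml/models")],
        ),
        data=dict(
            output_type="local",
            output_uri="/path/to/upload/outputs/to",  # replace with your output path
        ),
        workers=[
            "python scripts/txt2img.py "
            "--ckpt $ML_MODEL_PATH/checkpoints/v2-1_768-ema-pruned.ckpt "
            "--config $ML_MODEL_PATH/configs/stable-diffusion/v2-inference-v.yaml "
            "--outdir $ML_OUTPUT_PATH --H 768 --W 768 --n_iter 5 --n_samples 2 "
            f'--prompt "{prompt}"',
        ],
    )
    await job.wait_for("waiting for data/model download")
    await asyncio.gather(job.connect(), job.attach())  # returns when the job finishes
    await job.disconnect()
    await job.remove()

async def main(proximl):
    # Launch all prompts concurrently; each becomes its own inference job.
    await asyncio.gather(*(run_prompt(proximl, p) for p in PROMPTS))

# asyncio.run(main(proximl))  # call with your initialized SDK client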

Image-to-Image

To run the same image-to-image prompt as in the notebook example as an inference job, use the following command:

proximl job create inference "Stable Diffusion 2 - Upscale Inference" \
--model-id $(proximl model list | grep stable-diffusion-2 | awk '{print $1}') \
--disk-size 30 \
--gpu-type rtx3090 \
--env HUGGINGFACE_HUB_CACHE=/opt/ml/models \
--input-dir $(pwd)/assets/stable-samples/img2img \
--output-dir $(pwd)/outputs \
'python $ML_MODEL_PATH/scripts/img2img.py --config $ML_MODEL_PATH/configs/stable-diffusion/v2-inference-v.yaml --ckpt $ML_MODEL_PATH/checkpoints/v2-1_768-ema-pruned.ckpt --outdir $ML_OUTPUT_PATH --prompt "A photorealistic fantasy landscape" --init-img $ML_INPUT_PATH/sketch-mountains-input.jpg --n_iter 5 --n_samples 5'

By default, the CLI automatically attaches to the job and streams the job's log output. After a few minutes, the job will complete and the command will return. The generated images will be in the outputs folder of the current directory in a zip file named Stable_Diffusion_2_-_Upscale_Inference.zip. Unzip the file to see the results. Depending on the settings used for the prompt, each job should cost around $0.07-0.08.

Similar to the text-to-image example, this can also be launched from a Python script:

job = await proximl.jobs.create(
    "Stable Diffusion 2 - Inference",
    type="inference",
    gpu_types=["rtx3090"],
    gpu_count=1,
    disk_size=30,
    model=dict(
        source_type="proximl",
        source_uri="my-stable-diffusion-model-id",
    ),
    environment=dict(
        type="DEEPLEARNING_PY39",
        env=[
            dict(key="HUGGINGFACE_HUB_CACHE", value="/opt/ml/models"),
        ],
    ),
    data=dict(
        input_type="local",
        input_uri="/path/to/input/image/directory",
        output_type="local",
        output_uri="/path/to/upload/outputs/to",
    ),
    workers=[
        "python $ML_MODEL_PATH/scripts/img2img.py "
        "--config $ML_MODEL_PATH/configs/stable-diffusion/v2-inference-v.yaml "
        "--ckpt $ML_MODEL_PATH/checkpoints/v2-1_768-ema-pruned.ckpt "
        "--outdir $ML_OUTPUT_PATH "
        '--prompt "A photorealistic fantasy landscape" '
        "--init-img $ML_INPUT_PATH/name-of-input-image.jpg "
        "--n_iter 5 --n_samples 5",
    ],
)
await job.wait_for("waiting for data/model download")
await asyncio.gather(
    job.connect(), job.attach()
)  # this returns when the job is finished
await job.disconnect()
await job.remove()

In this case, replace the values for model["source_uri"], data["output_uri"], data["input_uri"], and the --init-img argument in the worker command with the appropriate values for your application.

Warning

Ensure that input_uri is a directory, not a file. Although other supported storage types can load individual files, the local storage driver only works with directories.
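
If your source image is a single file, one simple workaround is to copy it into its own directory before launching the job and pass that directory as the input. A minimal sketch using only the Python standard library (all paths are placeholders):

import shutil
import tempfile
from pathlib import Path

# Placeholder path to the single source image you want to transform.
source_image = Path("/path/to/image.jpg")

# Create a dedicated directory and copy the image into it, so the directory
# can be used as the job's local input_uri.
input_dir = Path(tempfile.mkdtemp(prefix="sd2-input-"))
shutil.copy(source_image, input_dir / source_image.name)

print(f"Use {input_dir} as data['input_uri'] and "
      f"$ML_INPUT_PATH/{source_image.name} as --init-img")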

Image Modification Endpoints

The Stable Diffusion 2 repository also provides three specialized image-to-image checkpoints with associated web demos. To launch these demos on the proxiML platform, use the create_endpoint.py Python script added in the proxiML version of the repository.

caution

The script assumes that the specialized weights were downloaded into the proxiML model before it was created. For example, to launch the upscaling endpoint, the Upscaling checkpoint (x4-upscaler-ema.ckpt) must be present in the checkpoints folder of the model during model preparation.

The syntax for the command is the following:

python create_endpoint.py MODEL_ID \
--server-type [streamlit, gradio] \
--model-type [depth, upscaling, inpainting]
  • MODEL_ID is the ID of the proxiML model with the code repository and pre-downloaded checkpoints. You can find this by using the proximl model list command.
  • server-type is the web server implementation to use; the Stable Diffusion 2 repository provides both gradio and streamlit implementations of each demo.
  • model-type is the type of image modification demo to launch.

For example, to launch the streamlit version of the image upscaler on the model created in the original step (assuming the x4-upscaler-ema.ckpt checkpoint was downloaded), run the following:

python create_endpoint.py $(proximl model list | grep stable-diffusion-2 | awk '{print $1}') \
--server-type streamlit \
--model-type upscaling

The script will automatically create the endpoint and open a web browser to the endpoint when it is ready. You can also find the link to the endpoint from the Endpoint Dashboard by clicking the Connect button. The URL is only available when the endpoint is running.

Warning

Endpoints will run until you stop them. You are charged for compute time as long as they are running. Be sure to stop your endpoint once you are done to avoid incurring unexpected charges. Closing the browser window does not stop the endpoint.