Training and Deploying a Custom Stable Diffusion v2 Model
This tutorial walks through how to use the proxiML platform to personalize a Stable Diffusion version 2 model on a specific subject using DreamBooth and generate new images of that subject. It uses the Stable Diffusion version 2 inference code from Stability-AI and the DreamBooth training code from Hugging Face's diffusers project. Running the entire example as described will consume approximately 1.5 credits ($1.50 USD).
Prerequisites
Before beginning this example, ensure that you have satisfied the following prerequisites.
- A valid proxiML account with a non-zero credit balance
- A Python virtual environment with the proxiML CLI/SDK installed and configured
There are additional methods to create datasets or receive job outputs that do not require these prerequisites, but the instructions would need to be modified to utilize them. Contact us if you need any help adapting them.
Prepare the Datasets
Effective DreamBooth training requires two sets of images. The first set is the target or instance images: photos of the specific subject you want to appear in subsequently generated images. The second set is the regularization or class images: "generic" images that contain the same type of object as the target. For example, if you are training a model on a specific person, your class images should be generic photos of, say, a "male person" or a "blonde female person". In this example, the target is a specific dog, so the regularization images should be photos of dogs.
Download the target image dataset from the diffusers example and place the images in a directory called "dog". If you have gdown installed in your virtual environment, you can use the following command to download all 5 images into the dog folder:
gdown --folder https://drive.google.com/drive/folders/1BO_dyz-p65qhBRRMRA4TbZ8qW4rB99JZ
Next, create a dataset from this folder named instance-data:
proximl dataset create instance-data ./dog
The regularization images can be generated by Stable Diffusion itself. To do so, launch an inference job to generate 200 images with the prompt "a photo of dog" and save the output to a new proxiML Dataset with the following command:
proximl job create inference \
--gpu-type rtx3090 \
--public-checkpoint stable-diffusion-v2-1-diffuser \
--git-uri https://github.com/proxiML/stable-diffusion-training-example.git \
--output-type proximl \
--output-uri dataset \
--no-archive \
"DreamBooth Regularization Image Generation" \
'./sd-2-prompt.sh --iters=40 --samples=5 "a photo of dog"'
To change the number of images generated, modify the --iters parameter. If you increase --samples above 6, you will run out of memory on an RTX 3090. The total number of images generated is iters * samples; with the settings above, 40 iterations of 5 samples each yield 200 images.
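For context, the generation loop inside sd-2-prompt.sh presumably resembles the following sketch built on the Hugging Face diffusers pipeline API. The model ID, output folder, and sampler settings here are illustrative assumptions, not necessarily the script's actual values:

import os
import torch
from diffusers import StableDiffusionPipeline

# Load the Stable Diffusion 2.1 pipeline in half precision on the GPU
# (assumed model ID; on proxiML the public checkpoint is mounted for the job)
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

os.makedirs("class-images", exist_ok=True)
iters, samples = 40, 5  # 40 iterations x 5 samples per batch = 200 images
for i in range(iters):
    images = pipe(
        "a photo of dog",
        num_inference_steps=50,
        guidance_scale=7.5,
        num_images_per_prompt=samples,  # per-batch count; too high exhausts GPU memory
    ).images
    for j, image in enumerate(images):
        image.save(f"class-images/img-{i * samples + j:04d}.png")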
This job should consume 0.3-0.4 credits. Once the inference job is complete, it will automatically create a dataset named Job - <job name>, which in this example's case will be Job - DreamBooth Regularization Image Generation. Rename the dataset to something more succinct with the following command:
proximl dataset rename "Job - DreamBooth Regularization Image Generation" "regularization-data"
The diffusers DreamBooth script can generate the regularization images itself as part of its training run. However, this example generates and attaches them independently so that they can be reused across training runs. This way, if you want to train different models for different dogs, you can attach the same "dog" regularization dataset instead of regenerating it each time.
Train Custom Checkpoint
Once the datasets are ready, create a training job with the following command:
proximl job create training \
--gpu-type rtx3090 \
--public-checkpoint stable-diffusion-v2-1-diffuser \
--dataset instance-data \
--dataset regularization-data \
--git-uri https://github.com/proxiML/stable-diffusion-training-example.git \
--output-type proximl \
--output-uri checkpoint \
--no-archive \
--no-save-model \
"DreamBooth Training" \
'./dreambooth-train.sh --steps=800 --images=200 "a photo of dog" "a photo of sks dog"'
The training job will use the public stable-diffusion-v2-1-diffuser checkpoint as a starting point to train a new custom checkpoint on the attached instance data. The --images parameter must match the number of images in the regularization dataset; if it is higher, the script will attempt to generate the missing class images and fail, because attached datasets are mounted read-only. The first positional argument is the "class prompt" and the second positional argument is the "instance prompt". The "instance prompt" is the string you will later use with the trained model to ensure the target appears in the generated image. The --steps parameter determines how long to train the model and should be adjusted as needed to avoid over- or under-fitting.
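The reason the class images matter is DreamBooth's prior-preservation loss: each training batch pairs instance examples with class examples, and the class half penalizes the model for drifting away from its general notion of "dog" while it learns the "sks" subject. A minimal sketch of that loss computation, using random tensors in place of the UNet's actual noise predictions (the diffusers training script follows essentially this chunk-and-weight pattern):

import torch
import torch.nn.functional as F

# Stand-in tensors for the UNet's noise predictions on a combined batch:
# the first half holds instance ("sks dog") examples, the second half class ("dog") examples
batch, channels, height, width = 2, 4, 64, 64
model_pred = torch.randn(2 * batch, channels, height, width)
target = torch.randn(2 * batch, channels, height, width)

# Split the combined batch back into its instance and class halves
instance_pred, prior_pred = torch.chunk(model_pred, 2, dim=0)
instance_target, prior_target = torch.chunk(target, 2, dim=0)

# The instance loss teaches the new subject; the prior loss preserves the class
prior_loss_weight = 1.0
instance_loss = F.mse_loss(instance_pred, instance_target, reduction="mean")
prior_loss = F.mse_loss(prior_pred, prior_target, reduction="mean")
loss = instance_loss + prior_loss_weight * prior_loss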
The training run should take ~25 minutes and consume ~0.4 credits. Once it is complete, it will automatically create a checkpoint named Job - <job name>, which in this example's case will be Job - DreamBooth Training. Rename the checkpoint to something more succinct with the following command:
proximl checkpoint rename "Job - DreamBooth Training" "dog-checkpoint"
Before moving on, wait for the checkpoint to finish saving by attaching to it with the following command:
proximl checkpoint attach "dog-checkpoint"
Generate Images with the Custom Checkpoint
Once the checkpoint is ready, you can use it to generate new images of the target object. The inference.py file in this example repo generates a single image, but it could be modified to generate as many images as needed (see the sketch at the end of this section). First, create the output folder:
mkdir -p output
To generate a new image with the custom model and download it to your computer, use the following command:
proximl job create inference \
--gpu-type rtx3090 \
--checkpoint dog-checkpoint \
--git-uri https://github.com/proxiML/stable-diffusion-training-example.git \
--output-dir ./output \
--no-archive \
"DreamBooth Inference" \
'python inference.py --steps=50 --scale=7.5 --prompt "A photo of sks dog jumping over the moon"'
The first generation may take additional time as the custom checkpoint is cached to the NVMe storage on the GPU server. Subsequent executions with the same checkpoint will be faster.
The inference job should consume ~0.05 credits for a single generation with the same settings as above.
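If a single image per job is limiting, inference.py could be extended along the lines of the sketch below, which uses the standard diffusers pipeline API. The checkpoint path and output file naming are hypothetical, and the repo's actual script may handle its arguments differently:

import torch
from diffusers import StableDiffusionPipeline

# Hypothetical local path where the custom checkpoint is mounted for the job
pipe = StableDiffusionPipeline.from_pretrained(
    "./dog-checkpoint", torch_dtype=torch.float16
).to("cuda")

prompt = "A photo of sks dog jumping over the moon"
for i in range(4):  # generate four variations instead of a single image
    image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
    image.save(f"output/sks-dog-{i}.png")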
Appendix
Using the Python SDK
The same workflow can be implemented directly in Python using the asyncio library and the proxiML SDK.
Create the instance dataset:
from proximl import ProxiML
import asyncio
import os

proximl = ProxiML()


async def create_dataset():
    # Upload the local "dog" folder as a new proxiML dataset
    dataset = await proximl.datasets.create(
        name="instance-data",
        source_type="local",
        source_uri=os.path.abspath('./dog'),
    )
    assert dataset.id
    # Connect to transfer the local files, then wait until the dataset is usable
    await dataset.connect()
    dataset = await dataset.wait_for("ready")
    await dataset.disconnect()


asyncio.run(create_dataset())
Create the regularization dataset:
from proximl import ProxiML
import asyncio

proximl = ProxiML()


async def create_regularization_data():
    # Launch the inference job that generates the class images
    job = await proximl.jobs.create(
        name="DreamBooth Regularization Image Generation",
        type="inference",
        gpu_type="rtx3090",
        gpu_count=1,
        disk_size=10,
        workers=[
            './sd-2-prompt.sh --iters=40 --samples=5 "a photo of dog"',
        ],
        data=dict(
            output_type="proximl",
            output_uri="dataset",
            output_options=dict(archive=False, save_model=False),
        ),
        model=dict(
            source_type="git",
            source_uri="https://github.com/proxiML/stable-diffusion-training-example.git",
            checkpoints=[
                dict(id="stable-diffusion-v2-1-diffuser", public=True),
            ],
        )
    )
    # Stream logs until the job finishes
    await job.attach()
    # The job output becomes a new dataset; look it up and rename it
    dataset = await proximl.datasets.get(
        job.workers[0].get("output_uuid")
    )
    await dataset.rename("regularization-data")
    await job.remove()


asyncio.run(create_regularization_data())
Create the custom checkpoint:
from proximl import ProxiML
import asyncio

proximl = ProxiML()


async def create_custom_checkpoint():
    # Launch the DreamBooth training job with both datasets attached
    job = await proximl.jobs.create(
        name="DreamBooth Training",
        type="training",
        gpu_type="rtx3090",
        gpu_count=1,
        disk_size=10,
        workers=[
            './dreambooth-train.sh --steps=800 --images=200 "a photo of dog" "a photo of sks dog"',
        ],
        data=dict(
            datasets=["regularization-data", "instance-data"],
            output_type="proximl",
            output_uri="checkpoint",
            output_options=dict(archive=False, save_model=False),
        ),
        model=dict(
            source_type="git",
            source_uri="https://github.com/proxiML/stable-diffusion-training-example.git",
            checkpoints=[
                dict(id="stable-diffusion-v2-1-diffuser", public=True),
            ],
        )
    )
    # Stream logs until training finishes, then rename the output checkpoint
    await job.attach()
    checkpoint = await proximl.checkpoints.get(
        job.workers[0].get("output_uuid")
    )
    await checkpoint.wait_for("ready")
    await checkpoint.rename("dog-checkpoint")
    await job.remove()


asyncio.run(create_custom_checkpoint())
Generate a new image:
from proximl import ProxiML
import asyncio
import os

proximl = ProxiML()


async def generate_image():
    # Launch the inference job using the custom checkpoint, with local output
    job = await proximl.jobs.create(
        name="DreamBooth Inference",
        type="inference",
        gpu_type="rtx3090",
        gpu_count=1,
        disk_size=10,
        workers=[
            'python inference.py --steps=50 --scale=7.5 --prompt "A photo of sks dog jumping over the moon"',
        ],
        data=dict(
            output_type="local",
            output_uri=os.path.abspath('./output'),
            output_options=dict(archive=False),
        ),
        model=dict(
            source_type="git",
            source_uri="https://github.com/proxiML/stable-diffusion-training-example.git",
            checkpoints=[
                "dog-checkpoint",
            ],
        )
    )
    # Local output requires a connection from this machine: wait until the job
    # is ready for it, then attach (stream logs) and connect (transfer files)
    await job.wait_for("waiting for data/model download")
    attach_task = asyncio.create_task(job.attach())
    connect_task = asyncio.create_task(job.connect())
    await asyncio.gather(attach_task, connect_task)
    print(os.listdir(os.path.abspath('./output')))
    await job.remove()


asyncio.run(generate_image())