Training and Deploying a Custom Stable Diffusion v2 Model

This tutorial walks through how to use the trainML platform to personalize a Stable Diffusion version 2 model on a subject using DreamBooth and generate new images. It utilizes the Stable Diffusion version 2 inference code from Stability-AI and the DreamBooth training code from Hugging Face's diffusers project. Running the entire example as described will consume approximately 1.5 credits ($1.50 USD).

Prerequisites

Before beginning this example, ensure that you have satisfied the following prerequisites.

tip

There are additional methods to create datasets or receive job outputs that do not require these prerequisites, but the instructions would need to be modified to utilize them. Contact us if you need any help adapting them.

Prepare the Datasets

Effective DreamBooth training requires two sets of images. The first set is the target or instance images: the images of the specific subject you want to appear in subsequently generated images. The second set is the regularization or class images: "generic" images that contain the same type of object as the target. For example, if you are training a model on a specific person, the class images should be generic images matching a prompt like "male person" or "blonde female person". In this example the target is a specific dog, so the regularization images should be photos of dogs.

Download the target image dataset from the diffusers DreamBooth example and place the images in a directory called "dog". If you have gdown installed in your virtual environment, you can use the following command to download all 5 images into the dog folder:

gdown --folder https://drive.google.com/drive/folders/1BO_dyz-p65qhBRRMRA4TbZ8qW4rB99JZ
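
If you prefer to script the download, gdown also exposes a Python API. A minimal sketch that fetches the same folder (assumes gdown is installed in your environment):

import gdown

# Download the 5 example instance images into the ./dog folder
gdown.download_folder(
    "https://drive.google.com/drive/folders/1BO_dyz-p65qhBRRMRA4TbZ8qW4rB99JZ",
    output="dog",
)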

Next, create a dataset from this folder named instance-data:

trainml dataset create instance-data ./dog

The regularization images can be generated by Stable Diffusion itself. To do so, launch an inference job that generates 200 images with the prompt "a photo of dog" and saves the output to a new trainML dataset with the following command:

trainml job create inference \
--gpu-type rtx3090 \
--public-checkpoint stable-diffusion-v2-1-diffuser \
--git-uri https://github.com/trainML/stable-diffusion-training-example.git \
--output-type trainml \
--output-uri dataset \
--no-archive \
"DreamBooth Regularization Image Generation" \
'./sd-2-prompt.sh --iters=40 --samples=5 "a photo of dog"'

To change the number of images generated, modify the --iters parameter. The total number of images generated is iters * samples, so the command above produces 40 * 5 = 200 images. Avoid increasing --samples above 6, as that will run out of memory on an RTX 3090.

This job should consume 0.3-0.4 credits. Once the inference job is complete, it will automatically create a dataset named Job - <job name>, which in this example's case will be Job - DreamBooth Regularization Image Generation. Rename the dataset to something more succinct with the following command:

trainml dataset rename "Job - DreamBooth Regularization Image Generation" "regularization-data"
tip

The diffusers dreambooth script can actually generate the regularization images as part of its training run. However, we demonstrate how to generate and attach these independently so that they can be reused across training runs. This way, if you want to train different models for different dogs, you can reuse the same "dog" regularization dataset to avoid having to regenerate it each time.

Train Custom Checkpoint

Once the datasets are ready, create a training job with the following command:

trainml job create training \
--gpu-type rtx3090 \
--public-checkpoint stable-diffusion-v2-1-diffuser \
--dataset instance-data \
--dataset regularization-data \
--git-uri https://github.com/trainML/stable-diffusion-training-example.git \
--output-type trainml \
--output-uri checkpoint \
--no-archive \
--no-save-model \
"DreamBooth Training" \
'./dreambooth-train.sh --steps=800 --images=200 "a photo of dog" "a photo of sks dog"'

The training job will use the public stable-diffusion-v2-1-diffuser checkpoint as a starting point to train a new custom checkpoint on the attached instance data. The --images parameter must match the number of images in the regularization dataset; if it is set higher, the script will attempt to generate the missing class images inside the attached dataset, which fails because datasets are mounted read-only. The first positional argument is the "class prompt" and the second is the "instance prompt". The instance prompt is the string you will later use with the trained model to ensure the target appears in the generated image. The --steps parameter determines how long to train the model and should be adjusted as needed to avoid over- or under-fitting.

The training run should take ~25 minutes and consume ~0.4 credits. Once it is complete, it will automatically create a checkpoint named Job - <job name>, which in this example's case will be Job - DreamBooth Training. Rename the checkpoint to something more succinct with the following command:

trainml checkpoint rename "Job - DreamBooth Training" "dog-checkpoint"

Use the following command to wait for the checkpoint to finish saving before moving on:

trainml checkpoint attach "dog-checkpoint"
info

Checkpoint saving does not incur compute charges. Compute charges stop as soon as the training job's command returns.

Generate Images with the Custom Checkpoint

Once the checkpoint is ready, you can use it to generate new images of the target object. The inference.py file in this example repo generates a single image per run, but it could be modified to generate as many images as needed (see the sketch at the end of this section). First, create the output folder:

mkdir -p output

To generate a new image with the custom model and download it to your computer, use the following command:

trainml job create inference \
--gpu-type rtx3090 \
--checkpoint dog-checkpoint \
--git-uri https://github.com/trainML/stable-diffusion-training-example.git \
--output-dir ./output \
--no-archive \
"DreamBooth Inference" \
'python inference.py --steps=50 --scale=7.5 --prompt "A photo of sks dog jumping over the moon"'
info

The first generation may take additional time as the custom checkpoint is cached to the NVMe storage on the GPU server. Subsequent executions with the same checkpoint will be faster.

The inference job should consume ~0.05 credits for a single generation with the same settings as above.
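
As noted above, inference.py generates one image per run. If you want several generations per job, the script could be adapted along the lines of the sketch below, which uses the standard diffusers pipeline API. The TRAINML_CHECKPOINT_PATH and TRAINML_OUTPUT_PATH environment variables and their fallback paths are assumptions about where the job mounts the checkpoint and output directory; check the example repo for the exact locations.

import os

import torch
from diffusers import StableDiffusionPipeline

# Assumption: the job exposes the checkpoint and output locations via
# environment variables; adjust these to the actual mount points.
checkpoint_dir = os.environ.get("TRAINML_CHECKPOINT_PATH", "/opt/trainml/checkpoint")
output_dir = os.environ.get("TRAINML_OUTPUT_PATH", "./output")

# Load the custom DreamBooth checkpoint in diffusers format
pipe = StableDiffusionPipeline.from_pretrained(
    checkpoint_dir, torch_dtype=torch.float16
).to("cuda")

prompt = "A photo of sks dog jumping over the moon"
for i in range(4):  # generate four variations instead of one
    image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
    image.save(os.path.join(output_dir, f"sks-dog-{i}.png"))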

Appendix

Using the Python SDK

The same workflow can be implemented directly in Python using the asyncio library and the trainML SDK.

Create the instance dataset:

from trainml import TrainML
import asyncio
import os

trainml = TrainML()

async def create_dataset():
    dataset = await trainml.datasets.create(
        name="instance-data",
        source_type="local",
        source_uri=os.path.abspath('./dog'),
    )
    assert dataset.id
    await dataset.connect()
    dataset = await dataset.wait_for("ready")
    await dataset.disconnect()

asyncio.run(create_dataset())

Create the regularization dataset:

from trainml import TrainML
import asyncio
import os

trainml = TrainML()

async def create_regularization_data():
    job = await trainml.jobs.create(
        name="DreamBooth Regularization Image Generation",
        type="inference",
        gpu_type="rtx3090",
        gpu_count=1,
        disk_size=10,
        workers=[
            './sd-2-prompt.sh --iters=40 --samples=5 "a photo of dog"',
        ],
        data=dict(
            output_type="trainml",
            output_uri="dataset",
            output_options=dict(archive=False, save_model=False),
        ),
        model=dict(
            source_type="git",
            source_uri="https://github.com/trainML/stable-diffusion-training-example.git",
            checkpoints=[
                dict(id="stable-diffusion-v2-1-diffuser", public=True),
            ],
        ),
    )
    await job.attach()

    dataset = await trainml.datasets.get(
        job.workers[0].get("output_uuid")
    )
    await dataset.rename("regularization-data")
    await job.remove()

asyncio.run(create_regularization_data())

Create the custom checkpoint:

from trainml import TrainML
import asyncio
import os

trainml = TrainML()

async def create_custom_checkpoint():
    job = await trainml.jobs.create(
        name="DreamBooth Training",
        type="training",
        gpu_type="rtx3090",
        gpu_count=1,
        disk_size=10,
        workers=[
            './dreambooth-train.sh --steps=800 --images=200 "a photo of dog" "a photo of sks dog"',
        ],
        data=dict(
            datasets=["regularization-data", "instance-data"],
            output_type="trainml",
            output_uri="checkpoint",
            output_options=dict(archive=False, save_model=False),
        ),
        model=dict(
            source_type="git",
            source_uri="https://github.com/trainML/stable-diffusion-training-example.git",
            checkpoints=[
                dict(id="stable-diffusion-v2-1-diffuser", public=True),
            ],
        ),
    )
    await job.attach()

    checkpoint = await trainml.checkpoints.get(
        job.workers[0].get("output_uuid")
    )
    await checkpoint.wait_for("ready")
    await checkpoint.rename("dog-checkpoint")
    await job.remove()

asyncio.run(create_custom_checkpoint())

Generate a new image:

from trainml import TrainML
import asyncio
import os

trainml = TrainML()

async def generate_image():
    job = await trainml.jobs.create(
        name="DreamBooth Inference",
        type="inference",
        gpu_type="rtx3090",
        gpu_count=1,
        disk_size=10,
        workers=[
            'python inference.py --steps=50 --scale=7.5 --prompt "A photo of sks dog jumping over the moon"',
        ],
        data=dict(
            output_type="local",
            output_uri=os.path.abspath('./output'),
            output_options=dict(archive=False),
        ),
        model=dict(
            source_type="git",
            source_uri="https://github.com/trainML/stable-diffusion-training-example.git",
            checkpoints=[
                "dog-checkpoint",
            ],
        ),
    )
    # Wait until the job needs the local connection, then attach for logs
    # and connect to receive the generated image locally.
    await job.wait_for("waiting for data/model download")
    attach_task = asyncio.create_task(job.attach())
    connect_task = asyncio.create_task(job.connect())
    await asyncio.gather(attach_task, connect_task)
    print(os.listdir(os.path.abspath('./output')))
    await job.remove()


asyncio.run(generate_image())
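
When you are finished with the example, you can remove the datasets and checkpoint so they no longer count toward your storage. A minimal sketch, assuming the datasets and checkpoints collections expose list() and remove() methods analogous to the job methods used above:

from trainml import TrainML
import asyncio

trainml = TrainML()

async def cleanup():
    # Assumption: list()/remove() behave like the job methods shown above.
    for dataset in await trainml.datasets.list():
        if dataset.name in ("instance-data", "regularization-data"):
            await dataset.remove()
    for checkpoint in await trainml.checkpoints.list():
        if checkpoint.name == "dog-checkpoint":
            await checkpoint.remove()

asyncio.run(cleanup())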