This example deploys a simple web app for Stable Diffusion. You will learn how to set up an interactive workload for inference by mounting models from Hugging Face and opening a port for user input. For a more in-depth guide, refer to our blog post.

To save credits, remember to click “Terminate” to stop your runs once you are done.

What you will do

  • Host a GPU-accelerated web app built with Streamlit
  • Mount model checkpoints from Hugging Face
  • Open up a port to an interactive workload for inference

Writing the YAML

Let’s fill in the stable-diffusion.yaml file.

1

Spin up an interactive workload

We already learned how to launch an interactive workload in our previous guide. Let’s copy and paste the YAML we wrote for notebook.yaml.

name: Stable Diffusion Playground
description: An interactive web app for Stable Diffusion
resources:
  cluster: vessl-gcp-oregon
  preset: gpu-l4-small
image: quay.io/vessl-ai/torch:2.1.0-cuda12.2-r3
interactive:
  jupyter:
    idle_timeout: 120m
  max_runtime: 24h

2

Configure an interactive run

Let’s mount a GitHub repo and import a model checkpoint from Hugging Face. We already learned how to mount a codebase in our Quickstart guide.

VESSL AI integrates natively with Hugging Face, so you can import models and datasets simply by referencing the Hugging Face repository with an hf:// link. Under import, let’s mount the codebase at /code/ and import the model into a working directory, /model/.

name: Stable Diffusion Playground
description: An interactive web app for Stable Diffusion
resources:
  cluster: vessl-gcp-oregon
  preset: gpu-l4-small
image: quay.io/vessl-ai/torch:2.1.0-cuda12.2-r3
import:
  /code/:
    git:
      url: https://github.com/vessl-ai/hub-model
      ref: main
  /model/: hf://huggingface.co/VESSL/SSD-1B
interactive:
  jupyter:
    idle_timeout: 120m
  max_runtime: 24h
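
YAML does not allow tab characters for indentation, and import must sit at the top level alongside image and interactive, so a stray tab in this block is enough to make the file unparseable. A minimal sketch in plain Python for spotting such lines before you submit the file; the helper name find_tab_indented_lines is hypothetical and is not part of the VESSL CLI:

```python
def find_tab_indented_lines(yaml_text: str) -> list[int]:
    """Return 1-based line numbers whose leading whitespace contains a tab."""
    bad_lines = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        # Everything before the first non-whitespace character is the indent.
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            bad_lines.append(number)
    return bad_lines


# Illustrative input only: two tab-indented lines between valid ones.
sample = (
    "image: quay.io/vessl-ai/torch:2.1.0-cuda12.2-r3\n"
    "\timport:\n"
    "\t\t/code/:\n"
    "ports:\n"
)
print(find_tab_indented_lines(sample))  # prints [2, 3]
```

To check your own file, you could pass it the contents of stable-diffusion.yaml, for example via open("stable-diffusion.yaml").read().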

3

Open up a port for inference

The ports key exposes the workload ports where VESSL AI listens for HTTP requests. This means you will be able to interact with the remote workload, sending an input query and receiving a generated image, through port 80 in this case.

name: Stable Diffusion Playground
description: An interactive web app for Stable Diffusion
resources:
  cluster: vessl-gcp-oregon
  preset: gpu-l4-small
image: quay.io/vessl-ai/torch:2.1.0-cuda12.2-r3
import:
  /code/:
    git:
      url: https://github.com/vessl-ai/hub-model
      ref: main
  /model/: hf://huggingface.co/VESSL/SSD-1B
interactive:
  jupyter:
    idle_timeout: 120m
  max_runtime: 24h
ports:
  - name: streamlit
    type: http
    port: 80

4

Write the run commands

Let’s install additional Python dependencies with requirements.txt and finally run our app ssd_1b_streamlit.py.

Here, the --server.port=80 flag tells our Streamlit app to serve on the port we opened in the previous step. Through this port, the app receives user input and generates an image with the Hugging Face model we mounted at /model/.

name: Stable Diffusion Playground
description: An interactive web app for Stable Diffusion
resources:
  cluster: vessl-gcp-oregon
  preset: gpu-l4-small
image: quay.io/vessl-ai/torch:2.1.0-cuda12.2-r3
import:
  /code/:
    git:
      url: https://github.com/vessl-ai/hub-model
      ref: main
  /model/: hf://huggingface.co/VESSL/SSD-1B
run:
  - command: |-
      pip install -r requirements.txt
      streamlit run ssd_1b_streamlit.py --server.port=80
    workdir: /code/SSD-1B
interactive:
  max_runtime: 24h
  jupyter:
    idle_timeout: 120m
ports:
  - name: streamlit
    type: http
    port: 80
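
The port passed to Streamlit and the port listed under ports have to agree, otherwise the endpoint points at a port nothing is serving on. The following is a rough sketch of a pre-flight check, assuming the file follows the layout above. It uses regular expressions rather than a real YAML parser, and the function names are hypothetical helpers, not VESSL or Streamlit APIs:

```python
import re


def declared_ports(yaml_text: str) -> set[int]:
    """Collect the numbers on lines of the form 'port: <number>'."""
    pattern = r"^[ \t]*port:[ \t]*(\d+)[ \t]*$"
    return {int(m) for m in re.findall(pattern, yaml_text, flags=re.MULTILINE)}


def streamlit_ports(yaml_text: str) -> set[int]:
    """Collect the numbers passed as --server.port=<number> or --server.port <number>."""
    return {int(m) for m in re.findall(r"--server\.port[=\s]+(\d+)", yaml_text)}


def unexposed_ports(yaml_text: str) -> set[int]:
    """Ports Streamlit is told to serve on that are not listed under 'ports'."""
    return streamlit_ports(yaml_text) - declared_ports(yaml_text)


run_yaml = """\
run:
  - command: |-
      pip install -r requirements.txt
      streamlit run ssd_1b_streamlit.py --server.port=80
    workdir: /code/SSD-1B
ports:
  - name: streamlit
    type: http
    port: 80
"""
print(unexposed_ports(run_yaml))  # prints set(), meaning nothing is missing
```

An empty set means every port Streamlit serves on is also exposed. A non-empty result lists the ports you would still need to add under ports.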

Running the app

Once again, running the workload will guide you to the workload Summary page.

vessl run create -f stable-diffusion.yaml

Under ENDPOINTS, click the streamlit link to launch the app.
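
Because the run command installs dependencies before starting Streamlit, the link may not respond right away. If you want to check from a script instead of the browser, here is a minimal sketch using only the Python standard library. The function name endpoint_is_up is hypothetical, and the URL you pass in is whatever the streamlit link under ENDPOINTS points to:

```python
import urllib.request


def endpoint_is_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP GET to `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError, refused connections and timeouts;
        # ValueError covers strings that are not valid URLs.
        return False
```

Calling endpoint_is_up with the copied endpoint URL returns False while the app is still starting and True once Streamlit answers. Polling it every few seconds is one simple way to wait for the app.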

Using our web interface

You can repeat the same process on the web. Head over to your Organization, select a project, and create a New run.

What’s next?

See how VESSL AI takes care of the infrastructural challenges of fine-tuning a large language model with a custom dataset.