Deploy your AI models efficiently using VESSL Serve, which supports both provisioned and serverless deployment methods. This guide will walk you through the steps to create a new service using the Command Line Interface (CLI) and the Web Console, with an emphasis on enabling serverless mode.

Deploy a New Service using CLI

To deploy a service using the CLI, you’ll first need to define your service configuration in a YAML file. This YAML-based configuration allows you to deploy services programmatically.

If you were using this feature as beta and want to migrate to new version, please refer to migration guide.

Example YAML Configuration:

name: vessl-test-service
message: vessl-yaml-revision
image: quay.io/vessl-ai/torch:2.1.0-cuda12.2-r3
resources:
  cluster: vessl-gcp-oregon
  preset: gpu-l4-small-spot
run:
  - command: |
      pip install vllm
      python -m vllm.entrypoints.openai.api_server --model $MODEL_NAME --max-model-len 4096 --disable-log-requests --api-key token-123
    workdir: /code/serve-quickstart
env:
  MODEL_NAME: mistralai/Mistral-7B-Instruct-v0.2
ports:
  - port: 8000
service:
  autoscaling:
    max: 2
    metric: cpu
    min: 1
    target: 50
  expose: 8000

Steps to Deploy Using CLI:

  1. Create or edit your YAML file to define the service configuration.
  2. Deploy the service by running the following command in your terminal. Replace [your-yaml-file].yaml with the path to your YAML file.:
    vessl serve create -f [your-yaml-file].yaml
    
    If you want to deploy a serverless mode, make sure to append --serverless flag.
    vessl serve create -f [your-serverless-yaml-file.yaml] --serverless
    

Deploy a New Service using Web Console

Deploying via the Web Console is user-friendly and suitable for those who prefer a graphical interface over command line operations.

Steps to Create a New Service in the Web Console:

  1. Navigate to the Service in the Web Console:

  2. Click “New Service” to start the configuration process.

  3. Configure your service:

    Serverless Mode is only available in VESSL-managed cloud clusters.

    - Name: Enter a name for your service. - Example: mixtral-7b.chatbot - Description: Provide an optional descriptive field for the purpose of your service. - Example: This service hosts a chatbot model for customer support.

    • Cluster: Select the physical location where the service’s resources are hosted. - Serverless: Toggle the serverless option to enable it. - VESSL AI offers serverless GPU computing for AI, with pay-per-hour billing.
  4. Create a new service revision:

You can easily create a new Revision using the Web UI, or you can create your revision from a YAML file.

  • Resources: Select the compute resources and container image you want to use for the Service.
    • Resource: Select the compute resources you want to use for the Service. Custom resource specs cannot be set in Serverless mode. If you need to use a custom resource spec, please contact our sales team.
    • Container image: The Docker image to use for the Revision.
  • Task:
    • Command: The command to run inside the container. This is similar to running a command in the terminal on your computer.
    • Port: The port to expose from the container. For example, if you’re using a BentoML model server, you’ll want to expose port 3000 and use the HTTP protocol to access the service endpoint. You can open only one port in Serverless mode.
  • Advanced Settings: Set additional configurations.
    • Variables: Environment variables or secret variables to inject into the container.