Deploy your AI models efficiently using VESSL Service, which supports both provisioned and serverless deployment methods. This guide walks you through creating a new service with the Command Line Interface (CLI) and with the web console, and shows how to enable Serverless Mode.
To deploy a service using the CLI, you’ll first need to define your service configuration in a YAML file. This YAML-based configuration allows you to deploy services programmatically.
If you used this feature during the beta and want to migrate to the new version, please refer to the migration guide.
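As a rough illustration of defining a service configuration programmatically, the sketch below assembles the settings as a plain Python dictionary and serializes them to text. It is not VESSL's actual schema: every key name (`message`, `image`, `command`, `ports`, `env`), the image name, and the command are hypothetical placeholders, so check the YAML reference for the real field names. The output is JSON, which is also valid YAML 1.2, so the example needs only the standard library.

```python
import json


def build_service_config(message, image, command, port, env=None):
    """Assemble a service revision description as a plain dict.

    All key names here are hypothetical placeholders, not VESSL's
    documented YAML schema.
    """
    return {
        "message": message,
        "image": image,
        "command": command,
        "ports": [{"port": port, "protocol": "http"}],
        "env": dict(env or {}),
    }


def to_yaml_text(config):
    """Serialize the config; JSON output is also valid YAML 1.2."""
    return json.dumps(config, indent=2)


config = build_service_config(
    message="initial revision",
    image="example.com/my-model-server:latest",  # hypothetical image name
    command="python server.py",                  # hypothetical start command
    port=3000,
)
print(to_yaml_text(config))
```

Generating the file from code keeps revisions reproducible and lets you review configuration changes in version control before deploying them.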
Deploying through the web console is user-friendly and suits those who prefer a graphical interface over command-line operations. The interactive demo below will guide you through the process.
Provisioned Mode — Steps to create a new service in the web console
Initialize this revision with: Select the initialization method.
Template from VESSL hub: Use a template from the VESSL Hub.
Recent revision configuration: Reuse the configuration of a recent revision.
YAML file: Upload a YAML file or paste its content to initialize the revision.
Message: Enter a message for the revision.
Resources: Select the compute resources and container image you want to use for the Service.
Resource: Select the compute resources you want to use for the Service.
Container image: The Docker image to use for the Revision.
Task:
Volumes: Import or mount code and data.
Command: The command to run inside the container. This is similar to running a command in the terminal on your computer.
Port: The port to expose from the container. For example, if you’re using a BentoML model server, you’ll want to expose port 3000 and use the HTTP protocol to access the service endpoint.
Monitoring: Enable monitoring to track default system metrics from service workers.
Healthcheck: Check API health using the specified port and path.
Autoscaling: Set autoscaling strategy for the revision.
Target Metric: The metric to use for autoscaling: cpu, memory, nvidia.com/GPU, or requests.
Target Value: The target value for the metric.
Min value: The minimum number of replicas.
Max value: The maximum number of replicas.
Variables: Environment variables or secret variables to inject into the container.
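To make the Autoscaling settings above more concrete, here is a minimal sketch of how a target metric value and the minimum and maximum replica counts could combine into a replica count. It assumes the proportional rule used by the Kubernetes Horizontal Pod Autoscaler; this guide does not state which rule VESSL applies, and the function name `desired_replicas` is hypothetical.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas, max_replicas):
    """Estimate a replica count from a metric reading.

    Assumes the proportional rule
        ceil(current_replicas * current_value / target_value)
    and then clamps the result to [min_replicas, max_replicas].
    """
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")
    raw = math.ceil(current_replicas * current_value / target_value)
    return max(min_replicas, min(max_replicas, raw))


# Two replicas averaging 90% CPU against a 60% target scale out to three.
print(desired_replicas(2, 90, 60, min_replicas=1, max_replicas=5))  # 3
```

Under this assumption, a lower Target Value makes the revision scale out earlier, while Min value and Max value bound the result regardless of load.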
Serverless Mode — Steps to create a new service in the web console:
Resources: Select the compute resources and container image you want to use for the Service.
Resource: Select the compute resources you want to use for the Service. Custom resource specs cannot be set in Serverless Mode. If you need a custom resource spec, please contact our sales team.
Container image: The Docker image to use for the Revision.
Task:
Command: The command to run inside the container. This is similar to running a command in the terminal on your computer.
Port: The port to expose from the container. For example, if you’re using a BentoML model server, you’ll want to expose port 3000 and use the HTTP protocol to access the service endpoint. Serverless Mode allows only one open port.
Advanced Settings: Set additional configurations.
Variables: Environment variables or secret variables to inject into the container.
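Because Serverless Mode accepts only one port, it can help to check a planned configuration before filling in the form. The sketch below is a hypothetical local check, not part of VESSL's tooling: the function name `check_serverless_ports` is made up for illustration, and treating an empty port list as an error is an assumption.

```python
def check_serverless_ports(ports):
    """Return a list of problems; an empty list means the ports look fine.

    Assumes a Serverless Mode service exposes exactly one port.
    """
    problems = []
    if len(ports) != 1:
        problems.append(
            f"Serverless Mode allows exactly one port, got {len(ports)}"
        )
    for port in ports:
        # Valid TCP port numbers range from 1 to 65535.
        if not 1 <= port <= 65535:
            problems.append(f"port {port} is outside the valid range 1-65535")
    return problems


print(check_serverless_ports([3000]))        # []
print(check_serverless_ports([3000, 8080]))
```

A service that needs several ports would fail this check, which suggests using Provisioned Mode instead.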