Deploying ML models behind a production server requires careful planning to keep them running smoothly, highly available, and able to handle increased demand. This can be particularly challenging for ML engineers or small backend teams who may not be deeply familiar with complex backend setups.
VESSL Serve is an essential tool for deploying models developed in VESSL, or your own custom models, as inference servers. Beyond making inference easy, it offers features like:
  • Tracking activity such as logs, system metrics, and model performance metrics
  • Autoscaling replicas up and down based on demand (resource usage)
  • Splitting traffic across model revisions for easier canary testing
  • Rolling out a new model version to production without downtime
VESSL Serve simplifies the process of setting up ML services that are reliable, adaptable, and can handle varying workloads.
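As a rough sketch of the traffic-splitting feature mentioned above, a canary rollout sends most requests to the stable revision and a small share to the new one. The field names below (`trafficSplit`, `revision`, `weight`) are illustrative assumptions, not necessarily VESSL's exact schema:

```yaml
# Hypothetical sketch: route 90% of traffic to stable revision 3 and
# 10% to new revision 4 for canary testing. Field names are assumptions.
trafficSplit:
  - revision: 3
    weight: 90
  - revision: 4
    weight: 10
```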


Follow our quickstart guide to get started with VESSL Serve.

Use VESSL Serve on Web Console

Navigate to the [Servings] section in the global navigation bar. There you will find a list of all the servings you have previously created.
Servings section on the global navigation bar
For an in-depth guide on creating a new serving, refer to the [Serving Web Workflow] section.

Use VESSL Serve with YAML interface

Alternatively, you can define serving instances declaratively with YAML manifests. This approach lets you manage the versions and configuration settings of your servings with a version control system such as Git.
```yaml
message: VESSL Serve example
resources:
  name: v1.cpu-2.mem-6
run: vessl model serve mnist-example 19 --install-reqs
autoscaling:
  min: 1
  max: 3
  metric: cpu
  target: 60
ports:
  - port: 8000
    name: fastapi
    type: http
```
For more details on how to create a serving, see the [Serving YAML workflow] section.
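Because the serving is just a file, it can be tracked like any other piece of code. A minimal sketch with Git, assuming the manifest is saved as `serve.yaml` (the filename is an assumption):

```shell
# Create a repository for serving configuration and commit the manifest.
# "serve.yaml" is an assumed filename; use whatever your team prefers.
git init -q serving-config && cd serving-config
cat > serve.yaml <<'EOF'
message: VESSL Serve example
run: vessl model serve mnist-example 19 --install-reqs
EOF
git add serve.yaml
git -c user.email=you@example.com -c user.name="Your Name" \
    commit -q -m "Pin serving config"
```

Each change to the serving's configuration then becomes a reviewable, revertible commit.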

Use VESSL Serve with Python SDK

See the VESSL Serve section in the API Reference for more details.