Deploying a server to host ML models in a production environment requires careful planning to ensure the models run smoothly, stay available, and can handle increased demand. This can be particularly challenging for ML engineers or small backend teams who might not be deeply familiar with complex backend setups.
VESSL Serve is an essential tool for deploying models developed in VESSL, or even your custom models, as inference servers. Beyond making inference easy, VESSL Serve offers features like:
- Tracking activity such as logs, system metrics, and model performance metrics
- Automatically scaling replicas up or down based on demand (resource usage)
- Splitting traffic across model versions for easier canary testing
- Rolling out a new model version to production without downtime
VESSL Serve simplifies the process of setting up ML services that are reliable, adaptable, and can handle varying workloads.
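To make the traffic-splitting feature above concrete, here is a minimal sketch of the idea behind canary testing: a weighted random choice routes most requests to the stable version and a small fraction to the new one. The version names and weights are illustrative; VESSL Serve manages this routing for you.

```python
import random

def route_request(versions, weights, rng=random.random):
    """Pick a model version for one incoming request using weighted
    random selection -- the core idea behind canary traffic splitting."""
    r = rng() * sum(weights)
    for version, weight in zip(versions, weights):
        if r < weight:
            return version
        r -= weight
    return versions[-1]

# Send ~95% of traffic to the stable version and ~5% to the canary.
counts = {"v1-stable": 0, "v2-canary": 0}
rng = random.Random(0)  # seeded for reproducibility
for _ in range(10_000):
    chosen = route_request(["v1-stable", "v2-canary"], [95, 5], rng.random)
    counts[chosen] += 1
print(counts)
```

If the canary's metrics look healthy, you shift more weight to it; if not, you roll traffic back to the stable version without redeploying.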
Follow this quickstart guide to get started with VESSL Serve.
Navigate to the [Servings] section found in the global navigation bar. You will find an inventory of all the servings you have previously created.
Servings section on the global navigation bar
For an in-depth guide on creating a new serving, refer to the [Serving Web Workflow] section.
Alternatively, you can define serving instances declaratively using YAML manifests. This approach lets you manage the versions and configuration of your serving instances with version-control tools such as Git.
```yaml
message: VESSL Serve example
run: vessl model serve mnist-example 19 --install-reqs
ports:  # assumed parent key; the original fragment showed only the list item below
  - port: 8000
```
For more details on how to create a serving, see the [Serving YAML workflow] section.
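Once a serving is up, clients can reach it over plain HTTP on the exposed port. The sketch below sends a JSON payload to port 8000 from the manifest above; the endpoint path (`/predict`) and payload shape are assumptions, so adjust them to whatever your model server actually exposes.

```python
import json
import urllib.request

def build_request(payload, host="localhost", port=8000, path="/predict"):
    """Build a JSON POST request for the served model.
    The path and payload schema are hypothetical placeholders."""
    return urllib.request.Request(
        f"http://{host}:{port}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def predict(payload, **kwargs):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(payload, **kwargs)) as resp:
        return json.loads(resp.read())
```

For example, `predict({"pixels": [0.0] * 784})` would query an MNIST-style model, assuming the server accepts that input shape.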