Overview
Deploying machine learning (ML) models in production often requires meticulous planning to ensure smooth operation, high availability, and the ability to handle fluctuating demand. VESSL Service offers two modes to cater to different needs: Provisioned and Serverless.
Provisioned Mode
VESSL Service acts as a robust platform for deploying models developed within VESSL AI, or even your custom models, as inference servers. Provisioned Mode is ideal for those who prefer direct control over their deployment environment with features such as:
- Activity tracking: Monitor logs, system metrics, and model performance metrics.
- Auto-scaling: Automatically adjust the number of replicas based on resource usage to handle increased demand.
- Traffic management: Easily split traffic for canary testing and gradually roll out new model versions without downtime.
- Operational control: Extensive customization through YAML configurations for those who need precise control over their deployments.
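To give a feel for how these features come together in a YAML configuration, here is a minimal sketch of a Provisioned Mode spec. The field names (`autoscaling`, `rollout`, the resource preset, and so on) are illustrative assumptions for this overview, not the authoritative VESSL schema — consult the YAML configuration examples for the real structure.

```yaml
# Illustrative sketch only — field names are assumptions for this
# overview, not the authoritative VESSL Service schema.
name: llama-chat-service
resources:
  preset: gpu-l4-small          # hypothetical resource preset name
autoscaling:
  min_replicas: 1
  max_replicas: 4
  metric: gpu_utilization       # scale out on GPU usage
  target: 70                    # add replicas above ~70% utilization
rollout:                        # canary: send 10% of traffic to v2
  - revision: v1
    traffic_weight: 90
  - revision: v2
    traffic_weight: 10
```

A spec like this captures the two headline features in one place: the `autoscaling` block handles demand spikes automatically, while the `rollout` weights let you shift traffic between revisions gradually instead of cutting over all at once.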
What’s next in Provisioned Mode?
VESSL Service Quickstart
Get started with VESSL Service using Llama 3.1-8B and the latest vLLM.
Deploy with YAML
Explore comprehensive YAML configuration examples.
Serverless Mode
Serverless Mode simplifies deployments by abstracting away the underlying server management, allowing you to focus solely on model deployment and scaling. It’s particularly beneficial for teams without deep backend management expertise or those seeking cost-efficiency:
- Automatic scaling: Scale your models in real-time based on workload demands.
- Cost-efficiency: Only pay for the resources you use with a pay-as-you-go pricing model.
- Simplified deployment: Minimal configuration needed, making it accessible regardless of technical background.
- High availability and resilience: Built-in mechanisms to ensure models are always operational and resilient to failures.
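As a rough illustration of "minimal configuration", a serverless spec can shrink to little more than a container image and a model reference, since replicas, nodes, and scaling are managed by the platform. The field names below are assumptions for this overview rather than the actual VESSL schema; the TGI image and `MODEL_ID` variable are standard Text Generation Inference conventions.

```yaml
# Illustrative Serverless Mode sketch — field names are assumptions;
# scaling and availability are handled by the platform, not the spec.
name: tgi-serverless
image: ghcr.io/huggingface/text-generation-inference:latest
port: 8000                      # hypothetical container port
env:
  MODEL_ID: meta-llama/Meta-Llama-3.1-8B-Instruct
```

Compare this with the Provisioned Mode sketch above: there is no autoscaling or rollout section to maintain, which is precisely the trade-off — less operational control in exchange for less operational burden.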
What’s next in Serverless Mode?
Enable Serverless Mode
Deploy in Serverless Mode using Text Generation Inference (TGI)
Deploy with YAML
Explore comprehensive YAML configuration examples.
Both modes of VESSL Service are designed to make the deployment of ML services reliable, adaptable, and capable of managing varying workloads efficiently. Whether you choose the granular control of Provisioned Mode or the streamlined simplicity of Serverless Mode, VESSL Service facilitates the easy rollout and scaling of your AI models.