
# Overview

Deploying machine learning (ML) models to production requires planning for smooth operation, high availability, and fluctuating demand. **VESSL Service** offers two modes to cover different needs: **Provisioned** and **Serverless**.

<img style={{ borderRadius: '0.5rem' }} src="https://mintcdn.com/vesslai/U31B4TuGi_otqUlh/images/service/overview/service.png?fit=max&auto=format&n=U31B4TuGi_otqUlh&q=85&s=48847008c2d80dc4809301635f1c7b65" width="1206" height="743" data-path="images/service/overview/service.png" />

## Provisioned mode

**VESSL Service** deploys models developed within VESSL, or your own custom models, as inference servers. **Provisioned mode** is ideal for those who prefer direct control over their deployment environment, with features such as:

* **Activity tracking**: Monitor logs, system metrics, and model performance metrics.
* **Auto-scaling**: Automatically adjust server size based on resource usage to handle increased demand.
* **Traffic management**: Split traffic for canary testing and gradually roll out new model versions without downtime.
* **Operational control**: Extensive customization through `YAML` configurations for those who need precise control over their deployments.
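
The canary-style traffic split described above can be pictured as weighted routing between two model versions. The following is a minimal Python sketch of that idea, not VESSL's implementation: the function names `pick_revision` and `rollout_schedule` and the revision labels `stable` and `canary` are hypothetical, chosen only for illustration.

```python
import random


def pick_revision(weights, rng=random):
    """Pick a revision name with probability proportional to its weight.

    `weights` maps a (hypothetical) revision name to its traffic share,
    for example {"stable": 90, "canary": 10}.
    """
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def rollout_schedule(canary_percentages):
    """Build the traffic split for each step of a gradual rollout."""
    return [{"stable": 100 - p, "canary": p} for p in canary_percentages]


if __name__ == "__main__":
    # Shift traffic to the new version in three illustrative steps.
    for split in rollout_schedule([10, 50, 100]):
        print(split, "->", pick_revision(split))
```

Each request is routed independently, so a 90/10 split only approximates 10% canary traffic over many requests. Widening the canary share step by step is what lets a new version take over without downtime.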

### What's next in provisioned mode?

<CardGroup cols={2}>
  <Card icon="bolt-lightning" title="VESSL Service Quickstart" href="../../guides/get-started/llama3.1-deployment">
    Get started with **VESSL Service** using Llama 3.1-8B and the latest vLLM.
  </Card>

  <Card icon="book" title="Deploy with YAML" href="../../reference/yaml/serve-yaml">
    Explore comprehensive `YAML` configuration examples.
  </Card>
</CardGroup>

## Serverless mode

**Serverless Mode** simplifies deployments by abstracting away the underlying server management, allowing you to focus solely on model deployment and scaling. It's particularly beneficial for teams without deep backend management expertise or those seeking cost-efficiency:

* **Automatic scaling**: Scale your models in real time based on workload demands.
* **Cost-efficiency**: Pay only for the resources you use with a pay-as-you-go pricing model.
* **Simplified deployment**: Minimal configuration needed, making it accessible regardless of technical background.
* **High availability and resilience**: Built-in mechanisms keep models operational and resilient to failures.
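
To make the idea of scaling with workload concrete, here is a minimal Python sketch of a proportional scaling rule. It is an assumption for illustration only: it uses the generic formula popularized by the Kubernetes Horizontal Pod Autoscaler, and this page does not specify the rule VESSL Service actually applies. The function name `desired_replicas` and all parameter names and defaults are hypothetical.

```python
import math


def desired_replicas(current, observed, target, min_replicas=1, max_replicas=10):
    """Return a replica count that moves the per-replica load toward `target`.

    current  -- number of replicas running now
    observed -- measured load per replica (e.g. requests/s or % utilization)
    target   -- load per replica we would like to see
    """
    if target <= 0:
        raise ValueError("target must be positive")
    # Scale the replica count by how far the observed load is from the target.
    raw = math.ceil(current * observed / target)
    # Clamp to the allowed range so the service never over- or under-scales.
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    print(desired_replicas(current=2, observed=90, target=60))  # scale out
    print(desired_replicas(current=4, observed=30, target=60))  # scale in
```

In Serverless mode this kind of decision is made for you; the sketch only shows why higher load leads to more replicas and lower load to fewer.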

### What's next in serverless mode?

<CardGroup cols={2}>
  <Card icon="bolt-lightning" title="Enable Serverless Mode" href="../../guides/get-started/serverless-tgi-deployment">
    Deploy in **Serverless mode** using Text Generation Inference (TGI).
  </Card>

  <Card icon="book" title="Deploy with YAML" href="../../reference/yaml/serve-yaml">
    Explore comprehensive `YAML` configuration examples.
  </Card>
</CardGroup>

Both modes of **VESSL Service** are designed to make the deployment of ML services reliable, adaptable, and capable of managing varying workloads efficiently. Whether you choose the granular control of **Provisioned Mode** or the streamlined simplicity of **Serverless Mode**, **VESSL Service** facilitates the easy rollout and scaling of your AI models.
