# On-premises clusters

## Introduction

If you have Kubernetes and DevOps experience and available GPU resources, but find those resources hard to manage, you can install and manage VESSL directly on your own infrastructure.
Using the scripts provided by VESSL AI, you can register clusters and use VESSL's MLOps features on your own computing resources.

## Before you begin

Ensure you have the following prerequisites. This guide provides generic instructions for Linux distributions based on Ubuntu 22.04.

<AccordionGroup>
  <Accordion title="Operating System">
    * Ubuntu 20.04, CentOS 7.9 or above.
  </Accordion>

  <Accordion title="Hardware Requirements">
    * At least 2 GB of RAM per machine.
    * At least 2 CPUs per machine.
    * At least 500 GB of storage.
  </Accordion>

  <Accordion title="Software Requirements">
    * Python 3.8 or above is installed.
    * [Helm](https://helm.sh/docs/intro/install/) is installed.
  </Accordion>

  <Accordion title="Network Requirements">
    * Full network connectivity between all machines in the cluster (public or private network).
    * Unique hostname, MAC address, and product\_uuid for every node. See [here](https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/#verify-mac-address) for more details.
    * Ensure the machine’s hostname is in lowercase.
    * Certain ports must be open on your machines. See [here](https://docs.k0sproject.io/stable/networking/#required-ports-and-protocols) for more details.
    * We recommend that all nodes have static IP addresses, regardless of whether they are on a public or private network.
  </Accordion>

  <Accordion title="VESSL Accounts and Organization">
    * Make sure you have an active VESSL account and an organization set up.
    * Your VESSL account must have admin privileges in the organization.
  </Accordion>
</AccordionGroup>
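
Some of the prerequisites above can be checked with a short preflight script. The following is a minimal sketch and is not part of VESSL's tooling: the helper names (`is_lowercase_hostname`, `at_least`, `report`, `preflight_report`) are hypothetical, the memory threshold is set slightly below 2 GB on the assumption that the kernel reports less than the installed amount, and storage and network requirements are not checked.

```bash
#!/usr/bin/env bash
# Preflight sketch (hypothetical helper names, not VESSL tooling).

# Succeeds when the hostname is non-empty and has no uppercase letters.
is_lowercase_hostname() {
  case "$1" in
    "" | *[[:upper:]]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Succeeds when integer ACTUAL ($1) is at least integer MINIMUM ($2).
at_least() {
  [ "$1" -ge "$2" ]
}

# report DESCRIPTION COMMAND [ARGS...]: runs the command and prints PASS or FAIL.
report() {
  desc=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS $desc"
  else
    echo "FAIL $desc"
  fi
}

# Runs every check; it only prints results and always returns 0.
preflight_report() {
  report "hostname is lowercase" is_lowercase_hostname "$(uname -n)"
  report "at least 2 CPUs" at_least "$(getconf _NPROCESSORS_ONLN)" 2
  # MemTotal is in kB and excludes kernel-reserved memory, so allow some slack.
  report "about 2 GB of RAM or more" at_least \
    "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null)" 1900000
  report "Python 3.8 or above" python3 -c \
    'import sys; sys.exit(0 if sys.version_info >= (3, 8) else 1)'
  report "Helm is installed" command -v helm
}

preflight_report
```

Run the script on every machine you plan to add to the cluster and resolve any `FAIL` line before continuing.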

## Step-by-step Guide

### Install GPU-related Components

If your nodes have NVIDIA GPUs, you need to install the following programs on all GPU nodes that you want to connect to the cluster.
If a node does not have a GPU, you can skip this process.

<Steps>
  <Step title="NVIDIA Graphics Driver">
    Follow [this document](https://ubuntu.com/server/docs/nvidia-drivers-installation) to install the latest version of the NVIDIA graphics driver on your node.
  </Step>

  <Step title="NVIDIA Container Runtime">
    Follow [this document](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installing-with-apt) to install the latest version of the nvidia-container-runtime on your node.
  </Step>

  <Step title="NVIDIA CUDA Toolkit">
    ```bash theme={null}
    sudo apt-get install nvidia-cuda-toolkit
    ```
  </Step>
</Steps>
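
After installing the components above, you may want to confirm that their command-line tools are on the `PATH` before continuing. This is a minimal sketch: the helper names `have_tool` and `check_gpu_tools` are hypothetical, and the tool list assumes that the three installations provide `nvidia-smi`, `nvidia-container-runtime`, and `nvcc` respectively.

```bash
# Succeeds when the named command can be found on PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Prints one line per expected tool and returns the number of missing tools,
# so a return status of 0 means everything was found.
check_gpu_tools() {
  missing=0
  for tool in nvidia-smi nvidia-container-runtime nvcc; do
    if have_tool "$tool"; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```

For example, `check_gpu_tools && nvidia-smi` prints the driver and GPU summary only when all three tools are present.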

### Setup Control Plane Node

<Steps>
  <Step title="Download bootstrap script">
    First, download the script on the node you want to use to manage your cluster (referred to as the [control plane](https://kubernetes.io/docs/concepts/overview/components/#control-plane-components) in Kubernetes).

    ```bash theme={null}
    curl -sSLf https://install.vessl.ai/bootstrap-cluster/bootstrap-cluster.sh > bootstrap-cluster.sh
    chmod +x bootstrap-cluster.sh
    ```

    This script includes everything needed to install k0s and its dependencies for setting up an on-premises cluster. If you are familiar with k0s and bash scripts, you can modify the script to suit your desired configuration.
  </Step>

  <Step title="Execute bootstrap script">
    To designate this node as the control plane and proceed with the Kubernetes cluster installation, run the following command:

    ```bash theme={null}
    ./bootstrap-cluster.sh --role=controller
    ```
  </Step>

  <Step title="Copy and paste token">
    After completing the installation, you will receive a token similar to the one shown in the screenshot. Copy and save this token.

    <Frame>
      <img src="https://mintcdn.com/vesslai/p4Iy0AO-LrmBbPuL/images/clusters/onprem/token.png?fit=max&auto=format&n=p4Iy0AO-LrmBbPuL&q=85&s=b73e126adad164e333282b8c066ae2da" width="851" height="515" data-path="images/clusters/onprem/token.png" />
    </Frame>
  </Step>
</Steps>

#### Advanced: Taint Control Plane Node

By default, VESSL workloads are also scheduled on the control plane node. If you want to keep machine learning workloads off the control plane node, for example because of resource constraints or for ease of management, add the `--taint-controller` option during installation:

```bash theme={null}
./bootstrap-cluster.sh --role=controller --taint-controller
```

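#### Advanced: Select a Specific Kubernetes version

To install a specific Kubernetes version, pass the matching k0s release with the `--k0s-version` option: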

```bash theme={null}
./bootstrap-cluster.sh --role=controller --k0s-version=v1.30.1+k0s.0
```

<Warning>It is recommended not to downgrade to a Kubernetes version earlier than 1.24 unless you are well-informed about the implications.</Warning>

There are additional script options available. Use the `--help` option to see all configurable parameters.

```bash theme={null}
./bootstrap-cluster.sh --help
```

### Verify Control Plane Setup

To verify that the installation was successful and that the pods are running correctly, enter the following commands:

<Steps>
  <Step title="Check Node Configuration">
    It may take some time for the nodes and pods to be properly deployed. After a short wait, check the node configuration by entering:

    ```bash theme={null}
    sudo k0s kubectl get nodes
    ```

    <Frame>
      <img src="https://mintcdn.com/vesslai/p4Iy0AO-LrmBbPuL/images/clusters/onprem/node.png?fit=max&auto=format&n=p4Iy0AO-LrmBbPuL&q=85&s=94eed142c9987d69d755bcce4298a480" width="637" height="39" data-path="images/clusters/onprem/node.png" />
    </Frame>
  </Step>

  <Step title="Check Pod Status">
    To check the status of all pods across all namespaces, use the following command:

    ```bash theme={null}
    sudo k0s kubectl get pods -A
    ```

    <Frame>
      <img src="https://mintcdn.com/vesslai/p4Iy0AO-LrmBbPuL/images/clusters/onprem/pod.png?fit=max&auto=format&n=p4Iy0AO-LrmBbPuL&q=85&s=46e2a9c4eb470927236bb2aed589e515" width="637" height="124" data-path="images/clusters/onprem/pod.png" />
    </Frame>
  </Step>
</Steps>

These commands will help you ensure that the nodes and pods are correctly configured and running as expected.
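
Because nodes can take a while to become ready, the node check above can be wrapped in a polling loop instead of being re-run by hand. This is a minimal sketch: the helper names `all_nodes_ready` and `wait_for_nodes` are hypothetical, the retry count and interval are arbitrary defaults, and it assumes the standard `kubectl get nodes --no-headers` layout in which the second column is the node status.

```bash
# Reads `kubectl get nodes --no-headers` output on stdin and succeeds only when
# at least one node is listed and every node's STATUS column is exactly "Ready".
all_nodes_ready() {
  awk 'NF > 0 { total++; if ($2 != "Ready") not_ready++ }
       END { exit !(total > 0 && not_ready == 0) }'
}

# wait_for_nodes [ATTEMPTS] [INTERVAL_SECONDS]
# Polls the cluster until all nodes are Ready or the attempts run out.
wait_for_nodes() {
  attempts=${1:-30}
  interval=${2:-10}
  i=1
  while [ "$i" -le "$attempts" ]; do
    if sudo k0s kubectl get nodes --no-headers 2>/dev/null | all_nodes_ready; then
      echo "All nodes are Ready."
      return 0
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  echo "Timed out waiting for nodes to become Ready." >&2
  return 1
}
```

Calling `wait_for_nodes` with no arguments waits up to roughly five minutes. A node reported as `NotReady` or `Ready,SchedulingDisabled` counts as not ready in this sketch.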

### Setup Worker Node

After completing the control plane setup, you can join worker nodes using the token issued during that setup. Run the following command on each worker node you want to connect:

```bash theme={null}
curl -sSLf https://install.vessl.ai/bootstrap-cluster/bootstrap-cluster.sh | sudo bash -s -- --role=worker --token="[TOKEN_FROM_CONTROLLER_HERE]"
```

Replace `[TOKEN_FROM_CONTROLLER_HERE]` with the actual token you received from the Control Plane setup.
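
If you prefer not to paste the token directly into the command line, you can read it from a file instead. The sketch below is one possible approach, not an official VESSL workflow: the helper names `is_usable_token` and `join_worker` are hypothetical, and it assumes you have already copied the token into a file on the worker node. It refuses to run when the file is empty or still contains the placeholder text.

```bash
BOOTSTRAP_URL="https://install.vessl.ai/bootstrap-cluster/bootstrap-cluster.sh"

# Succeeds only for a token that is non-empty and is not the documentation placeholder.
is_usable_token() {
  case "$1" in
    "" | "[TOKEN_FROM_CONTROLLER_HERE]") return 1 ;;
    *) return 0 ;;
  esac
}

# join_worker TOKEN_FILE
# Reads the token from the file (ignoring whitespace and line breaks) and runs
# the same worker bootstrap command shown above.
join_worker() {
  token=$(tr -d '[:space:]' < "$1") || return 1
  if ! is_usable_token "$token"; then
    echo "Token file '$1' is empty or still contains the placeholder." >&2
    return 1
  fi
  curl -sSLf "$BOOTSTRAP_URL" | sudo bash -s -- --role=worker --token="$token"
}
```

For example, `join_worker ./worker-token.txt` joins the node using the token stored in `worker-token.txt`, a file name chosen here only for illustration.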

### Create VESSL Cluster

The VESSL cluster setup is performed on the Control Plane node.

<Steps>
  <Step title="Install VESSL CLI">
    If the VESSL CLI is not already installed, use the following command to install it:

    ```bash theme={null}
    pip install vessl --upgrade
    ```
  </Step>

  <Step title="Configure VESSL CLI">
    After installation, configure the VESSL CLI:

    ```bash theme={null}
    vessl configure
    ```
  </Step>

  <Step title="Create a New VESSL Cluster">
    Once the VESSL CLI is configured, create a new VESSL Cluster with the following command:

    ```bash theme={null}
    vessl cluster create
    ```

    Follow the prompts to configure your cluster options. You can press Enter to use the default values. If the cluster is created successfully, you will see a success message.

    <Frame>
      <img src="https://mintcdn.com/vesslai/p4Iy0AO-LrmBbPuL/images/clusters/onprem/cli-cluster-create.png?fit=max&auto=format&n=p4Iy0AO-LrmBbPuL&q=85&s=ef6b06d76116ff10980f5485095fded6" width="943" height="329" data-path="images/clusters/onprem/cli-cluster-create.png" />
    </Frame>
  </Step>
</Steps>

### Confirm VESSL Cluster integration

To verify that the VESSL Cluster is properly integrated with your organization, run the following command:

```bash theme={null}
vessl cluster list
```

Additionally, you can check the status of the integrated cluster by navigating to the Web UI. Go to the Organization tab and then select the Cluster tab. You should see the current status of the connected cluster displayed.

<Frame>
  <img src="https://mintcdn.com/vesslai/p4Iy0AO-LrmBbPuL/images/clusters/onprem/webui-cluster.png?fit=max&auto=format&n=p4Iy0AO-LrmBbPuL&q=85&s=35ab99967af19429320f51ec6a10b95f" width="1168" height="432" data-path="images/clusters/onprem/webui-cluster.png" />
</Frame>

## Limitations

Some features of VESSL are not available or guaranteed to work on your on-premises cluster. The following functionalities have limitations:

### VESSL Run

1. Custom Resource Specs in YAML
   * You cannot set custom resource specifications directly using YAML.
   * For example, setting CPU, GPU, or memory directly in YAML is not supported.
   * To use YAML for your VESSL Run, you must create a [Resource Spec](/guides/clusters/specs) for your on-premises cluster.

### VESSL Service

1. Provisioned Mode
   * You cannot create a VESSL Service in Provisioned mode on your on-premises cluster.

## Frequently Asked Questions

### How can VESSL support on-premises clusters?

VESSL provides a set of scripts that help you install and manage Kubernetes clusters using [k0s](https://k0sproject.io/) on your on-premises resources. By using these scripts, you can register clusters and utilize VESSL's powerful MLOps features on your computing resources.

### How can I get the token from the control plane node again?

You can create a new token by running the following command on the control plane node:

```bash theme={null}
sudo k0s token create
```
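
The k0s CLI also accepts `--role` and `--expiry` options on `k0s token create`, which lets you issue a short-lived worker token and write it straight to a file. The following is a sketch under that assumption: the helper name `create_worker_token` and the one-hour default expiry are illustrative choices, not VESSL defaults.

```bash
# create_worker_token OUT_FILE [EXPIRY]
# Writes a worker join token to OUT_FILE, readable only by the current user.
create_worker_token() {
  out_file=$1
  expiry=${2:-1h}
  # The subshell keeps the restrictive umask from leaking into the caller.
  ( umask 077 && sudo k0s token create --role=worker --expiry="$expiry" > "$out_file" )
}
```

For example, `create_worker_token ./worker-token.txt 2h` creates a token that stops being valid for new joins after two hours.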

### Do we need CRI (docker or containerd) for Kubernetes?

k0s bundles the container runtime (CRI) required by the Kubernetes cluster, so you don't need to install one separately. For more information, see the [k0s runtime documentation](https://docs.k0sproject.io/stable/runtime/).

### Bootstrap script failed when setting the node

If the bootstrap script fails during node setup, verify the network configuration and ensure all prerequisites are met. Check the logs for specific error messages to diagnose the issue.
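
One place to look for error messages is the systemd journal of the k0s service. The sketch below assumes the bootstrap script installs k0s as the `k0scontroller` or `k0sworker` systemd unit, which are the unit names used in the uninstall instructions in this guide. The helper names `k0s_unit_for_role` and `show_k0s_logs` are hypothetical.

```bash
# Maps a bootstrap role to the systemd unit name used elsewhere in this guide.
k0s_unit_for_role() {
  case "$1" in
    controller) echo k0scontroller ;;
    worker) echo k0sworker ;;
    *) echo "unknown role: $1" >&2; return 1 ;;
  esac
}

# show_k0s_logs ROLE [LINES]
# Prints the service status followed by the most recent journal lines.
show_k0s_logs() {
  unit=$(k0s_unit_for_role "$1") || return 1
  # `systemctl status` exits non-zero for inactive units, so keep going either way.
  sudo systemctl status "$unit" --no-pager || true
  sudo journalctl -u "$unit" --no-pager -n "${2:-100}"
}
```

For example, run `show_k0s_logs controller` on the control plane node or `show_k0s_logs worker 200` on a worker node.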

### Bootstrap script succeeded, but Kubernetes pods are failing

If the bootstrap script succeeds but Kubernetes pods fail, check the pod logs for errors. Common issues include network misconfigurations, insufficient resources, or missing dependencies.
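
To find the failing pods and collect their logs in one pass, you can filter the pod list and inspect each match. This is a minimal sketch: the helper names `unhealthy_pods` and `inspect_unhealthy_pods` are hypothetical, and it assumes the standard `kubectl get pods -A --no-headers` layout in which the fourth column is the pod status.

```bash
# Reads `kubectl get pods -A --no-headers` output on stdin and prints
# "NAMESPACE NAME STATUS" for every pod that is neither Running nor Completed.
unhealthy_pods() {
  awk 'NF >= 4 && $4 != "Running" && $4 != "Completed" { print $1, $2, $4 }'
}

# Prints the description and recent logs of each unhealthy pod.
inspect_unhealthy_pods() {
  sudo k0s kubectl get pods -A --no-headers | unhealthy_pods |
  while read -r namespace name status; do
    echo "=== $namespace/$name ($status) ==="
    # Redirect stdin so these commands cannot consume the pod list being read.
    sudo k0s kubectl describe pod "$name" -n "$namespace" </dev/null || true
    sudo k0s kubectl logs "$name" -n "$namespace" --all-containers --tail=50 </dev/null || true
  done
}
```

Pods that are `Running` but not fully ready are not reported by this filter, so check the `READY` column of `sudo k0s kubectl get pods -A` as well.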

### How can I uninstall k0s?

To uninstall k0s, follow these steps:

<Steps>
  <Step title="Stop and Reset k0s">
    ```bash theme={null}
    sudo k0s stop
    sudo k0s reset
    ```
  </Step>

  <Step title="Reboot the instance">
    ```bash theme={null}
    sudo reboot
    ```
  </Step>

  <Step title="Manual Removal (if needed)">
    If the `k0s stop` or `k0s reset` command hangs, manually remove the k0s components. Run these commands as root:

    ```bash theme={null}
    systemctl stop k0scontroller                  # k0sworker for worker nodes
    systemctl disable k0scontroller               # k0sworker for worker nodes
    systemctl daemon-reload
    systemctl reset-failed

    rm /etc/systemd/system/k0scontroller.service  # k0sworker.service for worker nodes
    rm -rf /run/k0s
    rm -rf /var/lib/k0s
    rm -rf /opt/vessl/k0s

    rm /etc/k0s/containerd.toml
    touch /etc/k0s/containerd.toml

    rm /usr/local/bin/k0s
    ip link delete vxlan.calico
    ```
  </Step>
</Steps>

### How can I allocate static IP to nodes?

To allocate static IPs to nodes, modify the bootstrap script as follows:

<Steps>
  <Step title="Download the bootstrap script">
    ```bash theme={null}
    curl -sSLf https://install.vessl.ai/bootstrap-cluster/bootstrap-cluster.sh > bootstrap-cluster.sh
    chmod +x bootstrap-cluster.sh
    ```
  </Step>

  <Step title="Modify the script">
    In the run\_k0s\_controller\_daemon() function, add `--enable-k0s-cloud-provider=true`:

    ```bash theme={null}
    # Add the --enable-k0s-cloud-provider=true argument as shown below.
    sudo $K0S_EXECUTABLE install controller -c $K0S_CONFIG_PATH/k0s.yaml \
        ${no_taint_option:+"--no-taints"} \
        --enable-worker \
        --enable-k0s-cloud-provider=true \
        "$CRI_SOCKET_OPTION" \
        "$KUBELET_EXTRA_ARGS"
    ```

    In the run\_k0s\_worker\_daemon() function, add `--enable-cloud-provider=true`:

    ```bash theme={null}
    # Add the --enable-cloud-provider=true argument as shown below.
    sudo $K0S_EXECUTABLE install worker -c $K0S_CONFIG_PATH/k0s.yaml \
        --enable-cloud-provider=true \
        "$CRI_SOCKET_OPTION" \
        "$KUBELET_EXTRA_ARGS"
    ```
  </Step>

  <Step title="Run the modified bootstrap script">
    Execute the script on all controller nodes and worker nodes.
  </Step>

  <Step title="Annotate nodes with static IP">
    On the control plane node, run the following command. Replace `<node>` with the node name and `<external IP>` with the static IP address to assign:

    ```bash theme={null}
    sudo k0s kubectl annotate \
    node <node> \
    k0sproject.io/node-ip-external=<external IP>
    ```
  </Step>
</Steps>

For more detailed information, refer to the [k0s documentation](https://docs.k0sproject.io/head/cloud-providers/#deploy-the-cloud-provider).
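
When annotating several nodes, a small wrapper can catch mistyped addresses before they reach the cluster. This is a sketch only: the helper names `is_ipv4` and `annotate_node_ip` are hypothetical, the validation accepts IPv4 addresses only (an assumption made here for simplicity), and `--overwrite` is added so that re-running the command updates an existing annotation.

```bash
# Succeeds when the argument is a dotted-quad IPv4 address with octets 0-255.
is_ipv4() {
  printf '%s\n' "$1" | awk -F. '
    NF != 4 { exit 1 }
    { for (i = 1; i <= 4; i++) if ($i !~ /^[0-9]+$/ || $i > 255) exit 1 }'
}

# annotate_node_ip NODE EXTERNAL_IP
# Validates the address, then sets the node annotation used in the step above.
annotate_node_ip() {
  if ! is_ipv4 "$2"; then
    echo "Not a valid IPv4 address: $2" >&2
    return 1
  fi
  sudo k0s kubectl annotate node "$1" "k0sproject.io/node-ip-external=$2" --overwrite
}
```

For example, `annotate_node_ip worker-1 203.0.113.10` annotates a node named `worker-1`; both values are placeholders.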

### How can I remove the VESSL cluster from the organization?

If you need to delete the on-premises cluster due to issues or because it is no longer needed, follow these steps:

<Steps>
  <Step title="Stop k0s services on all nodes">
    ```bash theme={null}
    sudo k0s stop
    sudo k0s reset
    sudo reboot
    ```
  </Step>

  <Step title="Delete the cluster from the Web UI">
    Navigate to the Web UI and delete the created cluster from the organization.

    <Frame>
      <img src="https://mintcdn.com/vesslai/p4Iy0AO-LrmBbPuL/images/clusters/onprem/delete-cluster.png?fit=max&auto=format&n=p4Iy0AO-LrmBbPuL&q=85&s=f470379eb4727952e666214a970df7cd" width="1121" height="626" data-path="images/clusters/onprem/delete-cluster.png" />
    </Frame>
  </Step>

  <Step title="Delete the cluster from the CLI (if needed)">
    You can also delete the cluster using the CLI. Execute the following command:

    ```bash theme={null}
    vessl cluster delete <cluster-name>
    ```

    Replace `<cluster-name>` with the name of the cluster you wish to delete.
  </Step>
</Steps>

### The network interface has changed. What should I do?

If your network interface or IP address changes, you need to reset and reconfigure k0s.

<Steps>
  <Step title="Reset k0s">
    ```bash theme={null}
    sudo k0s stop
    sudo k0s reset
    ```
  </Step>

  <Step title="Reconfigure k0s">
    If the control plane's network interface changes, you must reconfigure the control plane and all worker nodes.
  </Step>

  <Step title="Reboot the instance">
    ```bash theme={null}
    sudo reboot
    ```
  </Step>
</Steps>

After resetting, follow the setup instructions again to re-establish the network configuration for both control plane and worker nodes.

## Troubleshooting

### VESSL Flare

If you encounter issues while setting up the on-premises cluster or while using an already set up on-premises cluster, you can get assistance from VESSL Flare.
Click the link below to learn how to use VESSL Flare:

<Card title="VESSL Flare" icon="screwdriver-wrench" href="/guides/clusters/troubleshoot">
  Collects all of the node’s configuration and writes it to an archive file.
</Card>

### Support

For additional support, you have the following options:

* General Support: Use HubSpot or send an email to [support@vessl.ai](mailto:support@vessl.ai).
* Professional Support: If you require professional support, contact [sales@vessl.ai](mailto:sales@vessl.ai) for a dedicated support channel.
