- VESSL API Server — Enables communication between the user and the GPU clusters, through which users can launch containerized ML workloads.
- VESSL Cluster Agent — Sends information about the cluster and the workloads running on it, such as node specifications and model metrics.
- Control plane node — Acts as the cluster-wide control tower and orchestrates subsidiary worker nodes.
- Worker nodes — Run specified ML workloads based on the runtime spec and environment received from the control plane node.
Integrating more powerful, multi-node GPU clusters for your team is as easy as integrating your personal laptop. To make the process easier, we’ve prepared a single-line curl command that installs all the binaries and dependencies on your server.
Step-by-step Guide
There is an ongoing issue with Kubernetes hostnames that contain capital letters. Please make sure your machine's hostname is lowercase.
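As a quick local check for the lowercase-hostname requirement, the sketch below flags uppercase letters and characters that are not allowed in a DNS label. It is a hypothetical helper, not part of VESSL's tooling; `hostname_problems` is an illustrative name, and the 63-character lowercase label rule it applies is a common DNS convention that may be stricter than what Kubernetes strictly requires.

```python
from __future__ import annotations

import re
import socket

# Lowercase DNS label (RFC 1123 style): letters, digits and '-', starting and
# ending with a letter or digit, at most 63 characters in total.
_LABEL = re.compile(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?")


def hostname_problems(hostname: str) -> list[str]:
    """Return human-readable problems with `hostname` (an empty list means OK)."""
    problems = []
    if hostname != hostname.lower():
        problems.append("hostname contains uppercase letters")
    # Validate each dot-separated label after lowercasing, so uppercase letters
    # are reported once instead of once per label.
    for label in hostname.lower().split("."):
        if not _LABEL.fullmatch(label):
            problems.append(f"label {label!r} is not a valid DNS label")
    return problems


if __name__ == "__main__":
    name = socket.gethostname()
    issues = hostname_problems(name)
    print(f"{name}: " + ("; ".join(issues) if issues else "OK"))
```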
(1) Prerequisites
Make sure your server runs Ubuntu 18.04 or later, or CentOS 7.9 or later.
Install dependencies
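Before running the install command, you can confirm that the server meets the OS prerequisite. The sketch below is a hedged example, not VESSL's own check: it reads `/etc/os-release`, and because CentOS 7 reports only `VERSION_ID="7"` there, it falls back to `/etc/centos-release` (for example `CentOS Linux release 7.9.2009 (Core)`) to find the minor release. The function names are hypothetical.

```python
from __future__ import annotations

import re
from pathlib import Path

# Minimum versions from the prerequisites: Ubuntu 18.04, CentOS 7.9.
MINIMUM = {"ubuntu": (18, 4), "centos": (7, 9)}


def parse_os_release(text: str) -> dict[str, str]:
    """Parse the KEY=value lines of /etc/os-release into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip("\"'")
    return info


def version_tuple(text: str) -> tuple[int, ...] | None:
    """Extract the first 'major[.minor]' number pair from a version string."""
    match = re.search(r"(\d+)(?:\.(\d+))?", text)
    if not match:
        return None
    major, minor = match.groups()
    return (int(major),) if minor is None else (int(major), int(minor))


def meets_prerequisite(os_release: str, centos_release: str = "") -> bool:
    """Return True for Ubuntu >= 18.04 or CentOS >= 7.9, False otherwise."""
    info = parse_os_release(os_release)
    distro = info.get("ID", "").lower()
    if distro not in MINIMUM:
        return False
    version = version_tuple(info.get("VERSION_ID", ""))
    # CentOS 7 exposes only the major version in os-release; use the
    # centos-release file, when given, to recover the minor release.
    if distro == "centos" and version is not None and len(version) == 1 and centos_release:
        version = version_tuple(centos_release) or version
    return version is not None and version >= MINIMUM[distro]


if __name__ == "__main__":
    os_release = Path("/etc/os-release")
    centos = Path("/etc/centos-release")
    ok = meets_prerequisite(
        os_release.read_text() if os_release.exists() else "",
        centos.read_text() if centos.exists() else "",
    )
    print("OS prerequisite met" if ok else "OS prerequisite NOT met")
```

If `/etc/centos-release` is missing on a CentOS 7 machine, the sketch conservatively reports the prerequisite as not met.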
You can install all the dependencies required for cluster integration using a single-line curl command. The command:
- Installs Docker if it's not already installed.
- Installs and configures NVIDIA container runtime.
- Installs k0s, a lightweight Kubernetes distribution, and designates and configures a control plane node.
- Generates a token and a command for connecting worker nodes to the control plane node configured above.
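To taint the control plane node (so that Kubernetes does not schedule ML workloads on it by default), add the --taint-controller flag at the end of the command.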
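Since the install command skips Docker when it is already present, it can help to know beforehand which components the server already has. The sketch below only looks up executables on `PATH`; it does not replicate the installer. The executable names (`docker`, `nvidia-container-runtime`, `k0s`) are assumptions based on each project's usual CLI name, and `missing_components` is a hypothetical helper.

```python
from __future__ import annotations

import shutil

# Components the install command sets up, mapped to their usual executable
# names (assumed; adjust if your distribution installs them differently).
COMPONENTS = {
    "Docker": "docker",
    "NVIDIA container runtime": "nvidia-container-runtime",
    "k0s": "k0s",
}


def missing_components(which=shutil.which) -> list[str]:
    """Return the components whose executable is not found on PATH."""
    return [name for name, exe in COMPONENTS.items() if which(exe) is None]


if __name__ == "__main__":
    missing = missing_components()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All components found on PATH")
```

The `which` parameter exists only so the lookup can be swapped out, for example in tests.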
After installing all the dependencies, the command prints a follow-up command containing a token. Use it to add worker nodes to the control plane. If you don't want to add worker nodes, you can skip to the next step.
You can check the nodes in the cluster with the k0s command.
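To confirm that worker nodes have joined, one option is k0s's bundled kubectl (`k0s kubectl get nodes`) on the control plane node. The sketch below parses that table output and lists nodes whose STATUS is not `Ready`; it assumes the standard `kubectl get nodes` column layout, and `node_statuses` and `not_ready` are hypothetical helpers.

```python
from __future__ import annotations

import shutil
import subprocess


def node_statuses(output: str) -> dict[str, str]:
    """Map node name -> STATUS column from `kubectl get nodes` table output."""
    rows = [line for line in output.splitlines() if line.strip()]
    statuses = {}
    for row in rows[1:]:  # skip the NAME/STATUS/... header row
        columns = row.split()
        statuses[columns[0]] = columns[1]
    return statuses


def not_ready(statuses: dict[str, str]) -> list[str]:
    """Return nodes that are not Ready (STATUS may be e.g. 'Ready,SchedulingDisabled')."""
    return [name for name, status in statuses.items() if "Ready" not in status.split(",")]


if __name__ == "__main__":
    if shutil.which("k0s") is None:
        print("k0s not found on PATH; run this on the control plane node")
    else:
        result = subprocess.run(
            ["sudo", "k0s", "kubectl", "get", "nodes"],
            capture_output=True, text=True, check=True,
        )
        pending = not_ready(node_statuses(result.stdout))
        print("All nodes Ready" if not pending else "Not Ready: " + ", ".join(pending))
```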
(2) VESSL integration
You are now ready to integrate the Kubernetes cluster with VESSL. Make sure VESSL Client is installed on the server and configured for your organization. When prompted, press Enter to use the default values.
By this point, you have successfully completed the integration.
(3) Confirm integration
You can use the VESSL CLI or visit the Clusters page to confirm your integration.
Common troubleshooting commands
Here are common problems users encounter when integrating on-premises clusters. You can use the journalctl command to get a more detailed log of an issue. Please include this log when you reach out for support.
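To gather a log worth attaching to a support request, the sketch below builds `journalctl` invocations for one systemd unit using standard flags (`-u`, `--no-pager`, `-n`, `--since`). The unit names `k0scontroller` and `k0sworker` are assumptions about how k0s registers its services on the control plane and worker nodes; check `systemctl list-units` on your machine. The sketch prints the commands instead of running them, since reading system logs may require sudo.

```python
from __future__ import annotations

import shlex


def journalctl_args(unit: str, lines: int = 200, since: str | None = None) -> list[str]:
    """Build a journalctl invocation that dumps recent logs for one systemd unit."""
    args = ["journalctl", "-u", unit, "--no-pager", "-n", str(lines)]
    if since is not None:
        args += ["--since", since]
    return args


if __name__ == "__main__":
    # Assumed k0s unit names; redirect a command's output to a file to share it.
    for unit in ("k0scontroller", "k0sworker"):
        print("sudo " + shlex.join(journalctl_args(unit, since="1 hour ago")))
```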
VesslApiException: PermissionDenied (403): Permission denied.
VesslApiException: NotFound (404) Requested entity not found.
If your hostname contains capital letters, you can change it with a sudo command.
Changing your hostname may have unexpected side effects, and might be prohibited depending on your organization’s IT policy.
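Given the warning above, the sketch below only prints a suggested command rather than changing anything. It lowercases the current hostname and builds a `hostnamectl set-hostname` call, the standard systemd tool for this; `lowercase_hostname_command` is a hypothetical helper. Note that other places, such as `/etc/hosts`, may still reference the old name after the change.

```python
from __future__ import annotations

import shlex
import socket


def lowercase_hostname_command(current: str) -> str | None:
    """Return a hostnamectl command that lowercases `current`, or None if nothing to do."""
    target = current.lower()
    if target == current:
        return None
    return shlex.join(["sudo", "hostnamectl", "set-hostname", target])


if __name__ == "__main__":
    command = lowercase_hostname_command(socket.gethostname())
    # Only print the command: review it (and your IT policy) before running it.
    print(command or "Hostname is already lowercase; nothing to change.")
```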

