As an organization manager, you can define custom resource presets under Resource specs for users to select when launching ML workloads. You can also assign a priority to each preset.
For example, when you define resource specs as described above, users launching a Run or Workspace can only choose from the three predefined options, as shown in the image above.
These default options can help admins optimize resource usage by (1) preventing a single user from occupying an excessive number of GPUs and (2) preventing unbalanced resource requests that cause skewed resource usage. As a result, average users can simply proceed with their jobs without having to figure out the exact number of CPU cores or the amount of memory to request.
Take a quick 2-minute tour of Resource specs using the demo below.
Click New resource spec and define the following parameters.
Name — Set a name for the preset. Use a descriptive name that reflects the preset's resources, such as a100-2.mem-16.cpu-6.
Processor type — Select the processor type for the preset, either CPU or GPU.
CPU limit — Enter the number of CPUs. For a100-2.mem-16.cpu-6, enter 6.
Memory limit — Enter the amount of memory in GB. For a100-2.mem-16.cpu-6, the number would be 16.
Priority — Assigning different priority values disables the First In, First Out (FIFO) scheduler and executes workloads in order of priority, with lower priority values processed first. In the example preset above, workloads running on cpu-medium are always prioritized over workloads on other presets. To view the priority assigned to each resource spec, click the “Edit” button under Resource specs.
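To illustrate the difference, here is a minimal Python sketch contrasting FIFO order with priority order using the standard heapq module; the preset names and priority values are made-up assumptions for illustration:

import heapq

# Hypothetical (preset, priority) pairs, in submission order;
# lower priority values are processed first.
submitted = [("a100-2.mem-16.cpu-6", 3), ("cpu-medium", 1), ("a100-4", 2)]

# FIFO: workloads run in submission order.
fifo_order = [name for name, _ in submitted]

# Priority scheduling: pop workloads in ascending priority-value order.
heap = [(prio, name) for name, prio in submitted]
heapq.heapify(heap)
priority_order = [heapq.heappop(heap)[1] for _ in range(len(heap))]

print(fifo_order)      # ['a100-2.mem-16.cpu-6', 'cpu-medium', 'a100-4']
print(priority_order)  # ['cpu-medium', 'a100-4', 'a100-2.mem-16.cpu-6']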
GPU type — Specify the GPU model you are using. You can check the model by running the nvidia-smi command on your server; in the example below, the GPU type is a100-sxm-80gb.
nvidia-smi
Thu Jan 19 17:44:05 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.73.08    Driver Version: 510.73.08    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  On   | 00000000:01:00.0 Off |                    0 |
| N/A   40C    P0    64W / 275W |      0MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
GPU limit — Enter the number of GPUs. For a100-2.mem-16.cpu-6, enter 2. You can also enter decimal values if you are using Multi-Instance GPUs (MIG).
Available workloads — Select the types of workloads that can use the preset. For example, you can guide users toward Experiment by preventing them from running a Workspace with 4 or 8 GPUs.
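Putting the parameters together, a preset like a100-2.mem-16.cpu-6 can be summarized as a simple mapping. The Python sketch below is purely illustrative; the field names are assumptions, not VESSL's actual schema:

# Illustrative summary of the example preset; field names are assumed,
# not VESSL's actual API schema.
preset = {
    "name": "a100-2.mem-16.cpu-6",
    "processor_type": "GPU",
    "cpu_limit": 6,               # number of CPU cores
    "memory_limit_gb": 16,        # memory in GB
    "gpu_type": "a100-sxm-80gb",  # from nvidia-smi
    "gpu_limit": 2,               # decimals allowed with MIG
    "priority": 1,                # assumed value; lower runs first
    "available_workloads": ["Run", "Workspace"],
}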
Tolerations allow workloads to be scheduled on nodes with specific taints by matching the taints' conditions. They consist of two key components: Operator and Effect. The available options are explained below; a short code sketch follows the list.
Equal
The Toleration is applied only if both the Key and Value match the node’s taint exactly.
Example: If a node has a taint key=value, the Toleration must also specify key=value to allow scheduling.
Exists
The Toleration is applied if the Key exists, regardless of the Value.
Example: If a node has a taint whose key is key, regardless of the taint's value, a Toleration that specifies only key allows scheduling.
NoExecute
Workloads that do not tolerate this taint will be evicted immediately from the node. Additionally, they cannot be scheduled onto the node.
NoSchedule
Workloads that do not tolerate this taint will not be scheduled on the node. However, any workloads already running on the node will remain unaffected.
PreferNoSchedule
Kubernetes will attempt to avoid scheduling workloads on nodes with this taint if they do not have a matching Toleration. However, it is not strictly enforced, and workloads may still be scheduled if necessary.
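To make these pieces concrete, here is a minimal sketch using the official Kubernetes Python client (the kubernetes package); the key and value shown are the placeholder key=value from the examples above, not real taints:

from kubernetes import client

# Tolerate the taint key=value with effect NoSchedule.
# Operator "Equal" requires both key and value to match exactly;
# "Exists" matches any value for the given key (omit `value` in that case).
toleration = client.V1Toleration(
    key="key",
    operator="Equal",
    value="value",
    effect="NoSchedule",  # or "NoExecute" / "PreferNoSchedule"
)

pod_spec = client.V1PodSpec(
    containers=[client.V1Container(name="workload")],
    tolerations=[toleration],
)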
Enhanced scheduling control: Tolerations work with taints to provide fine-grained control over where workloads can and cannot run, allowing for sophisticated scheduling policies.
Workload isolation: By tolerating specific taints, workloads can be isolated to certain nodes, enhancing security and performance.
Node maintenance and stability: Taints and Tolerations help manage node availability and workload eviction during maintenance or when nodes exhibit issues, improving cluster stability.
Resource optimization: They enable better resource utilization by ensuring that workloads are scheduled on appropriate nodes that meet their operational requirements.
Node Selectors allow you to control where workloads are scheduled by matching specific labels on nodes. They are a simple key-value mechanism that constrains workloads to run only on nodes meeting certain criteria; a short sketch follows the Key and Value descriptions below.
Key
Specifies the label key on the node that the workload should match.
Example: vessl.ai/role
Value
Specifies the corresponding value of the key. The workload will only be scheduled on nodes where the label matches this value.
Example: gpu-worker
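As a minimal sketch with the same Kubernetes Python client, the Node Selector from the example above maps to the node_selector field of a pod spec:

from kubernetes import client

# Schedule the workload only on nodes labeled vessl.ai/role=gpu-worker.
pod_spec = client.V1PodSpec(
    containers=[client.V1Container(name="workload")],
    node_selector={"vessl.ai/role": "gpu-worker"},
)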