Your custom cluster can be either on-premise or on-cloud. For on-premise clusters, you can specify the processor type and resource requirements. The experiment will be assigned automatically to an available node based on the input resource requirements.
Distribution Mode (Optional)
You can optionally run multi-node distributed training. The default is single-node training.
Select the Docker image that the experiment container will use. You can either use a managed image provided by VESSL or your own custom image.
Managed images are pre-pulled images provided by VESSL. You can find the available image tags at VESSL's Amazon ECR Public Gallery.
To pull a custom image from a private registry, check the private image checkbox, fill in the image URL, and select the credential.
Start Command (Required)
Specify the start command to run in the experiment container. Write the command with its command-line arguments just as you would in a terminal. You can run multiple commands by chaining them with the && operator or by placing each command on a new line.
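As a rough illustration of what the start command hands to your script, the sketch below shows a training entry point that reads command-line arguments with Python's standard argparse module. The file name main.py, the --epochs and --data-dir flags, and their defaults are hypothetical and are not defined by VESSL.

```python
import argparse


def parse_args(argv=None):
    """Parse the arguments passed on the start command line.

    The flag names and defaults below are hypothetical examples.
    """
    parser = argparse.ArgumentParser(description="Hypothetical training entry point")
    parser.add_argument("--epochs", type=int, default=10, help="number of training epochs")
    parser.add_argument("--data-dir", default="./data", help="directory holding the input data")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    # argparse exposes --data-dir as args.data_dir
    print(f"epochs={args.epochs} data_dir={args.data_dir}")
```

With a script like this, a start command such as `pip install -r requirements.txt && python main.py --epochs 20` would run the install step first and start training only if it succeeds, which is the usual shell behavior of &&.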
You can mount the project, dataset, and files to the experiment container.
Learn more about volume mount on the following page:
You can set hyperparameters as key-value pairs. The given hyperparameters are automatically added to the container as environment variables with the given key and value. A typical experiment will include hyperparameters like learning_rate and optimizer.
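Because the hyperparameters arrive as environment variables, a script can read them without any extra arguments. The following is a minimal sketch, assuming the keys learning_rate and optimizer were set in the experiment form. The fallback values and the read_hyperparameters helper are hypothetical, and the float conversion is needed because environment variables are always strings.

```python
import os


def read_hyperparameters(env=None):
    """Read hyperparameters from environment variables.

    The fallback values are assumptions for local runs where the
    variables are not set; they are not defaults provided by VESSL.
    """
    env = os.environ if env is None else env
    # Environment variables are strings, so numeric values need converting.
    learning_rate = float(env.get("learning_rate", "0.001"))
    optimizer = env.get("optimizer", "adam")
    return {"learning_rate": learning_rate, "optimizer": optimizer}


if __name__ == "__main__":
    print(read_hyperparameters())
```

Passing a plain dictionary as env makes the helper easy to check locally without touching the real environment.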
You can also use them at runtime by appending them to the start command as follows.
python main.py \
Checking the termination protection option puts the experiment in an idle state once it finishes running, so you can still access the container of a finished experiment.