Creating an Experiment
To create an experiment, first specify a few options such as cluster, resource, image, and start command. Here is an explanation of the config options.
You can run your experiment on either VESSL's managed cluster or your custom cluster. Start by selecting a cluster.
Once you select VESSL's managed cluster, you can view the list of available resources in the dropdown menu.
You also have an option to use spot instances.
Check out the full list of resource types and corresponding prices:
Your custom cluster can be either on-premise or on-cloud. For on-premise clusters, you can specify the processor type and resource requirements. The experiment will be assigned automatically to an available node based on the input resource requirements.
You have an option to use multi-node distributed training. The default option is single-node training.
Select the Docker image that the experiment container will use. You can either use a managed image provided by VESSL or your own custom image.
Managed images are pre-pulled images provided by VESSL. You can find the available image tags at VESSL's Amazon ECR Public Gallery.
To pull images from the public Docker registry, simply pass the image URL. The example below demonstrates pulling the official TensorFlow development GPU image from Docker Hub.
To pull images from a private Docker registry, first integrate your credentials in organization settings.
Then, check the private image checkbox, fill in the image URL, and select the credential.
Specify the start command to run in the experiment container. Write the command with its command-line arguments just as you would in a terminal. You can run multiple commands by chaining them with && or by putting each command on a new line.
You can mount the project, dataset, and files to the experiment container.
Learn more about volume mount on the following page:
You can set hyperparameters as key-value pairs. The given hyperparameters are automatically added to the container as environment variables with the given key and value. A typical experiment will include hyperparameters like
You can also use them at runtime by appending them to the start command as follows.
python main.py \
  --learning-rate $learning_rate
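On the script side, the hyperparameter can be read either from the command-line flag or directly from the environment variable. The following is a minimal sketch of how a main.py might do this, not VESSL's own code: the parse_args helper, the 0.001 fallback value, and the choice to let the flag override the environment variable are all assumptions made for illustration.

```python
import argparse
import os


def parse_args(argv=None, env=None):
    """Parse hyperparameters, preferring CLI flags over environment variables.

    Hypothetical helper: `env` defaults to os.environ and can be replaced
    with a plain dict when testing.
    """
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Illustrative training entry point")
    parser.add_argument(
        "--learning-rate",
        type=float,
        # Fall back to the `learning_rate` environment variable, then to an
        # assumed default of 0.001 if neither source provides a value.
        default=float(env.get("learning_rate", "0.001")),
    )
    return parser.parse_args(argv)
```

Calling parse_args() with no arguments inside main.py reads the real command line and environment. With this arrangement the script works whether the value arrives through the start command or only through the environment variable set on the container.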
Checking the termination protection option keeps an experiment idle after it finishes running, so you can still access the container of the finished experiment.