SDK-driven Workflow
The document below covers the process of creating an image classification model with the MNIST dataset using the VESSL client SDK. Once again, we will
To follow this guide, you need an Organization and a Project on VESSL. If you have not created them yet, first follow the instructions in the end-to-end guides.
Let's start by configuring the client with the default organization and project we created earlier. This is done by calling vessl.configure().
import vessl
organization_name = "YOUR_ORGANIZATION_NAME"
project_name = "YOUR_PROJECT_NAME"
vessl.configure(
organization_name=organization_name,
project_name=project_name
)
To create a dataset on VESSL, call vessl.create_dataset(). Let's create a dataset from the public AWS S3 dataset we have prepared: s3://savvihub-public-apne2/mnist. You can check that the dataset was created successfully by evaluating the dataset variable, as in the last line below.
dataset = vessl.create_dataset(
dataset_name="vessl-mnist",
is_public=True,
external_path="s3://savvihub-public-apne2/mnist"
)
dataset
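external_path points at an S3 prefix rather than a single file. As an illustration only, the standard library's urllib.parse can split such a URI into its bucket and key prefix, which can be handy for sanity-checking a path before passing it to vessl.create_dataset(). The helper split_s3_uri is hypothetical and is not part of the VESSL SDK.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Hypothetical helper: split 's3://bucket/prefix' into (bucket, prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # netloc holds the bucket name; path holds the key prefix with a leading "/"
    return parsed.netloc, parsed.path.lstrip("/")


bucket, prefix = split_s3_uri("s3://savvihub-public-apne2/mnist")
print(bucket, prefix)  # savvihub-public-apne2 mnist
```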
To create an experiment, use vessl.create_experiment(). Let's run an experiment on VESSL's managed clusters. First, specify the cluster and resource options. Then, specify the image URL; in this case, we pull a Docker image from VESSL's Amazon ECR Public Gallery. Next, mount the dataset we created previously. Finally, specify the start command that will be executed in the experiment container. Here, we use the MNIST Keras example from our GitHub repository.
github_repo = "https://github.com/vessl-ai/examples.git"
experiment = vessl.create_experiment(
cluster_name="aws-apne2-prod1",
kernel_resource_spec_name="v1.cpu-4.mem-13",
kernel_image_url="public.ecr.aws/vessl/kernels:py36.full-cpu",
dataset_mounts=[f"/input:{dataset.name}"],
start_command=f"git clone {github_repo} && pip install -r examples/mnist/keras/requirements.txt && python examples/mnist/keras/main.py --save-model --save-image",
)
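The start_command above is a single shell string in which each step runs only if the previous one succeeded (the shell's &&). A minimal sketch of assembling the same string from a list of steps, which can make longer commands easier to read and edit. This is plain Python string handling, not a VESSL API. The mount string follows the "/input:<dataset name>" pattern used in dataset_mounts above, and dataset_name is assumed to be the "vessl-mnist" name given earlier.

```python
github_repo = "https://github.com/vessl-ai/examples.git"
dataset_name = "vessl-mnist"  # assumed: the name passed to vessl.create_dataset()

# Each step runs only if the previous one exits successfully (shell `&&`).
steps = [
    f"git clone {github_repo}",
    "pip install -r examples/mnist/keras/requirements.txt",
    "python examples/mnist/keras/main.py --save-model --save-image",
]
start_command = " && ".join(steps)

# Mount the dataset at /input inside the container, as in dataset_mounts above.
dataset_mounts = [f"/input:{dataset_name}"]

print(start_command)
print(dataset_mounts)
```

The resulting values have the same form as the start_command and dataset_mounts arguments passed to vessl.create_experiment() above.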
Note that you can also integrate a GitHub repository with your project so you don't have to run git clone every time you create an experiment. For more information about this and about project datasets, please refer to the Project repository and Project dataset pages.
The experiment may take a few minutes to complete. You can get the details of the experiment, including its status, by calling vessl.read_experiment().
experiment = vessl.read_experiment(
experiment_number=experiment.name
)
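Because the experiment runs asynchronously, you may want to re-read it until it finishes. The sketch below is a generic polling helper written against a caller-supplied fetch function, so it runs without the SDK. The attribute name status and the terminal values "completed" and "failed" are assumptions for illustration only; check the fields actually returned by vessl.read_experiment() before relying on them.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(fetch: Callable[[], T], is_done: Callable[[T], bool],
               interval: float = 10.0, timeout: float = 1800.0) -> T:
    """Call fetch() every `interval` seconds until is_done(result) or timeout."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if is_done(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met before timeout")
        time.sleep(interval)


# Hypothetical terminal states; verify against the real experiment object.
TERMINAL_STATES = {"completed", "failed"}

# Possible use with the SDK (assumes the experiment object exposes `.status`):
# experiment = wait_until(
#     lambda: vessl.read_experiment(experiment_number=experiment.name),
#     lambda exp: exp.status in TERMINAL_STATES,
# )
```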
The metrics summary of the experiment is stored as a Python dictionary. You can check the latest value of a metric using metrics_summary.latest, as follows.
experiment.metrics_summary.latest["accuracy"].value
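Presumably, the latest summary holds the most recent value logged for each metric. The sketch below reproduces that idea in plain Python over a hypothetical log of (step, metrics) records. It does not call VESSL, and the numbers are toy values chosen only to exercise the code, not results from this experiment.

```python
def latest_metrics(records):
    """Return the most recently logged value of each metric.

    `records` is an iterable of (step, {metric_name: value}) pairs, a
    hypothetical stand-in for what a training script logs with vessl.log().
    """
    latest = {}
    for step, metrics in sorted(records, key=lambda r: r[0]):
        for name, value in metrics.items():
            latest[name] = value  # later steps overwrite earlier ones
    return latest


# Toy log, not real results; "loss" is not logged at step 3.
records = [
    (1, {"accuracy": 0.90, "loss": 0.35}),
    (2, {"accuracy": 0.95, "loss": 0.18}),
    (3, {"accuracy": 0.97}),
]
print(latest_metrics(records))  # {'accuracy': 0.97, 'loss': 0.18}
```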
In VESSL, you can create a model from a completed experiment. First, create a model repository by calling vessl.create_model_repository() with the repository name.
model_repository = vessl.create_model_repository(
name="tutorial-mnist",
)
Then, call vessl.create_model() with the name of the destination repository, the ID of the experiment we just ran, and a name for the model.
model = vessl.create_model(
repository_name=model_repository.name,
experiment_id=experiment.id,
model_name="v0.0.1",
)
So far, we have run a single machine learning experiment and saved it as a model in a model repository. In this section, we will use a sweep to find optimal hyperparameter values.
First, configure sweep_objective with the objective direction, the target metric name, and the target value. Note that the metric must be logged to VESSL using vessl.log().
sweep_objective = vessl.SweepObjective(
type="maximize", # objective direction: either "minimize" or "maximize" the metric
goal="0.99", # target metric value
metric="val_accuracy", # target metric name, as defined and logged using `vessl.log()`
)
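The objective combines a direction (type) with a goal value. A minimal sketch of how such an objective might be checked against a logged metric value, assuming the goal counts as reached once the metric meets or passes it in the chosen direction. This illustrates the idea and is not VESSL's internal logic. Because goal is passed as a string in the call above, the sketch converts it with float().

```python
def goal_reached(value: float, goal: str, direction: str) -> bool:
    """Illustrative check: has `value` met the objective's goal?"""
    target = float(goal)  # goal is given as a string, e.g. "0.99"
    if direction == "maximize":
        return value >= target
    if direction == "minimize":
        return value <= target
    raise ValueError(f"unknown objective type: {direction!r}")


print(goal_reached(0.985, "0.99", "maximize"))  # False
print(goal_reached(0.992, "0.99", "maximize"))  # True
```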
Next, define the search space of the parameters. In this example, optimizer is a categorical parameter, and its candidate values are listed as an array. batch_size is an int parameter, and its search space is set using max, min, and step.
parameters = [
vessl.SweepParameter(
name="optimizer",
type="categorical", # int, double, categorical
range=vessl.SweepParameterRange(
list=["adam", "sgd", "adadelta"]
)
),
vessl.SweepParameter(
name="batch_size",
type="int", # int, double, categorical
range=vessl.SweepParameterRange(
max="256",
min="64",
step="8",
)
)
]
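To get a feel for the size of this search space, the sketch below expands the two parameters in plain Python: the categorical list as-is, and the int range from min to max in increments of step. It assumes both bounds of the int range are inclusive, which you should confirm in the VESSL sweep documentation. A grid search would, in principle, have this many combinations to try, while random and Bayesian search sample from them.

```python
from itertools import product

optimizers = ["adam", "sgd", "adadelta"]  # categorical: the listed values
# int range with min=64, max=256, step=8 (assumes both bounds are inclusive)
batch_sizes = list(range(64, 256 + 1, 8))

grid = [
    {"optimizer": opt, "batch_size": bs}
    for opt, bs in product(optimizers, batch_sizes)
]

print(len(batch_sizes))  # 25
print(len(grid))         # 75
print(grid[0])           # {'optimizer': 'adam', 'batch_size': 64}
```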
Start the hyperparameter search using vessl.create_sweep(). In the code below, the cluster, resource, image, dataset, and start command options are set the same way as for the experiment created above with vessl.create_experiment().
sweep = vessl.create_sweep(
objective=sweep_objective,
max_experiment_count=4,
parallel_experiment_count=2,
max_failed_experiment_count=2,
algorithm="random", # grid, random, bayesian
parameters=parameters,
dataset_mounts=[f"/input:{dataset.name}"],
cluster_name=experiment.kernel_cluster.name, # same as the experiment
kernel_resource_spec_name=experiment.kernel_resource_spec.name, # same as the experiment
kernel_image_url=experiment.kernel_image.image_url, # same as the experiment
start_command=experiment.start_command, # same as the experiment
)
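With algorithm="random", max_experiment_count=4 and parallel_experiment_count=2, the sweep tries at most four configurations and runs up to two at a time. The sketch below simulates that bookkeeping locally: it samples distinct configurations at random, groups them into waves of parallel_experiment_count, and stops once max_failed_experiment_count failures are reached. It is a simplified, hypothetical model for intuition, not VESSL's scheduler. In particular, a real sweep may start a new experiment as soon as a slot frees up rather than in fixed waves, and whether it stops when the failure limit is reached or exceeded is an assumption here. toy_run is a made-up stand-in for launching an experiment.

```python
import random
from itertools import product


def simulate_sweep(space, run, max_experiments=4, parallel=2,
                   max_failed=2, seed=0):
    """Simplified, hypothetical model of a random-search sweep.

    space: list of candidate configurations (dicts)
    run:   callable(config) -> metric value, or None if the run failed
    Returns a list of (config, metric) pairs for successful runs.
    """
    rng = random.Random(seed)
    # Draw distinct configurations at random (assumption: no repeats).
    configs = rng.sample(space, k=min(max_experiments, len(space)))
    results, failed = [], 0
    for start in range(0, len(configs), parallel):
        wave = configs[start:start + parallel]  # models up to `parallel` concurrent runs
        for config in wave:
            metric = run(config)
            if metric is None:
                failed += 1
            else:
                results.append((config, metric))
        if failed >= max_failed:  # assumption: stop once the limit is reached
            break
    return results


# Same search space as the parameters defined earlier.
space = [
    {"optimizer": opt, "batch_size": bs}
    for opt, bs in product(["adam", "sgd", "adadelta"], range(64, 256 + 1, 8))
]


def toy_run(config):
    """Stand-in for launching an experiment; returns a made-up metric."""
    return 0.95 + config["batch_size"] / 100_000


results = simulate_sweep(space, toy_run)
print(len(results))  # 4
```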
You can get the details of the sweep by evaluating the sweep variable or by visiting the web console.
Now that the sweep has run several experiments, let's find the best one. vessl.get_best_sweep_experiment() returns the experiment with the best value of the metric set in sweep_objective. In this example, it returns the experiment with the highest val_accuracy. We then read that experiment's details and register it as a new model version.
best_experiment = vessl.get_best_sweep_experiment(sweep_name=sweep.name)
best_experiment = vessl.read_experiment(experiment_number=best_experiment.name) # read the best experiment, not the original one
model = vessl.create_model(
repository_name="tutorial-mnist",
experiment_id=best_experiment.id,
model_name="v0.0.2",
)
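Selecting the best experiment amounts to taking the maximum (or, for a "minimize" objective, the minimum) of the objective metric over the sweep's experiments. The following is a local sketch of that selection over hypothetical (experiment name, metrics) records that skips experiments which never logged the metric. It mirrors what vessl.get_best_sweep_experiment() is described as returning, not how it is implemented, and the names and values are toy data.

```python
def best_experiment(experiments, metric, direction="maximize"):
    """Return the (name, metrics) record with the best value of `metric`.

    Experiments that never logged `metric` (e.g. failed runs) are skipped.
    """
    pick = {"maximize": max, "minimize": min}[direction]
    candidates = [e for e in experiments if metric in e[1]]
    if not candidates:
        raise ValueError(f"no experiment logged {metric!r}")
    return pick(candidates, key=lambda e: e[1][metric])


# Toy records, not real sweep results; exp-3 failed before logging anything.
experiments = [
    ("exp-1", {"val_accuracy": 0.981}),
    ("exp-2", {"val_accuracy": 0.987}),
    ("exp-3", {}),
    ("exp-4", {"val_accuracy": 0.975}),
]
print(best_experiment(experiments, "val_accuracy")[0])  # exp-2
```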
You can view the performance of your model by calling vessl.read_model() with the model repository name and the model number.
vessl.read_model(
repository_name="tutorial-mnist",
model_number="2",
)
We have looked at the overall workflow of using the VESSL Client SDK. You can also repeat the same process using the client CLI or the Web UI. Now, try this guide with your own code and dataset.