Deploying a Model (Model Serving)
You can use VESSL to quickly deploy your models into production and serve them to external applications via APIs. Register a model via the SDK, then deploy it from the Web UI in one click.

Register a model using the SDK

A model file cannot be deployed on its own; we need to provide instructions on how to set up the server, handle requests, and send responses. This step is called registering a model.
There are two ways you can register a model. One is to use an existing model, that is, a VESSL model that already exists with a model file stored in it. The other is to train a model from scratch and register it in the same script. The two options are further explained below.

1. Register an existing model

In most cases, you will have already trained a model and have the file ready, either from a VESSL experiment or from your local environment. After creating a model, you will need to register it using the SDK. The example below shows how.
```python
import torch
import torch.nn as nn
from io import BytesIO

import vessl

class Net(nn.Module):
    # Define the model's layers here
    ...

class MyRunner(vessl.RunnerBase):
    @staticmethod
    def load_model(props, artifacts):
        model = Net()
        model.load_state_dict(torch.load("model.pt"))
        model.eval()
        return model

    @staticmethod
    def preprocess_data(data):
        return torch.load(BytesIO(data))

    @staticmethod
    def predict(model, data):
        with torch.no_grad():
            return model(data).argmax(dim=1, keepdim=False)

    @staticmethod
    def postprocess_data(data):
        return {"result": data.item()}

vessl.configure()
vessl.register_model(
    repository_name="my-repository",
    model_number=1,
    runner_cls=MyRunner,
    requirements=["torch"],
)
```
First, we redefine the layers of the torch model. (This assumes we only saved the state_dict, i.e., the model's parameters. If you saved the model's layers as well, you do not have to redefine them.)
Then, we define a MyRunner which inherits from vessl.RunnerBase, which provides instructions for how to serve our model. You can read more about each method here.
Finally, we register the model using vessl.register_model. We specify the repository name and number, pass MyRunner as the runner class we will use for serving, and list any requirements to install.
After executing the script, you should see that two files have been generated: vessl.manifest.yaml, which stores metadata, and vessl.runner.pkl, which stores the runner binary. Your model has been registered and is ready for serving.
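Before registering, you can sanity-check the runner contract locally. Below is a minimal sketch with a stand-in model (no VESSL or torch needed, and `EchoRunner` is a hypothetical name) that exercises the same preprocess → predict → postprocess flow the serving layer runs:

```python
class EchoRunner:
    """Stand-in illustrating the runner contract: each hook is a staticmethod."""

    @staticmethod
    def load_model(props, artifacts):
        return lambda x: x * 2          # dummy "model" that doubles its input

    @staticmethod
    def preprocess_data(data):
        return int(data)                # raw request body -> model input

    @staticmethod
    def predict(model, data):
        return model(data)

    @staticmethod
    def postprocess_data(data):
        return {"result": data}         # JSON-serializable response

# Exercise the pipeline end to end, as the serving layer would:
model = EchoRunner.load_model(None, None)
out = EchoRunner.postprocess_data(
    EchoRunner.predict(model, EchoRunner.preprocess_data("21"))
)
print(out)  # {'result': 42}
```

If this runs cleanly with your real model substituted in, the registered runner should behave the same way behind the service endpoint.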

2. Register a model from scratch

In some cases, you will want to train the model and register it within one script. You can use vessl.register_model to register a new model as well:
```python
import vessl

# Your training code
# model.fit()

vessl.configure()
vessl.register_model(
    repository_name="my-repository",
    model_number=None,  # None registers a new model in the repository
    runner_cls=MyRunner,
    model_instance=model,
    requirements=["tensorflow"],
)
```
After executing the script, you should see that three files have been generated: vessl.manifest.yaml, which stores metadata, vessl.runner.pkl, which stores the runner binary, and vessl.model.pkl, which stores the trained model. Your model has been registered and is ready for serving.
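To confirm the registration step produced its artifacts, you can check for the generated files. A small helper like the following (the `missing_artifacts` name is our own, not part of the SDK) reports any that are absent:

```python
from pathlib import Path

def missing_artifacts(
    directory=".",
    expected=("vessl.manifest.yaml", "vessl.runner.pkl", "vessl.model.pkl"),
):
    """Return which expected registration artifacts are absent from `directory`."""
    return [name for name in expected if not (Path(directory) / name).exists()]

# An empty list means all three files were generated:
print(missing_artifacts())
```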

PyTorch models

If you are using PyTorch, there is an easier way to register your model. You only need to define preprocess_data and postprocess_data, and even those are optional; the other methods are autogenerated.
```python
import vessl

# Your training code
# for epoch in range(epochs):
#     train(model, epoch)

vessl.configure()
vessl.register_torch_model(
    repository_name="my-model",
    model_number=1,
    model_instance=model,
    requirements=["torch"],
)
```
Check out the documentation for vessl.register_model and vessl.register_torch_model.

Deploy a registered model

Go to the URL of the registered model. In the SERVING tab, you will now be able to deploy your model. Click Deploy, then select a cluster, a resource, and an image.
If you used Python 3.8 or above to register your model, you must choose an image with Python 3.8 or above here; if you used Python 3.7 or below, you must choose an image with Python 3.7 or below.
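If you are unsure which interpreter you registered with, a quick check in the same environment tells you which image family to pick:

```python
import sys

# Print the local interpreter version so you can pick a serving image
# with a matching Python (3.8+ vs 3.7 and below).
major, minor = sys.version_info[:2]
print(f"Model registered with Python {major}.{minor}")
```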

Make predictions and monitor your model

Once you've clicked deploy, you will see a screen like the one above. The service status indicates the status of the server instance. If this value shows failed, there was a problem serving your model. (Most likely, the Python version of the image did not match the version you used to register your model.) You can also view the logs and system metrics of the pod, as well as connect to the pod using SSH.
Make predictions using your service by sending HTTP requests to the service endpoint. As in the example request, use the POST method and pass your authentication token as a header. Pass your input data in the format you've specified in your runner when you registered the model. You should receive a response with the prediction.
```shell
curl -X POST -H "X-AUTH-KEY:[YOUR-AUTHENTICATION-TOKEN]" -d [YOUR-DATA] https://service-zl067zvrmf69-service-8000.uw2-dev-cluster.savvihub.com
```
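The same request can be made from Python. Here is a minimal sketch using only the standard library, assuming (as in the curl example) that the service authenticates via an X-AUTH-KEY header and returns JSON; the `predict` helper is our own name, not part of the SDK:

```python
import json
import urllib.request

def predict(endpoint: str, token: str, payload: bytes) -> dict:
    """POST a raw payload to a model service endpoint and return the parsed JSON result."""
    req = urllib.request.Request(
        endpoint,
        data=payload,                   # body in the format your preprocess_data expects
        headers={"X-AUTH-KEY": token},  # authentication token from the service page
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Pass the service endpoint URL and your authentication token, and you should receive the same response dictionary your runner's postprocess_data returns.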