Deploying a Model (Model Serving)
You can use VESSL to quickly deploy your models into production and serve them to external applications via APIs. You register a model with the SDK and deploy it from the Web UI in one click.
A model file cannot be deployed on its own - we need to provide instructions on how to set up the server, handle requests, and send responses. This step is called registering a model.
There are two ways to register a model. One is to use an existing model - that is, a VESSL model that already exists and has a model file stored in it. The other is to train a model from scratch and register it in the same script. Both options are explained below.
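Both paths use the VESSL Python SDK. As a minimal setup sketch (assuming the SDK is installed from PyPI under the package name vessl):

# Install the SDK first if needed: pip install vessl
import vessl

# Authenticate with your VESSL account; on first run this typically prompts
# for an access token and default organization/project
vessl.configure()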
In most cases, you will already have a trained model and its file ready, either from a VESSL experiment or from your local environment. After creating a model, you will need to register it using the SDK. The example below shows how you can do so.
import torch
import torch.nn as nn
from io import BytesIO
import vessl


class Net(nn.Module):
    # Define model
    ...


class MyRunner(vessl.RunnerBase):
    @staticmethod
    def load_model(props, artifacts):
        # Rebuild the model and load the saved state_dict
        model = Net()
        model.load_state_dict(torch.load("model.pt"))
        model.eval()
        return model

    @staticmethod
    def preprocess_data(data):
        # Deserialize the raw request body into a tensor
        return torch.load(BytesIO(data))

    @staticmethod
    def predict(model, data):
        with torch.no_grad():
            return model(data).argmax(dim=1, keepdim=False)

    @staticmethod
    def postprocess_data(data):
        # Convert the prediction into a JSON-serializable response
        return {"result": data.item()}


vessl.configure()
vessl.register_model(
    repository_name="my-repository",
    model_number=1,
    runner_cls=MyRunner,
    requirements=["torch"],
)
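The Net class is left as a placeholder above. Purely as a hypothetical illustration, it could be a small MNIST-style classifier; substitute whatever architecture you actually trained:

class Net(nn.Module):
    # Hypothetical example architecture - replace with the layers of your own trained model
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28 * 28, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = torch.flatten(x, start_dim=1)
        x = torch.relu(self.fc1(x))
        return self.fc2(x)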
First, we redefine the layers of the torch model. (This assumes we only saved the state_dict, i.e. the model's parameters; if you saved the model's layers as well, you do not have to redefine them.) Then, we define MyRunner, which inherits from vessl.RunnerBase and provides instructions for how to serve our model. You can read more about each method here. Finally, we register the model using vessl.register_model: we specify the repository name and model number, pass MyRunner as the runner class to use for serving, and list any requirements to install.

After executing the script, you should see that two files have been generated: vessl.manifest.yaml, which stores metadata, and vessl.runner.pkl, which stores the runner binary. Your model has been registered and is ready for serving.

In some cases, you will want to train the model and register it within one script. You can use vessl.register_model to register a new model as well:

# Your training code
# model.fit()

vessl.configure()
vessl.register_model(
    repository_name="my-repository",
    model_number=None,
    runner_cls=MyRunner,
    model_instance=model,
    requirements=["tensorflow"],
)
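Here the trained model object is passed directly via model_instance, and model_number is left as None because we are registering a new model rather than pointing at an existing one (compare with the first example, where model_number=1 referred to an existing model file).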
After executing the script, you should see that three files have been generated: vessl.manifest.yaml, which stores metadata, vessl.runner.pkl, which stores the runner binary, and vessl.model.pkl, which stores the trained model. Your model has been registered and is ready for serving.

If you are using PyTorch, there is an easier way to register your model. You only need to define preprocess_data and postprocess_data if you want custom behavior - the other methods are autogenerated.

# Your training code
# for epoch in range(epochs):
# train(model, epoch)
vessl.configure()
vessl.register_torch_model(
    repository_name="my-model",
    model_number=1,
    model_instance=model,
    requirements=["torch"],
)

Go to the URL of the registered model. In the SERVING tab, you will now be able to deploy your model. Click Deploy, then select a cluster, a resource, and an image.
If you registered your model with Python 3.8 or above, you must choose an image with Python 3.8 or above here. If you used Python 3.7 or below, you must choose an image with Python 3.7 or below.

Once you've clicked Deploy, you will see a screen like the one above. The service status indicates the status of the server instance. If this value shows failed, there was a problem serving your model. (Most likely, the Python version of the image did not match the version you used to register your model.) You can also view the logs and system metrics of the pod, as well as connect to the pod using SSH.

Make predictions by sending HTTP requests to the service endpoint. As in the example request, use the POST method and pass your authentication token as a header. Pass your input data in the format you specified in your runner when you registered the model. You should receive a response with the prediction.
curl -X POST -H "X-AUTH-KEY:[YOUR-AUTHENTICATION-TOKEN]" -d [YOUR-DATA] https://service-zl067zvrmf69-service-8000.uw2-dev-cluster.savvihub.com
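The same request can be made from Python. The sketch below assumes the torch runner shown earlier, where preprocess_data deserializes the request body with torch.load; the endpoint URL, token, and input shape are placeholders you should replace with your own:

import io

import requests
import torch

# Placeholders: replace with your own service endpoint and authentication token
ENDPOINT = "https://service-xxxx-service-8000.example-cluster.savvihub.com"
AUTH_TOKEN = "YOUR-AUTHENTICATION-TOKEN"

# Serialize the input the same way preprocess_data deserializes it (torch.load on the raw body)
input_tensor = torch.rand(1, 1, 28, 28)  # hypothetical input shape; use your model's expected shape
buffer = io.BytesIO()
torch.save(input_tensor, buffer)

response = requests.post(
    ENDPOINT,
    headers={"X-AUTH-KEY": AUTH_TOKEN},
    data=buffer.getvalue(),
)
print(response.json())  # e.g. {"result": 3}, as returned by postprocess_data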