You can use VESSL to quickly deploy your models to production and serve them to external applications via APIs. You register a model with the SDK, then deploy it from the Web UI in one click.

Register a model using the SDK

A model file cannot be deployed on its own - we also need to provide instructions on how to set up the server, handle requests, and send responses. This step is called registering a model.

There are two ways you can register a model. One is to use an existing model - that is, a VESSL model that already exists with a model file stored in it. The other is to train a model from scratch and register it in a single script. Both options are explained below.

1. Register an existing model

In most cases, you will already have a trained model and its file ready, either from a VESSL experiment or from your local environment. After creating a model, you need to register it using the SDK. The example below shows how.

import torch
import torch.nn as nn
from io import BytesIO
from pydantic import BaseModel

import vessl

# Define model
class Net(nn.Module):
    # Define the model's layers and forward() here
    ...

# (Optional) Define Input/Output data type
class InputType(BaseModel):
    data: bytes

class OutputType(BaseModel):
    result: torch.Tensor

    class Config:
        arbitrary_types_allowed = True  # let pydantic accept the torch.Tensor field

class MyRunner(vessl.RunnerBase):
    @staticmethod
    def load_model(props, artifacts):
        model = Net()
        model.load_state_dict(torch.load("model.pt"))
        model.eval()
        return model

    @staticmethod
    def preprocess_data(data: InputType):
        return torch.load(BytesIO(data))

    @staticmethod
    def predict(model, data):
        with torch.no_grad():
            return model(data).argmax(dim=1, keepdim=False)

    @staticmethod
    def postprocess_data(data: OutputType):
        return {"result": data.item()}

vessl.configure()
vessl.register_model(
    repository_name="my-repository",
    model_number=1,
    runner_cls=MyRunner,
    requirements=["torch"],
)

First, we redefine the layers of the PyTorch model. (This assumes we saved only the state_dict, i.e. the model's parameters. If you saved the entire model, including its layers, you do not have to redefine them.)
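For reference, here is a minimal sketch of the two saving options in plain PyTorch (model is assumed to be your trained nn.Module, and model.pt is the file name used in the example above):

# Option 1: save only the state_dict (the parameters). Loading it later
# requires the class definition, which is why Net is redefined above.
torch.save(model.state_dict(), "model.pt")

# Option 2: save the entire model object. You do not need to redefine the
# layers to load it, but the class source must still be importable.
torch.save(model, "model.pt")
model = torch.load("model.pt")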

Then, we define MyRunner, which inherits from vessl.RunnerBase and provides instructions for serving our model. You can read more about each method here.
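Conceptually, these methods are chained for every incoming request. The snippet below is only an illustration of that flow for the runner above - the actual serving loop is handled by VESSL once the service is deployed:

# Illustration only: roughly how the runner methods fit together at serving time.
model = MyRunner.load_model(props=None, artifacts=None)  # called once at startup

def handle_request(raw_body):
    data = MyRunner.preprocess_data(raw_body)   # decode the request payload
    output = MyRunner.predict(model, data)      # run inference
    return MyRunner.postprocess_data(output)    # build the response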

Finally, we register the model using vessl.register_model. We specify the repository name and model number, pass MyRunner as the runner class to use for serving, and list any requirements to install.

After executing the script, you should see that two files have been generated: vessl.manifest.yaml, which stores metadata, and vessl.runner.pkl, which stores the runner binary. Your model has been registered and is ready for service.

2. Register a model from scratch

In some cases, you will want to train a model and register it within a single script. You can use vessl.register_model for this as well - set model_number to None to create a new model and pass the trained model through model_instance:

# Your training code
# model.fit()

vessl.configure()
vessl.register_model(
    repository_name="my-repository",
    model_number=None,
    runner_cls=MyRunner,
    model_instance=model,
    requirements=["tensorflow"],
)

After executing the script, you should see that three files have been generated: vessl.manifest.yaml, which stores metadata, vessl.runner.pkl, which stores the runner binary, and vessl.model.pkl, which stores the trained model. Your model has been registered and is ready for service.

PyTorch models

If you are using PyTorch, there is an easier way to register your model with vessl.register_torch_model. You only need to (optionally) define preprocess_data and postprocess_data - the other runner methods are generated automatically.

# Your training code
# for epoch in range(epochs):
#     train(model, epoch)

vessl.configure()
vessl.register_torch_model(
    repository_name="my-model",
    model_number=1,
    model_instance=model,
    requirements=["torch"],
)

Deploy a registered model

You can deploy your registered model with VESSL Service.

Once you have deployed your model with VESSL Service, you can make predictions by sending HTTP requests to the service endpoint. As in the example request below, use the POST method and pass your authentication token in the X-AUTH-KEY header. Pass your input data in the format your runner expects, as specified when you registered the model. You should receive a response containing the prediction.

curl -X POST -H "X-AUTH-KEY:[YOUR-AUTHENTICATION-TOKEN]" -d [YOUR-DATA] https://service-zl067zvrmf69-service-8000.uw2-dev-cluster.savvihub.com
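You can send the same request from Python. The sketch below assumes the runner defined earlier, so the request body is a tensor serialized with torch.save; the endpoint URL, authentication token, and input shape are placeholders:

from io import BytesIO

import requests
import torch

# Serialize the input the way preprocess_data expects (torch.load on the raw bytes).
buffer = BytesIO()
torch.save(torch.rand(1, 28, 28), buffer)  # placeholder input

response = requests.post(
    "https://<your-service-endpoint>",                      # placeholder endpoint URL
    headers={"X-AUTH-KEY": "<your-authentication-token>"},
    data=buffer.getvalue(),
)
print(response.json())  # e.g. {"result": ...}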