May 12, 2024

Serverless deployment

Our serverless deployment infrastructure is the easiest way to scale inference workloads on remote GPUs. With continuous batching, effortless autoscaling, fast cold starts, full observability, and more, your APIs are production-ready for full-spectrum AI & LLM applications. Refer to our docs to put a custom Llama 3 into action with Text Generation Inference (TGI) in three simple steps:

  1. Create a remote GPU-accelerated container
  2. Create an endpoint with Llama 3 from Hugging Face
  3. Send an HTTPS request to the deployed service
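Once the endpoint is live, step 3 is an ordinary HTTPS call. A minimal sketch for a TGI-backed endpoint follows; the URL is a placeholder standing in for the address shown on your deployed endpoint's page, and only the request-body shape comes from TGI's `/generate` route:

```python
import json

# Placeholder -- substitute the URL of your deployed VESSL endpoint.
ENDPOINT = "https://example-endpoint.vessl.ai/generate"

def build_tgi_payload(prompt: str, max_new_tokens: int = 256) -> bytes:
    """Build a JSON body in the shape TGI's /generate route expects."""
    body = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    return json.dumps(body).encode("utf-8")

# Sending the request with only the standard library:
# import urllib.request
# req = urllib.request.Request(
#     ENDPOINT,
#     data=build_tgi_payload("What is continuous batching?"),
#     headers={"Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["generated_text"])
```

The actual call is left commented out since it requires a running endpoint; the payload builder works standalone.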

April 12, 2024

Announcing VESSL Serve

VESSL Serve is the easiest way to deploy custom models and generative AI applications and scale inference with ease. Deploy any model, to any cloud, at any scale in minutes, without wasting hours on API servers, load balancing, autoscaling, and more. Read our release post or try the Llama 3 example to learn more.

March 11, 2024

Cloud storage support for VESSL Run

Import your data from, and export results to, cloud storage services like AWS S3 and GCP GCS for your run. You can also bring your own cloud storage by adding its credentials on our improved Secrets page. Refer to our docs for a step-by-step guide.
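Data sources of this kind are typically referenced by object-storage URIs. As a rough sketch (the helper name and the URI are illustrative, not part of the VESSL API), splitting such a URI into a bucket and key is all that is needed before handing it to an SDK call:

```python
from urllib.parse import urlparse

def split_object_uri(uri: str) -> tuple[str, str]:
    """Split a storage URI like "s3://my-bucket/data/train.csv" or
    "gs://my-bucket/out/model.pt" into (bucket, object key)."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("s3", "gs"):
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    return parsed.netloc, parsed.path.lstrip("/")

# Exporting a result file back to S3 with boto3 would then look like
# (requires boto3 installed and AWS credentials configured):
# bucket, key = split_object_uri("s3://my-bucket/outputs/model.pt")
# boto3.client("s3").upload_file("outputs/model.pt", bucket, key)
```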

Google Cloud Storage FUSE

We are bringing FUSE support for GCS. FUSE lets you work with object storage through familiar filesystem operations, without using the GCS SDKs directly.
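In practice this means a bucket mounted with `gcsfuse` (for example, `gcsfuse my-bucket /mnt/gcs`) can be read with plain file APIs. A minimal sketch, where the mount path is an assumption for illustration:

```python
from pathlib import Path

def read_dataset_names(mount_dir: str) -> list[str]:
    """List files under a FUSE-mounted GCS bucket using ordinary
    filesystem operations instead of the GCS SDK. `mount_dir` is the
    mount point, e.g. /mnt/gcs after `gcsfuse my-bucket /mnt/gcs`."""
    return sorted(p.name for p in Path(mount_dir).iterdir() if p.is_file())
```

Because the mount behaves like any directory, the same function works on local paths, which also makes local testing straightforward.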

January 31, 2024

New get started guide

We’ve updated our documentation with a new get started guide. The new guide covers everything from a product overview to the latest use cases of our product in generative AI & LLMs.

Follow along with our new guide here.

New & Improved

  • Added a new managed cloud option built on Google Cloud
  • Renamed our default managed Docker images to torch:2.1.0-cuda12.2-r3

December 28, 2023

Announcing VESSL Hub

VESSL Hub is a collection of one-click recipes for the latest open-source models like Llama 2, Mistral 7B, and Stable Diffusion. Built on our full-stack AI infrastructure, Hub provides the easiest way to explore and deploy models.

Fine-tune and deploy the latest models on our production-grade, full-stack cloud infrastructure with just a single click. Read about the release on our blog or try it out now at vessl.ai/hub.