Try out this model on VESSL Hub.

This example runs Whisper large-v3, a general-purpose speech recognition model trained on 680,000 hours of diverse labelled audio. Whisper is also a multitask model: it can perform multilingual speech recognition, speech translation, and language identification, and it generalizes to many domains without additional fine-tuning.
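As a quick illustration of these capabilities (separate from the Run below), here is a minimal sketch of calling the model through the Hugging Face transformers automatic-speech-recognition pipeline. The checkpoint name and the audio file shown here are illustrative, not part of this example:

import torch
from transformers import pipeline

# Load the upstream checkpoint; the Run below mounts a VESSL mirror instead.
asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",
    torch_dtype=torch.float16,
    device="cuda:0",
)

# Multilingual transcription: Whisper detects the spoken language by default.
print(asr("sample.flac")["text"])

# Speech translation: ask the model to translate the audio into English.
print(asr("sample.flac", generate_kwargs={"task": "translate"})["text"])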

Running the model

You can run the model with a single command:

vessl run create -f whisper.yaml

If you open the Run's logs page, you can see the inference results for the first five samples of the LibriSpeech ASR test set.

Here’s a rundown of the whisper.yaml file.

name: whisper-v3
description: A template Run for inference of whisper v3 on librispeech_asr test set
resources:
  cluster: vessl-gcp-oregon
  preset: v1.l4-1.mem-42  # preset with one NVIDIA L4 GPU and 42 GB of memory
image: quay.io/vessl-ai/hub:torch2.1.0-cuda12.2-202312070053
import:
  # Mount the model weights, the dataset, and the inference code into the container.
  /model/: hf://huggingface.co/VESSL/Whisper-large-v3
  /dataset/: hf://huggingface.co/datasets/VESSL/librispeech_asr_clean_test
  /code/:
    git:
      url: https://github.com/vessl-ai/hub-model
      ref: main
run:
  # Install dependencies and run inference from the whisper-v3 directory.
  - command: |-
      pip install -r requirements.txt
      python inference.py
    workdir: /code/whisper-v3
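For reference, here is a rough sketch of what an inference script along these lines might do, assuming the /model/ and /dataset/ mount paths from the YAML above and that the dataset was saved in the datasets library's on-disk format. The actual inference.py in the hub-model repository may differ:

import torch
from datasets import load_from_disk
from transformers import pipeline

# Load the model from the mounted path (see the import section above).
asr = pipeline(
    "automatic-speech-recognition",
    model="/model",
    torch_dtype=torch.float16,
    device="cuda:0",
)

# Assumes the dataset was saved with Dataset.save_to_disk; adjust if the
# mounted repository uses another layout.
dataset = load_from_disk("/dataset")

# Transcribe the first five samples, mirroring what the Run prints to its logs.
for sample in dataset.select(range(5)):
    audio = sample["audio"]
    prediction = asr({"array": audio["array"], "sampling_rate": audio["sampling_rate"]})
    print("reference :", sample["text"])
    print("prediction:", prediction["text"])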