Try out this model on VESSL Hub.

This example runs an app for inference with Mistral-7B, an open-source LLM developed by Mistral AI. The model uses grouped-query attention (GQA) and sliding-window attention (SWA), which speed up inference and let it handle longer sequences at lower cost than comparable models, achieving both efficiency and high performance. Mistral-7B outperforms Llama 2 13B on all benchmarks, and Llama 1 34B on reasoning, mathematics, and code generation benchmarks.
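For intuition, SWA restricts each token to attending only to a fixed window of recent tokens (4,096 in Mistral-7B) rather than the full sequence, so attention cost grows linearly with sequence length. Below is a minimal PyTorch sketch of the resulting attention mask; it is illustrative only, not the model's actual implementation.

import torch

# Illustrative sketch of a sliding-window causal mask (not Mistral's source code).
# True marks the key positions each query token is allowed to attend to.
def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    i = torch.arange(seq_len).unsqueeze(1)  # query positions, shape (seq_len, 1)
    j = torch.arange(seq_len).unsqueeze(0)  # key positions, shape (1, seq_len)
    causal = j <= i                    # never attend to future tokens
    within_window = (i - j) < window   # only the last `window` tokens are visible
    return causal & within_window

print(sliding_window_mask(seq_len=6, window=3).int())  # banded lower-triangular mask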

Running the model

You can run the model with a single command using the VESSL CLI:

vessl run create -f mistral_7b.yaml

Here’s a rundown of the mistral_7b.yaml file:

name: mistral-7b-streamlit
description: A template Run for inference of Mistral-7B with streamlit app
resources:
  cluster: vessl-gcp-oregon   # VESSL-managed GCP cluster in Oregon
  preset: v1.l4-1.mem-42      # resource preset: 1x NVIDIA L4 GPU, 42GB memory
image: quay.io/vessl-ai/hub:torch2.1.0-cuda12.2-202312070053  # PyTorch 2.1.0 + CUDA 12.2 image
import:
  /model/: hf://huggingface.co/VESSL/Mistral-7B  # pull the model weights from Hugging Face
  /code/:                                        # clone the demo code from GitHub
    git:
      url: https://github.com/vessl-ai/hub-model
      ref: main
run:
  - command: |-   # install dependencies, then serve the Streamlit app on port 80
      pip install -r requirements_streamlit.txt
      streamlit run streamlit_demo.py --server.port 80
    workdir: /code/mistral-7B
interactive:
  max_runtime: 24h      # shut the Run down after 24 hours
  jupyter:
    idle_timeout: 120m  # end idle Jupyter sessions after 2 hours
ports:
  - name: streamlit     # expose the Streamlit app over HTTP
    type: http
    port: 80
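
The Run mounts the model weights at /model and serves the Streamlit app from /code/mistral-7B on port 80; once the Run is up, you can open the exposed streamlit endpoint from the Run's detail page. For reference, here is a minimal sketch of what a Streamlit inference app like streamlit_demo.py might look like, assuming the transformers, accelerate, and streamlit packages; the actual implementation lives in the vessl-ai/hub-model repository.

import streamlit as st
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "/model"  # the hf:// import above mounts the weights here

@st.cache_resource  # load the model once and reuse it across Streamlit reruns
def load_model():
    tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_PATH,
        torch_dtype=torch.float16,  # fp16 fits a 7B model on a single L4
        device_map="auto",          # requires the accelerate package
    )
    return tokenizer, model

st.title("Mistral-7B Demo")
prompt = st.text_area("Prompt")
if st.button("Generate") and prompt:
    tokenizer, model = load_model()
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=256)
    st.write(tokenizer.decode(outputs[0], skip_special_tokens=True))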