Try out this model on VESSL Hub.

This example fine-tunes Llama 2 on a code instruction dataset. The dataset consists of 1.6K samples and follows the format of Stanford’s Alpaca dataset. To fit training on a single GPU with moderate memory, the example uses 8-bit quantization and LoRA (Low-Rank Adaptation).
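
To make that setup concrete, here is a minimal sketch of 8-bit loading plus LoRA using the Hugging Face transformers, bitsandbytes, and peft libraries. The model path and the LoRA hyperparameters (r, alpha, dropout, target modules) are illustrative assumptions, not the exact values used in the repository’s finetuning.py.

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_path = "/model_/llama_2_7b_hf"  # local path after unzipping (assumption)

# Load the base model with 8-bit weights to roughly halve memory vs. fp16
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Attach small trainable LoRA adapters instead of updating all 7B weights
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # a common choice for Llama-family models
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of parameters train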

In the code referenced under /code/, we added our Python SDK to log key metrics like loss and learning rate. You can track these values in real time under Plots. The run completes by uploading the model checkpoint to the VESSL AI model registry, as defined under export.
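
The logging pattern looks roughly like the sketch below; the loop body here is a placeholder, and the real call sites live in the repo’s finetuning.py. This assumes the SDK’s vessl.log(payload, step) signature, where payload is a dict of metric names to values.

import vessl

# Toy loop standing in for the real training loop in finetuning.py
for step in range(3):
    loss = 1.0 / (step + 1)      # placeholder; the real code logs training loss
    lr = 2e-4 * (0.9 ** step)    # placeholder learning-rate schedule
    vessl.log(payload={"loss": loss, "learning_rate": lr}, step=step)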

Running the model

You can run the model with a single command:

vessl run create -f llama2_fine-tuning.yaml
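
If the CLI isn’t set up yet, it ships with the VESSL Python package; install it and link your account first:

pip install vessl
vessl configure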

Here’s the llama2_fine-tuning.yaml file; a short rundown of each section follows it.

name: llama2-finetuning
description: finetune llama2 with code instruction alpaca dataset
resources:
  cluster: vessl-gcp-oregon
  preset: v1.l4-1.mem-27
image: quay.io/vessl-ai/hub:torch2.1.0-cuda12.2-202312070053
import:
  /model/: vessl-model://vessl-ai/llama2/1
  /code/:
    git:
      url: https://github.com/vessl-ai/hub-model
      ref: main
  /dataset/: vessl-dataset://vessl-ai/code_instructions_small_alpaca
export:
  /trained_model/: vessl-model://vessl-ai/llama2-finetuned
  /artifacts/: vessl-artifact://
run:
  - command: |-
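      # install the training dependencies (workdir is /code/llama2-finetuning)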
      pip install -r requirements.txt
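      # stage the base Llama 2 archive in a scratch directory and unzip it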
      mkdir /model_
      cd /model
      mv llama_2_7b_hf.zip /model_
      cd /model_
      unzip llama_2_7b_hf.zip
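      # launch fine-tuning from the repo’s code directory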
      cd /code/llama2-finetuning
      python finetuning.py
    workdir: /code/llama2-finetuning
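
In short: resources requests a single-L4 GPU preset on the vessl-gcp-oregon cluster; import mounts the base Llama 2 weights from the model registry, the fine-tuning code from GitHub, and the 1.6K-sample code instruction dataset; export declares where the trained checkpoint and artifacts are uploaded when the run finishes, which is how the checkpoint lands in the model registry; and run prepares the base weights and executes finetuning.py.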