Serve & deploy with Text Generation Inference (TGI)
- For the instance type, select **GPU** and choose `gpu-l4-small`. This means that we will be using one NVIDIA L4 GPU and a 42 GB RAM instance.
- For the Docker image, use `ghcr.io/huggingface/text-generation-inference:2.0.2`.
- Expose an `HTTP` port, set it to `8000`, and name it `default`.
- Add an environment variable named `MODEL_ID` with the value `casperhansen/llama-3-8b-instruct-awq`.
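Once the deployment is up, the service speaks TGI's standard HTTP API. A minimal sketch of querying the `/generate` route, assuming the endpoint URL below is replaced with your deployment's public URL (the URL here is a placeholder, not from the original text); the request body shape follows TGI's documented API:

```python
import json
import urllib.request

# Placeholder URL: replace with your deployment's public endpoint (assumption).
ENDPOINT = "http://localhost:8000"


def build_generate_request(prompt: str, max_new_tokens: int = 128) -> dict:
    """Build the JSON body expected by TGI's /generate endpoint."""
    return {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }


def generate(prompt: str) -> str:
    """POST the prompt to the TGI server and return the generated text."""
    body = json.dumps(build_generate_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{ENDPOINT}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # TGI responds with a JSON object containing "generated_text".
        return json.loads(resp.read())["generated_text"]


if __name__ == "__main__":
    print(generate("What is AWQ quantization?"))
```

The same endpoint can also be consumed with the `huggingface_hub` `InferenceClient` or any OpenAI-compatible client pointed at the deployment, since TGI exposes both interfaces.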