# Launch batch jobs on GPUs

> Leverage GPUs to run training workloads efficiently as batch runs

## Batch Run

Batch runs execute the series of commands defined in your YAML configuration and then terminate. They are suited to large-scale, long-running tasks such as model training, where running on GPUs can significantly shorten training time.

### A Simple Batch Run

Here is an example of a simple batch run YAML configuration. It specifies the Docker image to use, the resources required for the run, and the commands to execute during the run.

```yaml Simple batch run definition theme={null}
name: gpu-batch-run
description: Run a GPU-backed batch run.
image: quay.io/vessl-ai/torch:2.3.1-cuda12.1-r5
resources:
  cluster: vessl-gcp-oregon
  preset: gpu-l4-small
run:
  - command: |
      nvidia-smi
```

In this example, `resources.preset: gpu-l4-small` requests a GPU instance from the `vessl-gcp-oregon` cluster (an NVIDIA L4 GPU, as the preset name indicates). The run then executes `nvidia-smi`, which prints the status of the attached GPU through the NVIDIA System Management Interface, and terminates.
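Because a batch run executes its commands and then terminates, you can extend the `command` block to run several steps in one run. The sketch below reuses only the keys from the example above; the run name `gpu-batch-run-check` is illustrative, and it assumes that the lines of the `command` block run in order in the container's shell and that the `quay.io/vessl-ai/torch` image provides `python` with PyTorch installed.

```yaml Run several commands in one batch run theme={null}
name: gpu-batch-run-check
description: Check GPU visibility from the driver and from PyTorch.
image: quay.io/vessl-ai/torch:2.3.1-cuda12.1-r5
resources:
  cluster: vessl-gcp-oregon
  preset: gpu-l4-small
run:
  # Assumption: the lines below run sequentially in the container's shell.
  # 1) Driver-level check, 2) check that PyTorch can see a CUDA device.
  - command: |
      nvidia-smi
      python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
```

When the last command exits, the run terminates just like the single-command example.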

### Termination Protection

You can also enable termination protection in a batch run. Termination protection keeps your run active for a specified duration after your commands have finished executing. This is useful for debugging or retrieving intermediate files.

```yaml Enable termination protection theme={null}
name: gpu-batch-run
description: Run a GPU-backed batch run.
image: quay.io/vessl-ai/torch:2.3.1-cuda12.1-r5
resources:
  cluster: vessl-gcp-oregon
  preset: gpu-l4-small
run:
  - command: |
      nvidia-smi
termination_protect: true
```

In this example, `termination_protect: true` keeps the container running after the `nvidia-smi` command finishes, instead of terminating it immediately.
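Termination protection pairs naturally with runs that write files you want to inspect afterwards. The sketch below uses only keys that appear earlier on this page; the output path `/tmp/gpu-info.txt` is a hypothetical example, and how you connect to the protected container to read the file is not covered here.

```yaml Keep the container alive to inspect output theme={null}
name: gpu-batch-run
description: Save GPU info to a file and keep the container for inspection.
image: quay.io/vessl-ai/torch:2.3.1-cuda12.1-r5
resources:
  cluster: vessl-gcp-oregon
  preset: gpu-l4-small
run:
  # Write the nvidia-smi report to a file instead of only printing it.
  - command: |
      nvidia-smi > /tmp/gpu-info.txt
# Keep the container running after the command above finishes,
# so /tmp/gpu-info.txt (hypothetical path) can still be retrieved.
termination_protect: true
```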

## Train a Thin-Plate Spline Motion Model on a GPU

Now let's dive into a more complex batch run configuration. This configuration file describes a batch run that trains a Thin-Plate Spline Motion Model on a GPU allocated with the `gpu-l4-small` preset.

```yaml Batch run YAML for training Thin-Plate Spline Motion Model theme={null}
name: Thin-Plate-Spline-Motion-Model
description: "Animate your own image in the desired way with a batch run on VESSL."
image: nvcr.io/nvidia/pytorch:21.05-py3
resources:
  cluster: vessl-gcp-oregon
  preset: gpu-l4-small
run:
  - workdir: /root/examples/deprecated/thin-plate-spline-motion-model
    command: |
      pip install -r requirements.txt
      python run.py --config config/vox-256.yaml --device_ids 0
import:
  /root/examples: git://github.com/vessl-ai/examples
  /root/examples/vox: s3://vessl-public-apne2/vessl_run_datasets/vox/
```

In this batch run, the Docker image `nvcr.io/nvidia/pytorch:21.05-py3` is used, and the `gpu-l4-small` preset (`resources.preset`) allocates a GPU from the `vessl-gcp-oregon` cluster, so the training job runs on that GPU.

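The model code and scripts are fetched from a GitHub repository (`/root/examples: git://github.com/vessl-ai/examples`), and the training data is imported from a public S3 bucket into `/root/examples/vox` (`/root/examples/vox: s3://vessl-public-apne2/vessl_run_datasets/vox/`).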

Starting from the `workdir` directory, the run first installs the requirements and then trains the model with the `run.py` script, using the `config/vox-256.yaml` configuration on GPU device `0`.
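When a training run fails, it can help to confirm the environment before training starts. The sketch below keeps the Thin-Plate Spline Motion Model configuration above and only prepends three hypothetical lines to its `command` block: `set -e`, which stops a POSIX shell script at the first failing command; `nvidia-smi`, to confirm the GPU; and `ls`, to confirm that the S3 import landed in `/root/examples/vox`. Using `set -e` this way assumes the block is executed as a single shell script.

```yaml Check the environment before training theme={null}
name: Thin-Plate-Spline-Motion-Model
description: "Animate your own image in the desired way with a batch run on VESSL."
image: nvcr.io/nvidia/pytorch:21.05-py3
resources:
  cluster: vessl-gcp-oregon
  preset: gpu-l4-small
run:
  # Assumption: the block runs as one shell script, so `set -e` aborts
  # the run on the first failing command instead of continuing to train.
  - workdir: /root/examples/deprecated/thin-plate-spline-motion-model
    command: |
      set -e
      nvidia-smi
      ls /root/examples/vox
      pip install -r requirements.txt
      python run.py --config config/vox-256.yaml --device_ids 0
import:
  /root/examples: git://github.com/vessl-ai/examples
  /root/examples/vox: s3://vessl-public-apne2/vessl_run_datasets/vox/
```

Failing early on a missing GPU or an empty dataset directory surfaces the problem in the first lines of the run log rather than partway through dependency installation or training.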

This example demonstrates how to set up GPU-backed training of a machine learning model as a batch run with a single YAML configuration.

## What's Next

For more advanced configurations and examples, please visit [VESSL Hub](https://vesslai.notion.site/9e42f785bbdf42379b2112b859d8c873?v=8d1527bc18154381b9baf35d4068b227\&pvs=4).

<Card title="VESSL Hub" icon="database" href="https://vesslai.notion.site/9e42f785bbdf42379b2112b859d8c873?v=8d1527bc18154381b9baf35d4068b227&pvs=4">
  A variety of YAML examples that you can use as references
</Card>