Volumes give your workspace direct access to datasets, code repositories, and storage systems from within your development environment. This guide covers how to configure and use volumes effectively in your workspaces.

Volume types

Import volumes

Import volumes download data from external sources into your workspace container at startup. The data is copied into the specified directory and becomes available immediately when your workspace starts.
Common use cases:
  • Loading datasets for analysis or model training
  • Pulling the latest code from a Git repository
  • Downloading pre-trained models from model registries
  • Accessing files from cloud storage for processing
Supported sources:
  • Git repositories (GitHub, GitLab, Bitbucket)
  • VESSL Dataset and Model Registry
  • Hugging Face datasets and models
  • VESSL Storage volumes
  • AWS S3 and Google Cloud Storage
  • External storage systems

Mount volumes

Mount volumes provide persistent, real-time access to external storage systems. Unlike import volumes, mount volumes reflect changes made to the source in real time and don’t consume additional disk space in your workspace.
Common use cases:
  • Working with large datasets that exceed workspace disk limits
  • Sharing data across multiple workspaces or team members
  • Accessing frequently updated data sources
  • Integrating with existing data pipelines
Supported sources:
  • VESSL Storage volumes
  • AWS S3 (through S3 FUSE)
  • Google Cloud Storage (through GCS FUSE)
  • Network File System (NFS) for custom clusters
  • Host path storage for on-premises setups

Configuring volumes

Through the Web Console

When creating a new workspace, you can configure volumes in the workspace creation form:
  1. Navigate to the Volumes section during workspace creation
  2. Click Add Volume to configure a new volume
  3. Select the volume type (Import or Mount)
  4. Choose the source (Dataset, Storage, Git repository, etc.)
  5. Specify the target path where the volume should be accessible in your workspace

Using VESSL CLI

You can also create workspaces with volumes using the VESSL CLI:
# Create workspace with imported dataset
vessl workspace create \
  --name "my-workspace" \
  --cluster "my-cluster" \
  --import "/data:vessl-dataset://my-org/my-dataset"

# Create workspace with mounted storage
vessl workspace create \
  --name "my-workspace" \
  --cluster "my-cluster" \
  --mount "/shared:volume://vessl-storage/shared-data"
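
When the same volume mappings are reused across scripts, the command line can be assembled programmatically. The sketch below is a hypothetical Python helper (`build_create_command` is not part of the VESSL CLI or SDK). It uses only the `--name`, `--cluster`, `--import`, and `--mount` flags shown above, and it assumes those flags can be combined and repeated in a single invocation, which the examples above do not confirm.

```python
import shlex


def build_create_command(name, cluster, imports=None, mounts=None):
    """Return the argument list for `vessl workspace create`.

    `imports` and `mounts` map a target path inside the workspace to a
    source URI, following the "<target>:<source>" form used above.
    Assumption: --import and --mount may each appear more than once.
    """
    args = ["vessl", "workspace", "create", "--name", name, "--cluster", cluster]
    for target, source in (imports or {}).items():
        args += ["--import", f"{target}:{source}"]
    for target, source in (mounts or {}).items():
        args += ["--mount", f"{target}:{source}"]
    return args


command = build_create_command(
    "my-workspace",
    "my-cluster",
    imports={"/data": "vessl-dataset://my-org/my-dataset"},
    mounts={"/shared": "volume://vessl-storage/shared-data"},
)
# Print a shell-safe string for review before running it.
print(shlex.join(command))
```

Keeping the arguments as a list (rather than one string) means the same value can be passed directly to `subprocess.run` without shell quoting issues.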

Best practices

Choosing between import and mount

Use import volumes when:
  • You need a snapshot of data at a specific point in time
  • You are working with relatively small datasets (under 10 GB)
  • You want to ensure data consistency throughout your workspace session
  • Network connectivity to the source might be intermittent
Use mount volumes when:
  • Working with large datasets that exceed workspace disk capacity
  • You need real-time access to frequently updated data
  • Sharing data across multiple workspaces or users
  • Integrating with external data pipelines that update source data
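
The guidelines above can be sketched as a small decision function. This is a hypothetical helper, not a VESSL API; the parameter names are illustrative, and it encodes only the disk-capacity, live-update, and sharing criteria from the lists above.

```python
def suggest_volume_type(dataset_gb, free_disk_gb, needs_live_updates=False, shared=False):
    """Suggest "mount" or "import" for a data source.

    Hypothetical rule of thumb based on the guidelines above:
    - data that changes at the source, or is shared, is mounted;
    - data that does not fit on the workspace disk is mounted;
    - everything else is imported as a local snapshot.
    """
    if needs_live_updates or shared:
        return "mount"
    if dataset_gb >= free_disk_gb:
        return "mount"
    return "import"


print(suggest_volume_type(dataset_gb=2, free_disk_gb=100))
print(suggest_volume_type(dataset_gb=500, free_disk_gb=100))
```

A function like this is a starting point only; intermittent connectivity to the source, for example, may justify an import even for data that is updated often.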

Volume paths and organization

  • Use descriptive paths: Choose clear, descriptive mount points like /data/datasets or /code/project
  • Avoid system directories: Don’t mount volumes to system directories like /bin, /usr, or /etc
  • Leverage /root persistence: Remember that /root is automatically persistent, so you can keep configurations and working files there
  • Organize by purpose: Group related volumes together (e.g., /data/ for datasets, /models/ for pre-trained models)
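
The path rules above can be checked before a workspace is created. The following is a minimal sketch, assuming the only directories to protect are the three named above; `is_safe_target` and `SYSTEM_DIRS` are hypothetical names, and a real deployment may need a longer list.

```python
import posixpath
from pathlib import PurePosixPath

# Assumption: only the system directories named in the guideline are blocked.
SYSTEM_DIRS = ("/bin", "/usr", "/etc")


def is_safe_target(path):
    """Return True if `path` is an absolute path outside the system directories."""
    # Collapse "." and ".." segments so "/data/../etc" is treated as "/etc".
    target = PurePosixPath(posixpath.normpath(path))
    if not target.is_absolute() or target == PurePosixPath("/"):
        return False
    for system_dir in SYSTEM_DIRS:
        blocked = PurePosixPath(system_dir)
        if target == blocked or blocked in target.parents:
            return False
    return True


for candidate in ("/data/datasets", "/code/project", "/etc", "/usr/local/data"):
    print(candidate, is_safe_target(candidate))
```

Comparing path components rather than string prefixes avoids rejecting a legitimate path such as `/usrdata` just because it starts with `/usr`.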

Performance considerations

  • Mount for large data: Use mount volumes for datasets larger than your workspace disk allocation
  • Import for speed: Import volumes provide faster access since data is local to the workspace
  • Network location matters: Choose storage locations close to your compute cluster for optimal performance

Common workflows

Data science workflow

# Import code repository
/code: git://github.com/myorg/ml-project

# Mount large dataset
/data/raw: volume://vessl-storage/raw-dataset

# Import pre-trained model
/models/pretrained: vessl-model://myorg/bert-base/v1

# Use /root for experiments and outputs (automatically persistent)

Collaborative development

# Shared codebase
/project: git://github.com/team/shared-project

# Shared datasets
/datasets: volume://vessl-storage/team-datasets

# Personal workspace for experiments
# (Use /root for personal files and configurations)

Model development and training

# Training data
/data/train: vessl-dataset://myorg/training-data

# Validation data
/data/val: vessl-dataset://myorg/validation-data

# Model checkpoints (shared storage for team access)
/checkpoints: volume://vessl-storage/model-checkpoints
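
The workflow listings above all use a `<target path>: <scheme>://<location>` notation. The sketch below parses that notation into structured records, for example to review a layout in a script. It is an illustration only: `parse_volume_spec` is a hypothetical helper, and this guide does not state that VESSL accepts these listings as a configuration file.

```python
def parse_volume_spec(text):
    """Parse "<target>: <scheme>://<location>" lines into dictionaries.

    Blank lines and lines starting with "#" are skipped.
    """
    volumes = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        target, separator, source = line.partition(": ")
        if not separator:
            raise ValueError(f"expected '<target>: <source>', got: {line!r}")
        scheme, separator, location = source.strip().partition("://")
        if not separator:
            raise ValueError(f"source has no scheme: {source!r}")
        volumes.append({"target": target, "scheme": scheme, "location": location})
    return volumes


spec = """\
# Training data
/data/train: vessl-dataset://myorg/training-data

# Model checkpoints (shared storage for team access)
/checkpoints: volume://vessl-storage/model-checkpoints
"""

for volume in parse_volume_spec(spec):
    print(volume["target"], volume["scheme"], volume["location"])
```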

Troubleshooting

Volume mount failures

If a volume fails to mount:
  1. Check permissions: Ensure your organization has access to the specified storage
  2. Verify paths: Confirm the source path exists and is accessible
  3. Review credentials: For external storage, verify integration credentials are valid
  4. Check cluster connectivity: Ensure your cluster can reach the external storage system
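
If the workspace starts but a volume appears to be missing, a quick check from inside the workspace can narrow down which of the steps above applies. This is a minimal sketch using only the Python standard library; `inspect_volume_path` is a hypothetical helper and `/shared` is an example target path. Note that an import volume is a copied directory, so it is not necessarily reported as a mount point.

```python
import os


def inspect_volume_path(path):
    """Report whether `path` exists, is a mount point, and can be listed."""
    report = {"exists": os.path.isdir(path), "is_mount_point": False,
              "readable": False, "entries": 0}
    if not report["exists"]:
        return report
    report["is_mount_point"] = os.path.ismount(path)
    try:
        report["entries"] = len(os.listdir(path))
        report["readable"] = True
    except OSError:
        # Permission or I/O errors typically point to credentials or connectivity.
        pass
    return report


print(inspect_volume_path("/shared"))
```

A path that exists but cannot be listed suggests a permissions or credentials problem, while a path that does not exist at all suggests the volume was never attached.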

Performance issues

If you experience slow data access:
  1. Use the appropriate volume type: Consider mount vs. import based on your use case
  2. Check network connectivity: Ensure good connectivity between cluster and storage
  3. Optimize data location: Use storage systems geographically close to your cluster
  4. Monitor resource usage: Check if workspace resources are sufficient for your workload
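
To put a number on "slow", sequential read throughput can be measured for a file on the volume in question. The sketch below is a rough probe, not a benchmark; `read_throughput_mb_s` is a hypothetical helper, and repeated runs on the same file may be served from the operating system's cache and report inflated figures.

```python
import time


def read_throughput_mb_s(path, chunk_size=1024 * 1024, max_bytes=256 * 1024 * 1024):
    """Read up to `max_bytes` from `path` and return the rate in MiB per second."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as handle:
        while total < max_bytes:
            chunk = handle.read(min(chunk_size, max_bytes - total))
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        return float("inf")
    return total / (1024 * 1024) / elapsed
```

Running it against the same file on a mount volume and on an import volume gives a direct comparison for the first step above, for example `read_throughput_mb_s("/data/raw/sample.bin")`, where the file name is a placeholder.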

Storage limitations

Remember these important limitations:
  • Disk space: Import volumes consume workspace disk space
  • Persistence: Only the /root directory persists across workspace restarts
  • Custom clusters: Some volume types may have limitations on custom clusters
  • Network requirements: External storage requires appropriate network access
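
Because import volumes consume workspace disk space, it can help to compare a dataset's size with the free space on the target file system first. This is a minimal sketch using `shutil.disk_usage` from the Python standard library; `fits_on_disk` and the 10% headroom default are illustrative assumptions, not VESSL limits.

```python
import shutil


def fits_on_disk(dataset_bytes, path=".", headroom=0.1):
    """Return True if `dataset_bytes` fits on the file system holding `path`.

    `headroom` is the fraction of total capacity to keep free afterwards
    (an arbitrary safety margin chosen for this example).
    """
    usage = shutil.disk_usage(path)
    reserve = usage.total * headroom
    return dataset_bytes <= usage.free - reserve


# Example: would an 8 GiB dataset fit alongside the current directory?
print(fits_on_disk(8 * 1024**3))
```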

Need help with storage setup?

Learn more about VESSL’s storage system and how to configure different storage types.