Cluster
A cluster is a set of nodes connected to VESSL that runs containerized machine learning workloads. You can either use VESSL’s managed clusters or bring your own on-premise or cloud cluster. VESSL provides a number of useful management features for connected clusters:
- Running VESSL-managed, fault-tolerant ML workloads on the clusters
- Monitoring and tracking system metrics for clusters, nodes, and workloads
- Watching cluster health and issuing notifications, e.g. for network failure or disk pressure
- Managing resource specs and quotas for resource governance across multiple teams and users