Vultr Open Cluster Manager | Vultr Marketplace One-Click Application - Vultr.com

Launch a scaled AI workload with ease on any number of Vultr GPUs. Vultr Open Cluster Manager comes pre-built with open source tools such as Terraform, Ansible, Grafana, and Slurm to help you deploy Vultr GPU instances that can run your workload immediately.

Before deploying this application, enable Vultr API access from 0.0.0.0/0, unless you are deploying to an instance with a reserved IP that has already been added to the list of IP addresses allowed for API access!

Vultr Open Cluster Manager

Your cluster manager is ready!

  • Your server's IP address is: use.your.ip.
  • The root password is: use.your.root.password.

Usage:

  • SSH to your new cluster manager.
  • Inspect and edit /root/config.yml to match your specifications. Some things to note (an example excerpt follows this list):

    • instance_plan: The Vultr plan (SKU) that you will deploy for your cluster nodes.
    • instance_gpu: If you will be using Slurm, specify the GPU model of the chosen plan; it is used in the Slurm configuration.
    • instance_gpu_count: If you will be using Slurm, specify the GPU count of the chosen plan; it is used in the Slurm configuration.
    • instance_slurm_memory: How much of each cluster node's RAM Slurm may use. Generally, set this to 15% less than the total available.
    • os_id: ID of the operating system to install on cluster nodes. Query https://api.vultr.com/v2/os to list the operating systems Vultr provides (see the example after this list). The default is Ubuntu 22.04 LTS.
    • instance_region: Auto-filled with the region of the cluster manager instance. If you change this, the automatically created and attached VPC will be invalid.
    • hostprefix: Prefix of each cluster node's hostname. Defaults to #region#-cluster-node
    • hostsuffix: Suffix of each cluster node's hostname. Defaults to gpu.local
  • You may wish to have the rest completed automatically; in that case, run /root/build-cluster.sh.
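For orientation, below is a minimal sketch of what an edited /root/config.yml might look like. The keys are the ones described above, but every value shown is an illustrative placeholder (plan ID, GPU model, memory figure, region, hostnames); keep the exact keys, units, and defaults from the file shipped with your instance.

    # Illustrative /root/config.yml excerpt -- placeholder values only.
    instance_plan: "example-gpu-plan-id"   # Vultr plan (SKU) used for cluster nodes
    instance_gpu: "example-gpu-model"      # GPU model of the chosen plan (needed for Slurm)
    instance_gpu_count: 1                  # GPU count of the chosen plan (needed for Slurm)
    instance_slurm_memory: 0               # RAM Slurm may use; roughly 15% less than the node total, in the file's units
    os_id: 0                               # OS to install on cluster nodes; list IDs via https://api.vultr.com/v2/os
    instance_region: "ewr"                 # auto-filled with the manager's region; changing it invalidates the attached VPC
    hostprefix: "ewr-cluster-node"         # hostname prefix (default: #region#-cluster-node)
    hostsuffix: "gpu.local"                # hostname suffix (default: gpu.local)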
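To pick an os_id, you can query the endpoint mentioned above directly. A minimal sketch, assuming curl and jq are available and that the response exposes the list under an os array with id and name fields (the field names and the per_page parameter are assumptions, not taken from this page):

    # List the operating systems Vultr offers, with their IDs (assumes curl and jq).
    curl -s "https://api.vultr.com/v2/os?per_page=500" | jq -r '.os[] | "\(.id)\t\(.name)"'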

  • Change into the Terraform directory. cd /root/terraform (the manual sequence that follows is also sketched after this list)
  • Initialize Terraform. terraform init
  • Check the Terraform plan. terraform plan
  • Apply the Terraform plan. terraform apply
  • Wait for cluster nodes to be built and come online before proceeding.
  • Change into the Ansible directory. cd /root/ansible
  • Run the Ansible playbook with ansible-playbook -i hosts cluster.yml. This will perform the following actions:
    • Update package repo on all cluster nodes and the manager.
    • Install Grafana Alloy on all cluster nodes and the manager.
    • Configure Grafana Alloy to send logs to a Loki instance, if one is provided.
    • Install and configure the Slurm Daemon slurmd on all cluster nodes.
    • Install and configure the Slurm Controller slurmctld on the manager.
    • Bring Grafana and Prometheus online in Docker containers on the manager. See /root/docker-compose.yml.
    • Install Prometheus Node Exporter on all cluster nodes.
    • Add the Node Exporter dashboard to the local Grafana instance.
  • Connect to your Grafana interface at http://use.your.ip:3000/.
    • Your Grafana username is: admin.
    • Your Grafana password is: Grafana Password.
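If you take the manual path rather than /root/build-cluster.sh, the Terraform and Ansible steps above condense to roughly the following shell session; it adds nothing beyond the commands already listed.

    # Provision the cluster nodes with Terraform.
    cd /root/terraform
    terraform init
    terraform plan        # review what Terraform intends to create
    terraform apply       # confirm when prompted

    # Wait for the cluster nodes to be built and come online,
    # then configure them and the manager with Ansible.
    cd /root/ansible
    ansible-playbook -i hosts cluster.yml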
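Once the playbook finishes, one generic way to confirm that the Slurm controller can reach the nodes is with the standard Slurm client commands. This check is not part of the application's documented steps, and node and partition names will depend on your config.yml.

    # Optional sanity check using standard Slurm commands.
    sinfo                # cluster nodes should be listed and settle into the "idle" state
    srun -N1 hostname    # run a trivial one-node job end to end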

Support Information

Support Contact

Website: https://vultr.com
Email: support@vultr.com
Support URL: https://my.vultr.com
Repository: https://www.vultr.com
