Introducing Vultr Talon: Affordable Cloud VMs Accelerated with NVIDIA GPUs
May 24, 2022

Here at Vultr, we pride ourselves on making cloud infrastructure affordable for everyone – developers, SMBs, and large enterprises. That desire to keep prices down has led us to create Vultr Talon, a breakthrough platform which we are introducing today in beta. With Vultr Talon, we are now offering best-in-class, virtualized GPUs, starting with the NVIDIA A100 Tensor Core GPU, at a fraction of the price of a full GPU. Vultr is the first cloud provider to offer virtualization of NVIDIA A100 GPUs to enable GPU sharing, ensuring optimal resource utilization.

Today’s launch represents a category-creating innovation, delivering a whole new kind of cloud VM that addresses some of the biggest hurdles to building and deploying production AI-enabled applications using GPUs.

Boosting cost-effectiveness and utilization of GPU instances

NVIDIA GPUs have exploded in popularity over the past several years – and for good reason. A single GPU packs thousands of specialized cores perfectly suited for parallel computation. In addition to their massive computational power, NVIDIA GPUs are incredibly versatile accelerators that can be used for AI, machine learning, data analytics, scientific computing, and so much more.

However, there’s no one-size-fits-all answer when it comes to customer workloads. Diverse workloads have varying compute requirements, ranging from a fraction of a GPU to multiple GPUs on a single node or across multiple nodes.

Provisioning right-sized acceleration for your workload and maximizing utilization are both critical for cloud cost optimization.

Historically, cloud users have only been able to purchase entire physical GPUs running in passthrough mode, attached to cloud compute instances. High-end GPUs delivered in this way typically cost thousands of dollars per month.

This cost is often justifiable for the largest enterprise workloads, some of which are so compute-intensive that they require multiple GPUs running in parallel. But for many businesses and developers, the cost of even a single GPU can be a barrier to getting started, experimenting, or running applications in development and testing environments. Even enterprises with substantial IT budgets may end up wasting significant amounts of money by provisioning more GPU capacity than they actually need, or may simply decide to avoid using GPUs at all.

GPU virtualization of NVIDIA A100 in the cloud, enabling efficient compute and reduced costs for AI

GPUs deliver massive potential impact, but many customers who could benefit from these capabilities aren’t able to take advantage of them. The fact is, the computing demand of varied AI workloads cannot be met with a one-size-fits-all approach.

By leveraging our expertise in virtualization and cloud infrastructure, we set out to turn the GPU delivery model upside down. Rather than offering entire physical GPUs at less accessible prices, we wanted to deliver GPU sharing, enabled by virtualization, for just a fraction of the cost. Working in close collaboration with NVIDIA, we developed the Vultr Talon platform, powered by NVIDIA GPUs and NVIDIA AI Enterprise software, which we are unveiling with today’s beta.

World-changing NVIDIA GPU technology can now be accessed for prices starting at $90 per month, or just $0.13 per hour

At the core of Vultr Talon is a state-of-the-art NVIDIA GPU virtualization platform. Rather than attaching entire physical GPUs to VMs, we are instead attaching just a fraction in the form of a virtual GPU (vGPU). Virtual GPUs are powered by NVIDIA AI Enterprise, which includes the NVIDIA vGPU software and is optimized for remotely running AI workloads and high-performance data analytics.

To your machine, a vGPU looks just like a physical GPU. Each vGPU has its own dedicated memory that is a portion of the underlying card’s memory. The vGPU has access to a corresponding portion of the physical GPU’s computational power.

Vultr Talon uses NVIDIA’s Multi-Instance GPU (MIG) technology for VMs with at least 10GB of GPU memory. MIG gives each tenant guaranteed QoS, along with fully isolated high-bandwidth GPU memory, cache, and dedicated compute cores.

You can use Vultr VMs with virtual GPUs to run all the same frameworks, libraries, and operating systems you would run on a physical GPU. As with all Vultr products, you can easily scale your usage up or down to precisely match your GPU spend to your actual needs.
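Because a vGPU presents itself to the guest like a physical GPU, the usual NVIDIA tooling should report it from inside the VM. The following is a minimal sketch, not an official Vultr procedure: it assumes the NVIDIA driver and the `nvidia-smi` utility are installed on the instance, and the helper names `parse_gpu_rows` and `list_gpus` are made up for this illustration.

```python
import shutil
import subprocess

# Standard nvidia-smi query: GPU name and total memory (MiB), as bare CSV.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_rows(text):
    """Parse 'name, memory_mib' CSV rows into (name, memory_mib) tuples."""
    gpus = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        # Split on the last comma so a comma inside the name cannot break parsing.
        name, memory_mib = line.rsplit(",", 1)
        gpus.append((name.strip(), int(memory_mib.strip())))
    return gpus


def list_gpus():
    """Return the GPUs visible to this machine, or [] if none can be queried."""
    # ASSUMPTION: nvidia-smi is on PATH; if it is missing or fails, report no GPUs.
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=False)
    if result.returncode != 0:
        return []
    return parse_gpu_rows(result.stdout)


if __name__ == "__main__":
    for name, memory_mib in list_gpus():
        print(f"{name}: {memory_mib} MiB")
```

On an instance with a vGPU attached, the reported memory should correspond to the vGPU's dedicated portion of the card rather than the full card.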

Fractions of an NVIDIA A100, starting at just $90 per month

The NVIDIA A100 Tensor Core GPU delivers incredible acceleration for deep learning, high-performance computing (HPC), and data analytics. Powered by technology breakthroughs in the NVIDIA Ampere architecture, such as third-generation Tensor Cores, TensorFloat-32 (TF32) precision, and structural sparsity, the NVIDIA A100 provides a unified workload accelerator for data analytics, AI training, AI inference, and HPC.

Combined with the NVIDIA AI Enterprise software suite, which is optimized for the development and deployment of AI and certified to run in virtualized environments, the NVIDIA A100 accelerates all major deep learning and data analytics frameworks, such as TensorFlow and PyTorch, as well as over 700 HPC applications.

Cloud instances with single or multiple physical GPUs typically sell for thousands of dollars per month. In contrast, we are today introducing a set of Vultr plans that give you a fraction of an NVIDIA A100 Tensor Core GPU along with dedicated vCPUs, RAM, storage, and bandwidth, starting at $90 per month, or $0.13 per hour. These instances enable provisioning GPU resources with greater granularity and provide developers the optimal amount of accelerated compute.
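To put the two entry-level prices side by side, here is a small sketch of the arithmetic. The $90 and $0.13 figures come from this post; the function names are illustrative, and capping hourly charges at the monthly price is an assumption made for this example, not a statement of Vultr's billing rules.

```python
MONTHLY_PRICE = 90.00  # USD per month, entry-level plan quoted in this post
HOURLY_RATE = 0.13     # USD per hour, entry-level plan quoted in this post


def break_even_hours(monthly_price=MONTHLY_PRICE, hourly_rate=HOURLY_RATE):
    """Hours of use at which hourly charges add up to the monthly price."""
    return monthly_price / hourly_rate


def estimated_cost(hours, monthly_price=MONTHLY_PRICE, hourly_rate=HOURLY_RATE):
    """Estimated charge for running one instance for `hours` in a month.

    ASSUMPTION: hourly charges never exceed the monthly price. Check the
    provider's billing documentation before relying on this.
    """
    return min(hours * hourly_rate, monthly_price)


if __name__ == "__main__":
    print(f"Break-even: {break_even_hours():.1f} hours")
    for hours in (40, 200, 720):
        print(f"{hours:>4} hours -> ${estimated_cost(hours):.2f}")
```

Under these assumptions, a short experiment of 40 hours comes to about $5.20, while an instance left running for a full 30-day month reaches the $90 cap.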

Right now, in our control panel, you can provision GPU instances with a wide range of specifications.

What can you do with a virtualized NVIDIA A100 Tensor Core GPU?

Vultr plans with the NVIDIA A100 GPU and NVIDIA AI Enterprise software lend themselves to a wide range of production and development use cases. In particular, we recommend them for ML inference and model-building workloads for natural language processing, voice recognition, and computer vision.

In addition to the NVIDIA AI Enterprise software suite, NVIDIA also offers the NGC catalog, a hub of additional GPU-optimized AI and HPC software. NGC includes enterprise-grade containers, frameworks, pretrained models, Helm charts, and industry-specific software development kits (SDKs) for data scientists, developers, and DevOps teams to build and deploy their solutions faster. With NGC, developers can deploy performance-optimized AI/HPC software containers, pretrained AI models, and Jupyter Notebooks that accelerate AI development and HPC workloads on any GPU-powered on-prem, cloud, or edge system.

For compute-intensive AI workloads such as data preparation and deep learning training, customers can choose a full NVIDIA A100 or a multi-GPU configuration. Less compute-intensive workloads such as AI inference or edge AI often don’t require the full compute power of a GPU and can run on smaller vGPU sizes.

Bare Metal GPU for large workloads

Should you wish to run a workload that requires multiple physical GPUs, we also have Bare Metal servers with four NVIDIA A100 GPUs and dual 24-core Intel Xeon CPUs.

Just getting started

Today’s launch is a beta, with initial capacity in New Jersey. We will be adding global inventory for NVIDIA A100, A40, and A16 GPUs in the weeks ahead, to better support additional regions and a wider variety of use cases.

If you are interested in trying Vultr Talon, you can provision instances through our control panel. You’ll find additional guidance on how to get started in the documentation section of our website.

If you’d like to speak with us regarding your needs, we encourage you to contact our sales team.