Get the right configuration for you with the NVIDIA L40S GPU
With Vultr Cloud GPU, accelerated by NVIDIA’s computing platform, the NVIDIA L40S GPU is available as a cloud GPU instance or as an 8-GPU bare metal server. Get up to speed quickly with Vultr GPU Enabled Images, or enjoy the flexibility of direct access to NVIDIA L40S GPUs through cloud GPU or bare metal, with greater control and the ability to supply your own drivers for maximum software compatibility.

NVIDIA L40S

Boost multimodal GenAI with NVIDIA L40S GPUs and the Ada Lovelace architecture

Pricing

Starting at $1.67 per hour

Key features

The NVIDIA L40S GPU, accelerated by the Ada Lovelace architecture, is the most powerful universal GPU for the data center.

The NVIDIA L40S GPU delivers up to 1.7x training and 1.5x inference performance versus the previous-generation NVIDIA A100 Tensor Core GPU.

With breakthrough performance and 48 gigabytes (GB) of memory capacity, the NVIDIA L40S GPU is the ideal platform for accelerating multimodal GenAI workloads.

Accelerated by the NVIDIA Ada Lovelace architecture

Fourth-generation Tensor Cores

Hardware support for structural sparsity and an optimized TF32 format provide out-of-the-box performance gains for faster AI and data science model training. Accelerate AI-enhanced graphics capabilities with DLSS to upscale resolution with better performance in select applications.

Third-generation RT Cores

Enhanced throughput and concurrent ray-tracing and shading capabilities improve ray-tracing performance, accelerating renders for product design and architecture, engineering, and construction workflows. See lifelike designs in action with hardware-accelerated motion blur and stunning real-time animations.

CUDA Cores

Accelerated single-precision floating point (FP32) throughput and improved power efficiency significantly boost performance for workflows like 3D model development and computer-aided engineering (CAE) simulation. Use enhanced 16-bit math capabilities (BF16) for mixed-precision workloads.
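As a toy illustration of why mixed-precision workflows pair 16-bit math with a higher-precision accumulator, the NumPy sketch below sums many small values. This is a conceptual illustration only: NumPy's float16 stands in for the GPU's 16-bit formats (BF16 itself is not available in NumPy).

```python
import numpy as np

# FP16 carries roughly 3 decimal digits of precision, so a running
# FP16 sum of many tiny values stalls once the total's rounding step
# exceeds each addend. Mixed-precision training therefore keeps a
# higher-precision (FP32) accumulator.
vals = np.full(10000, 1e-4, dtype=np.float16)

naive = np.float16(0.0)
for v in vals:
    naive = np.float16(naive + v)   # accumulate in FP16: stalls early

accurate = vals.astype(np.float32).sum()  # FP32 accumulator: near 1.0
print(float(naive), float(accurate))
```

The FP16 running sum stops far short of the true total, which is why deep learning frameworks keep FP32 master weights and accumulators even when compute runs in 16-bit.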

Transformer Engine

Transformer Engine dramatically accelerates AI performance and improves memory utilization for both training and inference. Harnessing the power of the Ada Lovelace fourth-generation Tensor Cores, Transformer Engine intelligently scans the layers of transformer architecture neural networks and automatically recasts between FP8 and FP16 precisions to deliver faster AI performance and accelerate training and inference.
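The per-tensor scaling such an engine relies on can be sketched in a few lines. The sketch below is a conceptual illustration, not NVIDIA's implementation: it only shows fitting a tensor into FP8 E4M3's representable range before a low-precision cast.

```python
import numpy as np

E4M3_MAX = 448.0  # largest finite magnitude in FP8 E4M3

def scale_to_fp8_range(t):
    """Return (scaled tensor, scale) so values fit E4M3's range.

    Conceptual sketch only: a real FP8 pipeline would also cast the
    scaled tensor to 8 bits and carry the scale into the next op.
    """
    amax = float(np.abs(t).max())
    scale = E4M3_MAX / amax if amax > 0 else 1.0
    return t * scale, scale

x = np.array([0.001, 0.5, 30000.0])
scaled, s = scale_to_fp8_range(x)
print(scaled, s)  # max magnitude now at the E4M3 limit
```

Because FP8's dynamic range is narrow, choosing this scale per tensor (and updating it as value distributions shift during training) is what lets lower precision run faster without losing accuracy.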

Efficiency and security

The L40S GPU is optimized for 24/7 enterprise data center operations and is designed, built, tested, and supported by NVIDIA to ensure maximum performance, durability, and uptime. It meets the latest data center standards, is Network Equipment-Building System (NEBS) Level 3 ready, and features secure boot with root of trust technology, providing an additional layer of security for data centers.

DLSS 3

The L40S GPU enables ultra-fast rendering and smoother frame rates with NVIDIA DLSS 3. This breakthrough frame-generation technology leverages deep learning and the latest hardware innovations within the Ada Lovelace architecture and the L40S GPU, including fourth-generation Tensor Cores and an Optical Flow Accelerator, to boost rendering performance, deliver higher frames per second (FPS), and significantly reduce latency.

Unparalleled AI and graphics performance for the data center

The NVIDIA L40S GPU, based on the Ada Lovelace architecture, is the most powerful universal GPU for the data center, delivering breakthrough multi-workload acceleration for large language model (LLM) inference and training, graphics, and video applications.

Universal performance

NVIDIA L40S

  • Tensor performance: 1,466 TFLOPS¹
  • RT Core performance: 212 TFLOPS
  • Single-precision performance: 91.6 TFLOPS

¹Peak rates are based on GPU boost clock.

Specifications
FP32: 91.6 teraFLOPS
TF32 Tensor Core: 366 teraFLOPS*
FP16: 733 teraFLOPS*
FP8: 1,466 teraFLOPS*
RT Core performance: 212 teraFLOPS*
Max power consumption: 350 W

*With sparsity
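The starred figures assume NVIDIA's 2:4 structured sparsity, which roughly doubles peak Tensor Core throughput; the dense peaks are therefore about half the listed values. A quick back-of-the-envelope check:

```python
# Starred spec-table figures assume 2:4 structured sparsity, which
# roughly doubles peak Tensor Core throughput; dividing by two gives
# the approximate dense peaks.
sparse_tflops = {"TF32": 366.0, "FP16": 733.0, "FP8": 1466.0}
dense_tflops = {k: v / 2 for k, v in sparse_tflops.items()}
print(dense_tflops)
```

Note these are theoretical peaks at GPU boost clock; achievable throughput depends on the workload.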

Additional resources

FAQ

What are the key features of the NVIDIA L40S?

  • Advanced AI & ML Acceleration – Optimized for AI training and inference.
  • High Memory Capacity – Large VRAM for handling complex models.
  • Superior Rendering Performance – Ideal for video encoding, 3D modeling, and simulations.
  • Data Center-Ready – Built for enterprise cloud and HPC workloads.

Who should use NVIDIA L40S?

The NVIDIA L40S is ideal for:

  • AI researchers & data scientists – For deep learning and model training.
  • Developers & engineers – Running high-performance applications.
  • Media & entertainment professionals – For 3D rendering and video processing.
  • HPC users – Computational simulations and scientific applications.

How can I deploy an NVIDIA L40S GPU on Vultr?

You can instantly deploy an NVIDIA L40S GPU via the control panel or API.
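Under the hood, an API deployment is an authenticated POST to Vultr's `/v2/instances` endpoint. The standard-library sketch below builds such a request; the plan ID, region code, and OS image ID shown are placeholders, not guaranteed identifiers (list real values via `GET /v2/plans` and `GET /v2/regions`).

```python
import json
import os
import urllib.request

API = "https://api.vultr.com/v2/instances"

def build_instance_request(api_key, plan, region, os_id, label):
    """Build the authenticated POST request for a new Vultr instance."""
    body = json.dumps({
        "region": region,
        "plan": plan,
        "os_id": os_id,
        "label": label,
    }).encode()
    return urllib.request.Request(
        API,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_instance_request(
    os.environ.get("VULTR_API_KEY", "demo-key"),
    plan="vcg-l40s-example",   # placeholder GPU plan ID
    region="ewr",              # example region code
    os_id=1743,                # example OS image ID
    label="l40s-demo",
)
# urllib.request.urlopen(req)  # uncomment to actually deploy
```

The same parameters map directly onto the fields in the control panel's deploy form.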

What are the benefits of renting NVIDIA L40S in the cloud?

  • On-demand access – No need to invest in expensive hardware.
  • Scalability – Increase resources as needed for workloads.
  • Cost efficiency – Pay for what you use without long-term commitments.
  • Global availability – Deploy from multiple regions for low-latency performance.

Can I use NVIDIA L40S for AI model training?

Yes. The L40S GPU is optimized for AI and deep learning, delivering high-speed processing for training and inferencing large neural networks.

Can I run multiple workloads on an NVIDIA L40S GPU?

Yes. Although the L40S does not support Multi-Instance GPU (MIG) partitioning, NVIDIA virtual GPU (vGPU) software and GPU time-slicing allow you to share the GPU across multiple workloads, maximizing efficiency.

How does the NVIDIA L40S perform in multimodal AI workloads compared to previous-generation GPUs?

The NVIDIA L40S significantly outperforms previous-generation GPUs in multimodal AI workloads due to its combination of fourth-generation Tensor Cores, the Transformer Engine, and large memory capacity. It accelerates training and inference for models combining text, image, and video data – making it ideal for generative AI, large language models (LLMs), and computer vision tasks.

What are the advantages of the NVIDIA L40S for real-time 3D rendering and ray tracing?

The L40S integrates third-gen RT Cores and DLSS 3 technology, enabling real-time 3D rendering and photorealistic ray tracing with minimal latency. It's especially beneficial in digital twin applications, virtual production, and architectural visualization, where precision and speed are critical.

How does the NVIDIA L40S architecture optimize GPU resource utilization in virtualized or multi-tenant environments?

With support for NVIDIA vGPU software and SR-IOV, the L40S can be partitioned into isolated virtual GPU instances. This improves utilization, enables workload isolation, and is ideal for cloud service providers and enterprises running multiple AI, rendering, or HPC tasks in parallel on a single GPU.

Get started,
or get some advice

Start your GPU-accelerated project now by signing up for a free Vultr account.
Or, if you’d like to speak with us regarding your needs, please reach out.