Enterprises are rapidly adopting generative AI, large language models (LLMs), advanced graphics and digital twins to increase operational efficiencies, reduce costs and drive innovation.
However, to adopt these technologies effectively, enterprises need access to state-of-the-art, full-stack accelerated computing platforms. To meet this demand, Oracle Cloud Infrastructure (OCI) today announced NVIDIA L40S GPU bare-metal instances available to order and the upcoming availability of a new virtual machine accelerated by a single NVIDIA H100 Tensor Core GPU. This new VM expands OCI’s existing H100 portfolio, which includes an NVIDIA HGX H100 8-GPU bare-metal instance.
Paired with NVIDIA networking and running the NVIDIA software stack, these platforms deliver powerful performance and efficiency, enabling enterprises to advance generative AI.
NVIDIA L40S Now Available to Order on OCI
The NVIDIA L40S is a universal data center GPU designed to deliver breakthrough multi-workload acceleration for generative AI, graphics and video applications. Equipped with fourth-generation Tensor Cores and support for the FP8 data format, the L40S GPU excels at training and fine-tuning small- to mid-size LLMs and at inference across a wide range of generative AI use cases.
For example, a single L40S GPU (FP8) can generate up to 1.4x more tokens per second than a single NVIDIA A100 Tensor Core GPU (FP16) for Llama 3 8B with NVIDIA TensorRT-LLM at an input and output sequence length of 128.
The L40S GPU also has best-in-class graphics and media acceleration. Its third-generation NVIDIA Ray Tracing Cores (RT Cores) and multiple encode/decode engines make it ideal for advanced visualization and digital twin applications.
The L40S GPU delivers up to 3.8x the real-time ray-tracing performance of its predecessor, and supports NVIDIA DLSS 3 for faster rendering and smoother frame rates. This makes the GPU ideal for developing applications on the NVIDIA Omniverse platform, enabling real-time, photorealistic 3D simulations and AI-enabled digital twins. With Omniverse on the L40S GPU, enterprises can develop advanced 3D applications and workflows for industrial digitalization that allow them to design, simulate and optimize products, processes and facilities in real time before going into production.
OCI will offer the L40S GPU in its BM.GPU.L40S.4 bare-metal compute shape, featuring four NVIDIA L40S GPUs, each with 48GB of GDDR6 memory. This shape includes local NVMe drives with 7.38TB capacity, 4th Generation Intel Xeon CPUs with 112 cores and 1TB of system memory.
With OCI’s bare-metal compute architecture, these shapes eliminate the overhead of virtualization for high-throughput, latency-sensitive AI and machine learning workloads. The accelerated compute shape features the NVIDIA BlueField-3 DPU for improved server efficiency, offloading data center tasks from CPUs to accelerate networking, storage and security workloads. The use of BlueField-3 DPUs furthers OCI’s strategy of off-box virtualization across its entire fleet.
OCI Supercluster with NVIDIA L40S enables ultra-high performance with 800Gbps of internode bandwidth and low latency for up to 3,840 GPUs. OCI’s cluster network uses NVIDIA ConnectX-7 NICs over RoCE v2 to support high-throughput, latency-sensitive workloads, including AI training.
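For teams that provision infrastructure programmatically, the sketch below shows roughly how the BM.GPU.L40S.4 shape could be requested with the OCI Python SDK. It is a minimal sketch under stated assumptions, not OCI’s documented procedure: the availability domain and every OCID are placeholders to be replaced with values from your own tenancy.

```python
import oci

# Minimal sketch: launch a BM.GPU.L40S.4 bare-metal instance via the OCI
# Python SDK. The availability domain and all OCIDs below are placeholders.
config = oci.config.from_file()  # reads ~/.oci/config by default
compute = oci.core.ComputeClient(config)

details = oci.core.models.LaunchInstanceDetails(
    availability_domain="Uocm:PHX-AD-1",              # placeholder AD
    compartment_id="ocid1.compartment.oc1..example",  # placeholder OCID
    shape="BM.GPU.L40S.4",                            # the 4x L40S shape
    display_name="l40s-bare-metal-demo",
    source_details=oci.core.models.InstanceSourceViaImageDetails(
        image_id="ocid1.image.oc1..example"           # a GPU-ready image OCID
    ),
    create_vnic_details=oci.core.models.CreateVnicDetails(
        subnet_id="ocid1.subnet.oc1..example"         # placeholder subnet
    ),
)

instance = compute.launch_instance(details).data
print(instance.id, instance.lifecycle_state)
```

The same call with a different `shape` string applies to the other GPU shapes discussed below.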
“We chose OCI AI infrastructure with bare-metal instances and NVIDIA L40S GPUs for 30% more efficient video encoding,” said Sharon Carmel, CEO of Beamr Cloud. “Videos processed with Beamr Cloud on OCI will have up to 50% reduced storage and network bandwidth consumption, speeding up file transfers by 2x and increasing productivity for end users. Beamr will provide OCI customers video AI workflows, preparing them for the future of video.”
Single-GPU H100 VMs Coming Soon on OCI
The VM.GPU.H100.1 compute virtual machine shape, accelerated by a single NVIDIA H100 Tensor Core GPU, is coming soon to OCI. It will provide cost-effective, on-demand access for enterprises looking to use the power of NVIDIA H100 GPUs for their generative AI and HPC workloads.
A single H100 provides a platform for smaller workloads and LLM inference. For example, one H100 GPU can generate more than 27,000 tokens per second for Llama 3 8B (up to 4x more throughput than a single A100 GPU at FP16 precision) with NVIDIA TensorRT-LLM at an input and output sequence length of 128 and FP8 precision.
The VM.GPU.H100.1 shape includes 2× 3.4TB of NVMe drive capacity, 13 cores of 4th Gen Intel Xeon processors and 246GB of system memory, making it well-suited for a wide range of AI tasks.
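As a rough illustration of the workflow behind the Llama 3 8B numbers above, the sketch below uses the high-level LLM API that recent TensorRT-LLM releases ship. It is a minimal sketch, not the benchmark harness behind the quoted figures: the Hugging Face model id and sampling settings are assumptions, and FP8 quantization options vary by release, so they are omitted.

```python
from tensorrt_llm import LLM, SamplingParams

# Minimal sketch of Llama 3 8B inference with TensorRT-LLM's high-level API.
# The model id is an assumption; FP8 quantization flags differ across
# releases and are omitted here.
llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")

prompts = ["What makes the H100 fast for LLM inference?"]
params = SamplingParams(max_tokens=128)  # mirrors the 128-token output length above

for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```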
“Oracle Cloud’s bare-metal compute with NVIDIA H100 and A100 GPUs, low-latency Supercluster and high-performance storage delivers up to 20% better price-performance for Altair’s computational fluid dynamics and structural mechanics solvers,” said Yeshwant Mummaneni, chief engineer of data management analytics at Altair. “We look forward to leveraging these GPUs with virtual machines for the Altair Unlimited virtual appliance.”
GH200 Bare-Metal Instances Available for Validation
OCI has also made the BM.GPU.GH200 compute shape available for customer testing. It features the NVIDIA Grace Hopper Superchip and NVLink-C2C, a high-bandwidth, cache-coherent 900GB/s connection between the NVIDIA Grace CPU and NVIDIA Hopper GPU. This provides over 600GB of accessible memory, enabling up to 10x higher performance for applications running on terabytes of data compared to the NVIDIA A100 GPU.
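One way applications can tap that coherent memory pool is to allocate CUDA managed memory, letting working sets spill from the Hopper GPU’s HBM into Grace’s LPDDR5X with pages migrating over NVLink-C2C. The sketch below is a hypothetical illustration using CuPy’s managed-memory allocator; the sizes are illustrative, not a measured benchmark.

```python
import cupy as cp

# Hypothetical sketch: with CUDA managed memory on GH200, a working set
# larger than the Hopper GPU's HBM can still be touched from device code,
# with pages migrating over NVLink-C2C between Grace and Hopper memory.
pool = cp.cuda.MemoryPool(cp.cuda.malloc_managed)
cp.cuda.set_allocator(pool.malloc)

# 200 GiB of float32: more than the GPU's HBM, but well under the
# superchip's roughly 600GB of accessible memory.
n = 200 * 1024**3 // 4
x = cp.zeros(n, dtype=cp.float32)
x += 1.0  # device kernels run; the driver pages data in as needed
print(float(x.sum()))
```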
Optimized Software for Enterprise AI
Enterprises have a wide variety of NVIDIA GPUs to accelerate their AI, HPC and data analytics workloads on OCI. However, maximizing the full potential of these GPU-accelerated compute instances requires an optimized software layer.
NVIDIA NIM, part of the NVIDIA AI Enterprise software platform available on the OCI Marketplace, is a set of easy-to-use microservices designed for secure, reliable deployment of high-performance AI model inference, powering world-class generative AI applications.
Optimized for NVIDIA GPUs, NIM prebuilt containers offer developers improved cost of ownership, faster time to market and security. NIM microservices for popular community models, found in the NVIDIA API Catalog, can be deployed easily on OCI.
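Once a NIM container is up, applications typically reach it through its OpenAI-compatible HTTP endpoint. The sketch below assumes a hypothetical local deployment serving Llama 3 8B on port 8000; the host, port and model id all depend on the container actually deployed.

```python
from openai import OpenAI

# Sketch of querying a locally deployed NIM microservice through its
# OpenAI-compatible API. The endpoint and model id are assumptions.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",
    messages=[{"role": "user", "content": "Give one enterprise use case for digital twins."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```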
Performance will continue to improve over time with upcoming GPU-accelerated instances, including NVIDIA H200 Tensor Core GPUs and NVIDIA Blackwell GPUs.
Order the L40S GPU and test the GH200 Superchip by reaching out to OCI. To learn more, join Oracle and NVIDIA at SIGGRAPH, the world’s premier graphics conference, running through Aug. 1.
See notice regarding software product information.