NVIDIA Blackwell Delivers Next-Level MLPerf Training Performance

Generative AI applications that use text, computer code, protein chains, summaries, video and even 3D graphics require data-center-scale accelerated computing to efficiently train the large language models (LLMs) that power them.

In the MLPerf Training 4.1 industry benchmarks, the NVIDIA Blackwell platform delivered impressive results on workloads across all tests, including up to 2.2x more performance per GPU on LLM benchmarks such as Llama 2 70B fine-tuning and GPT-3 175B pretraining.

In addition, NVIDIA’s submissions on the NVIDIA Hopper platform continued to hold at-scale records on all benchmarks, including a submission with 11,616 Hopper GPUs on the GPT-3 175B benchmark.

Leaps and Bounds With Blackwell

The first Blackwell training submission to the MLCommons Consortium, which creates standardized, unbiased and rigorously peer-reviewed testing for industry participants, highlights how the architecture is advancing generative AI training performance.

For instance, the architecture includes new kernels that make more efficient use of Tensor Cores. Kernels are optimized, purpose-built math operations, such as matrix multiplies, that are at the heart of many deep learning algorithms.
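To make the idea of a matrix-multiply kernel concrete, here is a minimal pure-Python sketch of a tiled (blocked) matrix multiply. It is an illustration only, not NVIDIA's kernel code: the function name `matmul_tiled` and the `tile` parameter are hypothetical, and real GPU kernels apply the same blocking idea in hardware-specific ways.

```python
def matmul_tiled(a, b, tile=2):
    """Multiply matrices a (n x k) and b (k x m) block by block.

    Hypothetical illustration: working on small tiles keeps the data
    being reused close at hand, which is the general idea behind
    optimized matrix-multiply kernels.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    # Walk over the output in tile-sized blocks.
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            # Accumulate partial products one tile of the shared dimension at a time.
            for k0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c


if __name__ == "__main__":
    print(matmul_tiled([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

The result matches an ordinary triple-loop multiply; only the order in which partial products are accumulated changes.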

Blackwell’s higher per-GPU compute throughput and significantly larger and faster high-bandwidth memory allow it to run the GPT-3 175B benchmark on fewer GPUs while achieving excellent per-GPU performance.

Taking advantage of larger, higher-bandwidth HBM3e memory, just 64 Blackwell GPUs were able to run the GPT-3 LLM benchmark without compromising per-GPU performance. The same benchmark run using Hopper needed 256 GPUs.
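As a rough illustration of why per-GPU memory capacity affects the number of GPUs needed, the sketch below estimates how much training state each GPU would hold if it were split evenly across 64 or 256 GPUs. Only the 175B parameter count and the two GPU counts come from the text above. The 16 bytes per parameter is an assumption (a commonly cited estimate for mixed-precision training with an Adam-style optimizer), and the estimate ignores activations, communication buffers and the actual parallelism scheme used in the submissions. The function and constant names are hypothetical.

```python
PARAMS_GPT3 = 175e9            # parameter count named by the benchmark
BYTES_PER_PARAM_ASSUMED = 16   # ASSUMPTION: weights + gradients + optimizer state


def state_per_gpu_gb(num_gpus, params=PARAMS_GPT3,
                     bytes_per_param=BYTES_PER_PARAM_ASSUMED):
    """Estimated training state per GPU in GB (1 GB = 1e9 bytes),
    assuming the state is sharded evenly across all GPUs."""
    return params * bytes_per_param / num_gpus / 1e9


def gpu_reduction_factor(baseline_gpus, new_gpus):
    """How many times fewer GPUs the new configuration uses."""
    return baseline_gpus / new_gpus


if __name__ == "__main__":
    for gpus in (256, 64):
        print(f"{gpus} GPUs: ~{state_per_gpu_gb(gpus):.2f} GB of state per GPU")
    print(f"GPU count reduction: {gpu_reduction_factor(256, 64):.0f}x")
```

Under these assumptions, each GPU in the 64-GPU configuration has to hold four times as much state as in the 256-GPU configuration, which is why larger memory matters for running at smaller scale.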

The Blackwell training results follow an earlier submission to MLPerf Inference 4.1, where Blackwell delivered up to 4x more LLM inference performance than the Hopper generation. Taking advantage of the Blackwell architecture’s FP4 precision, along with the NVIDIA QUASAR Quantization System, the submission delivered powerful performance while meeting the benchmark’s accuracy requirements.

Relentless Optimization

NVIDIA platforms undergo continuous software development, racking up performance and feature improvements in training and inference for a wide variety of frameworks, models and applications.

In this round of MLPerf training submissions, Hopper delivered a 1.3x improvement in GPT-3 175B per-GPU training performance since the introduction of the benchmark.

NVIDIA also submitted large-scale results on the GPT-3 175B benchmark using 11,616 Hopper GPUs connected with NVIDIA NVLink and NVSwitch high-bandwidth GPU-to-GPU communication and NVIDIA Quantum-2 InfiniBand networking.

NVIDIA Hopper GPUs have more than tripled scale and performance on the GPT-3 175B benchmark since last year. In addition, on the Llama 2 70B LoRA fine-tuning benchmark, NVIDIA increased performance by 26% using the same number of Hopper GPUs, reflecting continued software enhancements.

NVIDIA’s ongoing work on optimizing its accelerated computing platforms enables continued improvements in MLPerf test results: driving performance up in containerized software, bringing more powerful computing to partners and customers on existing platforms, and delivering more return on their platform investment.

Partnering Up

NVIDIA partners, including system makers and cloud service providers such as ASUSTek, Azure, Cisco, Dell, Fujitsu, Giga Computing, Lambda Labs, Lenovo, Oracle Cloud, Quanta Cloud Technology and Supermicro, also submitted impressive results to MLPerf in this latest round.

A founding member of MLCommons, NVIDIA sees the role of industry-standard benchmarks and benchmarking best practices in AI computing as vital. With access to peer-reviewed, streamlined comparisons of AI and HPC platforms, companies can keep pace with the latest AI computing innovations and access crucial data that can help guide important platform investment decisions.

Learn more about the latest MLPerf results on the NVIDIA Technical Blog.