The final days of the AI Mathematical Olympiad’s latest competition were a transcontinental relay for team NVIDIA.
Each night, two team members on opposite ends of the U.S. would submit an AI reasoning model to Kaggle, the online Olympics of data science and machine learning. They’d wait a tense five hours before learning how well the model tackled a sample set of 50 complex math problems.
After seeing the results, the U.S. team would pass the baton to teammates waking up in Armenia, Finland, Germany and Northern Ireland, who would spend their day testing, modifying and optimizing different model versions.
“Every night I’d be so disappointed in our score, but then I’d wake up and see the messages that came in overnight from teammates in Europe,” said Igor Gitman, senior applied scientist. “My hopes would go up and we’d try again.”
While the team was disheartened by their lack of improvement on the public dataset during the competition’s final days, the true test of an AI model is how well it can generalize to unseen data. That’s where their reasoning model leapt to the top of the leaderboard, correctly answering 34 out of 50 Olympiad questions within a five-hour time limit using a cluster of four NVIDIA L4 GPUs.
“We got the magic in the end,” said Northern Ireland-based team member Darragh Hanley, a Kaggle grandmaster and senior large language model (LLM) technologist.
Building a Winning Equation
The NVIDIA team competed under the name NemoSkills, a nod to their use of the NeMo-Skills collection of pipelines for accelerated LLM training, evaluation and inference. The seven members each contributed different areas of expertise, spanning LLM training, model distillation and inference optimization.
For the Kaggle challenge, over 2,200 participating teams submitted AI models tasked with solving 50 complex math problems at the national Olympiad level, spanning algebra, geometry, combinatorics and number theory, all within five hours.
The team’s winning model uses a combination of natural language reasoning and Python code execution.
To complete this inference challenge on the small cluster of NVIDIA L4 GPUs available through Kaggle, the NemoSkills team had to get creative.
Their winning model used Qwen2.5-14B-Base, a foundation model with chain-of-thought reasoning capabilities that the team fine-tuned on millions of synthetically generated solutions to math problems.
These synthetic solutions were primarily generated by two larger reasoning models, DeepSeek-R1 and QwQ-32B, and used to teach the team’s foundation model through a form of knowledge distillation. The end result was a smaller, faster, long-thinking model capable of tackling complex problems using a mix of natural language reasoning and Python code execution.
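In broad strokes, this kind of tool-integrated reasoning alternates between generating text and executing any code the model writes, with the program output fed back into the context before generation continues. The sketch below is a minimal illustration of that loop rather than the NeMo-Skills implementation: the `generate` callable, the `<code>`/`<output>` tags and the `solve` helper are assumptions made for the example.

```python
import re
import subprocess

# Minimal sketch of tool-integrated reasoning: the model interleaves natural
# language with code it wants executed; each snippet is run and its output is
# appended to the context before generation continues. The <code>/<output>
# tags and the `generate` callable are illustrative assumptions.
CODE_TAG = re.compile(r"<code>(.*?)</code>", re.DOTALL)

def run_python(snippet: str, timeout: float = 10.0) -> str:
    """Execute a snippet in a subprocess and capture stdout (or stderr on failure)."""
    result = subprocess.run(
        ["python", "-c", snippet], capture_output=True, text=True, timeout=timeout
    )
    return (result.stdout or result.stderr).strip()

def solve(problem: str, generate, max_rounds: int = 8) -> str:
    """Alternate between model generation and code execution until no code is emitted."""
    context = f"Solve the problem. You may write Python inside <code> tags.\n\n{problem}\n"
    for _ in range(max_rounds):
        completion = generate(context)      # model emits reasoning, possibly with code
        context += completion
        snippets = CODE_TAG.findall(completion)
        if not snippets:                    # no code requested: the answer is final
            return completion
        context += f"\n<output>\n{run_python(snippets[-1])}\n</output>\n"
    return context
```

Letting the model offload arithmetic and casework to an interpreter is what makes a 14B-parameter model competitive on Olympiad-style problems, where a single slip in manual calculation can sink an otherwise correct argument.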
To further boost performance, the team’s solution reasons through multiple long-thinking responses in parallel before determining a final answer. To optimize this process and meet the competition’s time limit, the team also used an innovative early-stopping technique.
A reasoning model might, for example, be set to answer a math problem 12 different times before choosing the most common response. Using the asynchronous processing capabilities of NeMo-Skills and NVIDIA TensorRT-LLM, the team was able to monitor progress and exit inference early if the model had already converged on the same answer four or more times.
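That early exit amounts to self-consistency sampling with a convergence check: launch the samples concurrently and stop as soon as one answer reaches the vote threshold. Below is a minimal sketch of the idea using Python’s asyncio, where `sample_answer` is a hypothetical coroutine that runs one full generation and returns its final answer; the actual pipeline handles this through NeMo-Skills and TensorRT-LLM’s asynchronous serving.

```python
import asyncio
from collections import Counter

async def answer_with_early_stop(problem, sample_answer,
                                 num_samples: int = 12, threshold: int = 4):
    """Run `num_samples` generations concurrently; stop once any answer
    has been produced `threshold` times, cancelling the rest."""
    tasks = [asyncio.create_task(sample_answer(problem)) for _ in range(num_samples)]
    counts = Counter()
    try:
        for finished in asyncio.as_completed(tasks):
            counts[await finished] += 1
            answer, votes = counts.most_common(1)[0]
            if votes >= threshold:              # converged: exit inference early
                return answer
        return counts.most_common(1)[0][0]      # no early exit: plain majority vote
    finally:
        for task in tasks:                      # stop any still-running generations
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
```

Because each sample is itself a long chain-of-thought generation, cutting off the remaining samples once the threshold is met can save a meaningful share of the five-hour budget without changing the majority-vote answer.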
TensorRT-LLM also enabled the team to harness FP8 quantization, a compression method that resulted in a 1.5x speedup over the more commonly used FP16 format. ReDrafter, a speculative decoding technique developed by Apple, was used for an additional 1.8x speedup.
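The post doesn’t spell out the exact quantization workflow, but TensorRT-LLM’s FP8 path typically runs through NVIDIA’s TensorRT Model Optimizer (ModelOpt). The sketch below shows that general recipe, post-training FP8 quantization of a Qwen2.5-14B checkpoint followed by export of a TensorRT-LLM checkpoint; the model ID, calibration prompt and export arguments are assumptions, API details vary across ModelOpt versions, and the ReDrafter speculative decoding step is not shown.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import modelopt.torch.quantization as mtq
from modelopt.torch.export import export_tensorrt_llm_checkpoint

# Model ID and calibration prompt are illustrative placeholders.
model_id = "Qwen/Qwen2.5-14B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="cuda"
)

def calibrate(m):
    # Run a few representative prompts so FP8 scales can be observed;
    # a real calibration pass would use a larger, task-matched sample.
    batch = tokenizer("Find the remainder when 7^2025 is divided by 1000.",
                      return_tensors="pt").to(m.device)
    with torch.no_grad():
        m(**batch)

# Quantize weights and activations to FP8, then export a checkpoint that
# TensorRT-LLM can compile into an engine.
model = mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop=calibrate)
export_tensorrt_llm_checkpoint(
    model, decoder_type="qwen", dtype=torch.float16, export_dir="qwen2.5-14b-fp8"
)
```

Halving the weight and activation footprint is what lets a 14B model with long reasoning traces fit comfortably in the memory and latency budget of four L4 GPUs.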
The final model performed even better on the competition’s unseen final dataset than it did on the public dataset, a sign that the team successfully built a generalizable model and avoided overfitting their LLM to the sample data.
“Even without the Kaggle competition, we’d still be working to improve AI reasoning models for math,” said Gitman. “But Kaggle gives us the opportunity to benchmark and discover how well our models generalize to a third-party dataset.”
Sharing the Wealth
The team will soon release a technical report detailing the techniques used in their winning solution, and plans to share their dataset and a collection of models on Hugging Face. The advancements and optimizations they made over the course of the competition have been integrated into the NeMo-Skills pipelines available on GitHub.
Key data, technology and insights from this pipeline were also used to train the just-released NVIDIA Llama Nemotron Ultra model.
“Throughout this collaboration, we used tools across the NVIDIA software stack,” said Christof Henkel, a member of the Kaggle Grandmasters of NVIDIA, known as KGMON. “By working closely with our LLM research and development teams, we’re able to take what we learn from the competition on a day-to-day basis and push those optimizations into NVIDIA’s open-source libraries.”
After the competition win, Henkel regained the title of Kaggle World Champion, ranking No. 1 among the platform’s more than 23 million users. Another teammate, Finland-based Ivan Sorokin, earned the Kaggle Grandmaster title, held by just over 350 people around the world.
For their first-place win, the team also won a $262,144 prize that they’re directing to the NVIDIA Foundation to support charitable organizations.
Meet the full team of Igor Gitman, Darragh Hanley, Christof Henkel, Ivan Moshkov, Benedikt Schifferer, Ivan Sorokin and Shubham Toshniwal in the video below:
Sample math questions in the featured visual above are from the 2025 American Invitational Mathematics Examination. Find the full set of questions and solutions on the Art of Problem Solving wiki.