A team of machine learning and AI experts from Nvidia competed in the 2023 KDD Cup, an annual competition organized by the ACM Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD). The competition spanned three months and involved three tasks, held in stages. The field was fierce, but when the final scores were revealed, the Nvidia group had swept all three stages of the contest.
Perseverance. That’s what it took, along with skill, state-of-the-art technology, and technical know-how for Team Nvidia-Merlin to win all three stages and take first place at the recent Amazon KDD Cup ’23, a three-month challenge in which machine learning experts from around the globe competed to build recommendation systems and then put them to the ultimate test.
The Nvidia-Merlin team name pays homage to Nvidia Merlin, a framework to help users quickly build their own recommendation systems. The group—comprising five ML experts living in different cities and time zones, from Berlin to Tokyo—competed against more than 450 teams of other data scientists.
The KDD Cup is an annual data mining and knowledge discovery competition organized by the ACM Special Interest Group on Knowledge Discovery and Data Mining. It is held in conjunction with the ACM SIGKDD conference, which this year will take place August 6 to 10, 2023, in Long Beach, California. This year’s contest marks the 27th year for the KDD Cup.
The competition kicked off March 15, 2023, and for three months, industry and academic teams as well as nonprofit organizations worked to solve a novel task posed by ACM. This year’s competition focused on improving session-based recommendations. In particular, it centered on the so-called Shopping Session Dataset, a multilingual dataset of user shopping sessions from Amazon, with the goal of encouraging innovation and diversity in recommendation systems. A recommendation system (or recommender system) is a class of machine learning system that uses data to help predict, narrow down, and find what people are looking for among an exponentially growing number of options.
The dataset comprised millions of user sessions from six locales, in which the products were listed mainly in English, German, Japanese, French, Italian, and Spanish, with the first three languages being far more prevalent than the last three. Teams were then challenged to accomplish three tasks:
- Predict the next engaged product for sessions from the English, German, and Japanese locales.
- Predict the next engaged product for sessions from the French, Italian, and Spanish locales, where transfer learning techniques were encouraged.
- Predict the title for the next engaged product.
The main objective of this competition was to build advanced session-based algorithms/models that directly predict the next engaged product or generate its title text.
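To make the next-product task concrete, here is a minimal sketch in Python of a simple co-visitation baseline: it counts which product most often directly follows another across past sessions, then ranks candidates for a new session by its last item. This is an illustrative baseline only, not the Nvidia-Merlin team's method; the function names and the toy session data are hypothetical.

```python
from collections import Counter, defaultdict


def build_covisitation(sessions):
    """Count how often each item directly follows another across sessions."""
    follows = defaultdict(Counter)
    for session in sessions:
        # Pair each item with the item that comes right after it.
        for prev_item, next_item in zip(session, session[1:]):
            follows[prev_item][next_item] += 1
    return follows


def recommend_next(follows, session, k=3):
    """Rank candidates by how often they followed the session's last item."""
    if not session:
        return []
    seen = set(session)
    candidates = follows.get(session[-1], Counter())
    # Skip items the user has already engaged with in this session.
    ranked = [item for item, _ in candidates.most_common() if item not in seen]
    return ranked[:k]


# Toy sessions of hypothetical product IDs, for illustration only.
toy_sessions = [
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["B", "C"],
    ["E", "B", "C"],
]
covisits = build_covisitation(toy_sessions)
top_candidates = recommend_next(covisits, ["X", "B"])
```

A counting baseline like this ignores product text and locale entirely, which is one reason competition entries typically move on to learned ranking models.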
For a significant part of the competition, Nvidia-Merlin held a comfortable lead, but during the final phase, new test datasets were introduced, and other teams began surpassing the group. In that last challenge, participants had to predict which products users would buy based on their browsing. But Nvidia-Merlin had a secret weapon or two. Well, actually four, as four members are ranked as grandmasters in Kaggle competitions, described as the online Olympics of data science.
According to team member Chris Deotte, one of those grandmasters, the group began working nonstop. Initially, their efforts to use LLMs to build generative AI models to predict product names failed, but with only a few hours left in the competition, they shifted tactics and adopted a new hybrid ranking/classifier model. In addition to Deotte, the team’s other Kaggle grandmasters are Gilberto “Giba” Titericz (Brazil), Kazuki Onodera (Japan), and Jean-Francois Puget (France). They were joined by Benedikt Schifferer, a Berlin-based teammate who helps design Merlin, which the group used along with RAPIDS and other tools.
Each task had a separate leaderboard that was maintained throughout the competition for models evaluated on the public test set. At the end of the competition, a private leaderboard was maintained for models evaluated on the private test set. This latter leaderboard was then used to decide the winners for each task in the competition.
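As a rough illustration of how a leaderboard might score ranked next-product predictions, the sketch below computes mean reciprocal rank (MRR), a common metric for this kind of task. The article does not say which metric the leaderboards used, so the choice of MRR, the cutoff `k`, and the sample data are all assumptions.

```python
def mean_reciprocal_rank(predictions, targets, k=100):
    """Average of 1/rank of the true item within each top-k list (0 if absent)."""
    if not targets:
        return 0.0
    total = 0.0
    for ranked, target in zip(predictions, targets):
        # Ranks start at 1; only the first k predictions are considered.
        for rank, item in enumerate(ranked[:k], start=1):
            if item == target:
                total += 1.0 / rank
                break
    return total / len(targets)


# Hypothetical ranked predictions for three sessions and their true next items.
sample_predictions = [["A", "B"], ["C", "D"], ["E"]]
sample_targets = ["A", "D", "Z"]
score = mean_reciprocal_rank(sample_predictions, sample_targets)
```

Under a split like the one described above, the same scoring function would simply be run twice: once against the public test set during the competition, and once against the private test set to decide the final ranking.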
In the end, Nvidia swept all three tasks, winning the coveted Amazon KDD Cup for 2023.
There are prizes for each of the three tasks: first place, $4,000; second place, $2,000; third place, $1,000. Teams that finish in the fourth through 10th positions receive AWS credits worth $500.