gSASRec: Reducing Overconfidence in Sequential Recommendation Trained with Negative Sampling (Extended Abstract)

Aleksandr V. Petrov, Craig Macdonald

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Sister Conferences Best Papers. Pages 8447-8449. https://doi.org/10.24963/ijcai.2024/939

Sequential recommendation models predict the next item in a sequence of user-item interactions, akin to how language models predict the next token. These models often adapt language model architectures, treating item IDs as if they were token IDs. However, the large number of potential items in recommender systems makes calculating the interaction probability for all items impractical during training; therefore, recommender systems frequently employ negative sampling, where the model learns to differentiate between actual user interactions (positives) and randomly chosen non-interactions (negatives), often using the Binary Cross-Entropy (BCE) loss, framing the problem as a binary classification task. We demonstrate that negative sampling with BCE can lead to overconfidence, where the model's predicted interaction probabilities are higher than the actual probabilities. Although score magnitudes do not matter for ranking items (only the order of scores matters), overconfidence leads to training instability with the BCE loss. We show that overconfidence explains the performance gap between two leading sequential recommendation models, SASRec and BERT4Rec -- the former uses negative sampling, while the latter does not. To counter overconfidence, we introduce the Generalised Binary Cross-Entropy (gBCE) loss and the gSASRec model that utilises gBCE. We mathematically prove and empirically validate that gSASRec effectively addresses the issue of overconfidence. Consequently, gSASRec outperforms SASRec and matches the effectiveness of the state-of-the-art BERT4Rec model while retaining negative sampling. On the Gowalla dataset, with more than 1 million items, where training BERT4Rec is infeasible, gSASRec outperforms the original SASRec model by 41% in terms of NDCG@10.
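For concreteness, below is a minimal PyTorch sketch of a gBCE-style loss, assuming the formulation from the full paper: the sigmoid of the positive score is raised to a power β calibrated from the negative sampling rate α = k/(|I|−1) and a calibration parameter t, with β = α(t(1 − 1/α) + 1/α). The function name, tensor shapes, and default t are illustrative, not the authors' reference implementation.

```python
import torch
import torch.nn.functional as F


def gbce_loss(pos_logits: torch.Tensor,
              neg_logits: torch.Tensor,
              num_items: int,
              t: float = 0.75) -> torch.Tensor:
    """gBCE over one positive and k sampled negatives per sequence position.

    pos_logits: (batch,)   scores for the observed (positive) items
    neg_logits: (batch, k) scores for k randomly sampled negatives
    num_items:  catalogue size |I|
    t:          calibration parameter in [0, 1]; t = 0 recovers plain BCE
    """
    k = neg_logits.size(1)
    alpha = k / (num_items - 1)  # negative sampling rate
    beta = alpha * (t * (1.0 - 1.0 / alpha) + 1.0 / alpha)

    # Positive term: log sigma(s+)^beta = beta * log sigma(s+).
    pos_term = beta * F.logsigmoid(pos_logits)
    # Negative term is the usual BCE one: log(1 - sigma(s-)) = log sigma(-s-).
    neg_term = F.logsigmoid(-neg_logits).sum(dim=-1)
    return -(pos_term + neg_term).mean()
```

Algebraically, β simplifies to 1 − t(1 − α), so for large catalogues (α ≈ 0) β ≈ 1 − t: the positive term is damped relative to plain BCE, which is what counters the overconfidence described above.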
Keywords:
Data Mining: DM: Recommender systems
Machine Learning: ML: Probabilistic machine learning
Data Mining: DM: Information retrieval
Machine Learning: ML: Sequence and graph learning