Verboseness Fission for BM25 Document Length Normalization

Authors: 
Aldo Lipani
Mihai Lupu
Allan Hanbury
Akiko Aizawa
Type: 
Proceedings contribution
Proceedings: 
ICTIR '15 Proceedings of the 2015 International Conference on The Theory of Information Retrieval
Publisher: 
ACM
Pages: 
385 - 388
ISBN: 
978-1-4503-3833-2
Year: 
2015
Abstract: 
BM25 is probably the best-known term weighting model in Information Retrieval. Depending on the formula variant at hand, it has two or three parameters (k1, b, and k3). This paper addresses b, the document length normalization parameter. Based on the observation that the two cases previously discussed for length normalization (multi-topicality and verboseness) are actually three (multi-topicality, verboseness with word repetition, i.e. repetitiveness, and verboseness with synonyms), we propose and test a new length normalization method that removes the need for a b parameter in BM25. Testing the new method on a set of purposefully varied test collections, we observe that we can obtain results statistically indistinguishable from the optimal results, thereby removing the need for ground-truth-based optimization.
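For orientation, the role of b in the classical BM25 term weight can be sketched as below. This is the standard textbook formulation, not the length normalization method proposed in the paper. The function name, the argument names, and the default values k1 = 1.2 and b = 0.75 are illustrative assumptions, and the query-side k3 factor is omitted.

```python
import math


def bm25_term_weight(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Classical BM25 weight of one term in one document (hypothetical helper).

    tf          -- frequency of the term in the document
    doc_len     -- length of the document in tokens
    avg_doc_len -- average document length in the collection
    n_docs      -- number of documents in the collection
    doc_freq    -- number of documents containing the term
    """
    # Robertson/Sparck Jones style IDF; the 0.5 terms smooth extreme counts.
    idf = math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: b = 0 ignores document length entirely,
    # b = 1 scales the term frequency saturation fully by relative length.
    norm = 1.0 - b + b * (doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + k1 * norm)


if __name__ == "__main__":
    # Same term statistics, one short and one long document.
    for b in (0.0, 0.75):
        short = bm25_term_weight(3, 50, 100, 1000, 10, b=b)
        long_ = bm25_term_weight(3, 500, 100, 1000, 10, b=b)
        print(f"b={b}: short={short:.3f} long={long_:.3f}")
```

With b = 0 both documents receive the same weight, while with b > 0 the longer document is penalized. Choosing b normally requires tuning against relevance judgments, which is the optimization step the abstract says the proposed method avoids.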
TU Focus: 
Computational Science and Engineering
Reference: 

A. Lipani, M. Lupu, A. Hanbury, A. Aizawa:
"Verboseness Fission for BM25 Document Length Normalization";
in: "ICTIR '15 Proceedings of the 2015 International Conference on The Theory of Information Retrieval", ACM, 2015, ISBN: 978-1-4503-3833-2, pp. 385 - 388.

Additional Information

Last changed: 
21.12.2015 18:32:38
TU Id: 
244472
Accepted: 
Accepted
Invited: 
Department Focus: 
Business Informatics
Abstract German: 
Author List: 
A. Lipani, M. Lupu, A. Hanbury, A. Aizawa