Verboseness Fission for BM25 Document Length Normalization
Type:
Proceedings contribution
Proceedings:
ICTIR '15 Proceedings of the 2015 International Conference on The Theory of Information Retrieval
Publisher:
ACM
Pages:
385 - 388
ISBN:
978-1-4503-3833-2
Year:
2015
Abstract:
BM25 is probably the best-known term weighting model in Information Retrieval. Depending on the formula variant at hand, it has 2 or 3 parameters (k1, b, and k3). This paper addresses b, the document length normalization parameter. Based on the observation that the two cases previously discussed for length normalization (multi-topicality and verboseness) are actually three: multi-topicality, verboseness with word repetition (repetitiveness), and verboseness with synonyms, we propose and test a new length normalization method that removes the need for a b parameter in BM25. Testing the new method on a set of purposefully varied test collections, we observe that we can obtain results statistically indistinguishable from the optimal results, therefore removing the need for ground-truth-based optimization.
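For orientation, the role of b in conventional BM25 can be sketched as below. This is the widely used textbook form of the weighting, not the parameter-free normalization the paper proposes, which the abstract does not spell out. The function name, the argument names, the default parameter values, and the choice of a non-negative idf variant are assumptions made for illustration.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, df, n_docs,
                    qtf=1, k1=1.2, b=0.75, k3=8.0):
    """Score one query term against one document with textbook BM25.

    Illustrative sketch only: defaults (k1=1.2, b=0.75, k3=8.0) are
    commonly quoted starting values, not values taken from the paper.
    """
    if tf == 0:
        return 0.0
    # One common non-negative idf variant (assumption; other variants exist).
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
    # b interpolates between no length normalization (b=0)
    # and full normalization by relative document length (b=1).
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    doc_part = tf * (k1 + 1.0) / (tf + k1 * length_norm)
    # k3 saturates the query term frequency; with qtf=1 this factor is 1.
    query_part = qtf * (k3 + 1.0) / (qtf + k3)
    return idf * doc_part * query_part


# Same term frequency, documents half and twice the average length.
short_doc = bm25_term_score(tf=3, doc_len=50, avg_doc_len=100, df=10, n_docs=1000)
long_doc = bm25_term_score(tf=3, doc_len=200, avg_doc_len=100, df=10, n_docs=1000)
```

With b greater than 0 the longer document receives the lower score for the same term frequency, while b = 0 makes the score independent of document length. Choosing b normally requires tuning against relevance judgments, which is the step the paper aims to remove.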
TU Focus:
Computational Science and Engineering
Reference:
A. Lipani, M. Lupu, A. Hanbury, A. Aizawa:
"Verboseness Fission for BM25 Document Length Normalization";
in: "ICTIR '15 Proceedings of the 2015 International Conference on The Theory of Information Retrieval", ACM, 2015, ISBN: 978-1-4503-3833-2, pp. 385 - 388.
Additional information
Last changed:
21.12.2015 18:32:38
TU Id:
244472
Accepted:
Accepted
Invited:
Department Focus:
Business Informatics
Info Link:
https://publik.tuwien.ac.at/showentry.php?ID=244472&lang=1
Abstract German: