Exploration of a Threshold for Similarity based on Uncertainty in Word Embedding

Authors: 
Navid Rekabsaz
Mihai Lupu
Allan Hanbury
Guido Zuccon
Type: 
Talk with proceedings
Proceedings: 
Advances in Information Retrieval
Publisher: 
Springer, Cham
Pages: 
396 - 409
ISBN: 
978-3-319-56607-8
Year: 
2017
Abstract: 
Word embedding promises a quantification of the similarity between terms. However, it is not clear to what extent this similarity value can be of practical use for subsequent information access tasks. In particular, which range of similarity values is indicative of the actual term relatedness? We first observe and quantify the uncertainty of word embedding models with respect to the similarity values they generate. Based on this, we introduce a general threshold which effectively filters related terms. We explore the effect of dimensionality on this general threshold by conducting the experiments in different vector dimensions. Our evaluation on four test collections with four relevance scoring models supports the effectiveness of our approach, as the results of the proposed threshold are significantly better than the baseline while being equal to, or statistically indistinguishable from, the optimal results.
TU Focus: 
Information and Communication Technology
Reference: 

N. Rekabsaz, M. Lupu, A. Hanbury, G. Zuccon:
"Exploration of a Threshold for Similarity based on Uncertainty in Word Embedding";
Talk: European Conference on IR Research (ECIR), Aberdeen, UK; 08.04.2017 - 13.04.2017; in: "Advances in Information Retrieval", Springer, Cham, 10193 (2017), ISBN: 978-3-319-56607-8; pp. 396 - 409.

Additional Information

Last changed: 
10.12.2017 10:18:03
TU Id: 
263993
Accepted: 
Accepted
Invited: 
Department Focus: 
Business Informatics
Abstract German: 
Author List: 
N. Rekabsaz, M. Lupu, A. Hanbury, G. Zuccon