Wikipedia Search: Combining Language Modeling and Link Analysis


Jacek Wolkowicz
Michael Shepherd
Vlado Keselj

Author Addresses: 

Faculty of Computer Science
Dalhousie University
6050 University Ave.
PO Box 15000
Halifax, Nova Scotia, Canada
B3H 4R2


As a result of its recent rapid development, Wikipedia is likely the largest resource of organized content, manually edited through a collaborative effort by millions of "wikipedians". Its complex structure includes features that conventional corpora lack, such as semantically rich inter-document links, and this allows new approaches to information retrieval to be applied successfully. The contribution of this paper is twofold. First, we analyze Wikipedia's usefulness as a research corpus for modern IR techniques. Second, we present a novel Wikipedia retrieval method that combines link analysis and language modeling, with different weights, to better rank the result set. The experimental results demonstrate statistically significant improvements in both precision (5%) and recall (8%) compared with language modeling or link analysis techniques used separately.
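
The abstract does not state how the two signals are combined, so the following is only a minimal sketch of one plausible reading: a linear interpolation of a language-model score and a link-analysis score. It assumes Dirichlet-smoothed query likelihood for the language model and PageRank for the link analysis; both choices, along with the names `lm_score`, `pagerank`, `combined_ranking` and the parameters `mu`, `damping`, and `weight`, are illustrative assumptions and are not taken from the report.

```python
import math
from collections import Counter


def lm_score(query, doc, collection, mu=2000.0):
    """Dirichlet-smoothed query log-likelihood (assumed LM variant)."""
    doc_tf = Counter(doc)
    coll_tf = Counter(collection)
    score = 0.0
    for term in query:
        p_coll = coll_tf[term] / len(collection)
        if p_coll == 0:
            continue  # ignore query terms that never occur in the collection
        score += math.log((doc_tf[term] + mu * p_coll) / (len(doc) + mu))
    return score


def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank (assumed link-analysis method)."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for t in nodes:
                    new[t] += damping * rank[v] / n
        rank = new
    return rank


def min_max(scores):
    """Rescale scores to [0, 1] so the two signals are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def combined_ranking(query, docs, links, weight=0.7):
    """Rank documents by weight * LM + (1 - weight) * PageRank."""
    collection = [t for tokens in docs.values() for t in tokens]
    lm = min_max({d: lm_score(query, toks, collection) for d, toks in docs.items()})
    pr_all = pagerank(links)
    pr = min_max({d: pr_all.get(d, 0.0) for d in docs})
    combined = {d: weight * lm[d] + (1.0 - weight) * pr[d] for d in docs}
    return sorted(combined, key=combined.get, reverse=True)


# Toy data, invented purely for illustration.
docs = {
    "A": ["halifax", "city"],
    "B": ["halifax", "halifax", "harbour"],
    "C": ["ocean"],
}
links = {"A": ["B"], "B": ["A"], "C": ["B"]}
ranking = combined_ranking(["halifax"], docs, links)
```

In this sketch, `weight` plays the role of the "different weights" mentioned in the abstract: 1.0 reduces to pure language modeling, 0.0 to pure link analysis. In practice it would be tuned on held-out relevance judgments.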

Tech Report Number: 
Report Date: January 29, 2009
CS-2009-01.pdf (1 MB)