Technical Report
Wikipedia Search: Combining Language Modeling and Link Analysis
Jacek Wolkowicz, Michael Shepherd and Vlado Keselj
CS-2009-01
As a result of its recent rapid development, Wikipedia is likely the largest resource of organized content, manually edited through a collaborative effort by millions of "wikipedians". Its complex structure includes features that ordinary corpora lack, such as semantically rich inter-document links, and this allows new approaches to information retrieval to be applied successfully. The contribution of this paper is twofold. First, we analyze Wikipedia's usefulness as a research corpus for modern IR techniques. Second, we present a novel Wikipedia retrieval method that combines link analysis and language modeling, with different weights, to better rank the result set. The experimental results demonstrate statistically significant improvements in both precision (5%) and recall (8%) compared with techniques based on language modeling or link analysis alone.
Faculty of Computer Science
6050 University Ave
Halifax, NS B3H 1W5
