Technical Report

Report Title: 

Wikipedia Search: Combining Language Modeling and Link Analysis

Authors: 

Jacek Wolkowicz, Michael Shepherd and Vlado Keselj

Tech Report Number: 

CS-2009-01

Report Date: 

January 29th, 2009

Abstract: 

As a result of its recent rapid development, Wikipedia is likely the largest resource of organized content, manually edited through a collaborative effort by millions of "wikipedians". The complex structure of Wikipedia, with many features that normal corpora do not have, such as semantically rich inter-document links, allows for successful application of new approaches to information retrieval. The contribution of this paper is twofold. First, we analyze Wikipedia's usefulness as a research corpus for modern IR techniques. Second, we present a novel Wikipedia retrieval method that combines link analysis and language modeling, with different weights, for better ranking of the result set. The experimental results demonstrate statistically significant improvements in both precision (5%) and recall (8%) when compared to techniques based on language modeling or link analysis alone.

Author Addresses: 

Faculty of Computer Science
6050 University Ave
Halifax, NS B3H 1W5
