
 
 
Xiaomeng Wan
Ph.D. Student
Faculty of Computer Science, Dalhousie University
Phone: (O) 902-4946455
Email: xwan AT cs AT dal AT ca
Supervisors: Evangelos Milios, Jeannette Janssen, Nauzer Kalyaniwalla
Research groups: MoMiNIS, LITHEL
 
 


Master's research: Link-based search for similar pages on the web
Identifying similar web pages is a crucial task for search engines. Traditionally, similarity is computed from the contents of web pages. Recent research shows that the link structure can also be exploited for the similarity task. Algorithms based on this idea are called graph-based algorithms. In this project, we present several graph-based algorithms for identifying relevant web pages. For comparison, we also implemented a previously proposed graph-based algorithm, Dean and Henzinger's Companion algorithm, and TFIDF (Term Frequency × Inverse Document Frequency), one of the prevalent content-based algorithms.
Our experiments were performed on the .gov data set, a filtered crawl of the .gov domain that was prepared for the Web track of TREC. To evaluate the effectiveness of our algorithms, we performed an automatic comparison and a user study. Our study shows that our algorithms perform better than, or competitively with, the Companion method. Furthermore, the result sets from the graph-based algorithms are fairly different from those of TFIDF.
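To make the contrast between content-based and link-based similarity concrete, here is a minimal sketch in Python. It is not the code or the algorithms from this project: the content side is a plain TF-IDF cosine score, and the link side uses a simple co-citation count (how many pages link to both candidates) as a stand-in for a graph-based method. The function names, the toy pages and the toy links are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each page id to a sparse TF-IDF vector (dict: term -> weight)."""
    tokenized = {page: text.lower().split() for page, text in docs.items()}
    n = len(tokenized)
    # Document frequency: in how many pages does each term occur?
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))
    vectors = {}
    for page, tokens in tokenized.items():
        tf = Counter(tokens)
        vectors[page] = {t: tf[t] * math.log(n / df[t]) for t in tf}
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def cocitation(links, a, b):
    """Number of pages that link to both a and b (links are (src, dst) pairs)."""
    parents_a = {src for src, dst in links if dst == a}
    parents_b = {src for src, dst in links if dst == b}
    return len(parents_a & parents_b)


# Hypothetical toy data: three pages and the hubs that point to them.
pages = {
    "a": "tax forms federal",
    "b": "federal tax help",
    "c": "weather forecast maps",
}
links = [("hub1", "a"), ("hub1", "b"), ("hub2", "a"), ("hub2", "b"), ("hub3", "c")]

vectors = tfidf_vectors(pages)
print("content a~b:", cosine(vectors["a"], vectors["b"]))
print("content a~c:", cosine(vectors["a"], vectors["c"]))
print("links   a~b:", cocitation(links, "a", "b"))
print("links   a~c:", cocitation(links, "a", "c"))
```

The two scores use disjoint evidence (words versus in-links), which is one plausible reason the result sets of graph-based and content-based methods can differ.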

Current research: Statistical analysis of dynamic graphs
Communications between large numbers of individuals can be modeled as dynamic graphs. Information on communication patterns, roles of vertices and anomalies is believed to be hidden in these linkage graphs, but how to extract it is a question that has puzzled researchers for a long time. In this project, a set of local measures, mostly defined on the neighbourhoods of vertices, is introduced in the hope of capturing this hidden information. These measures are applied to an email dataset, and the resulting signals are investigated using scan statistics in order to identify anomalies or changes in communication patterns. Results show that these measures have the potential to identify communication changes of different types.
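The pipeline described above (a local measure per vertex per time window, then a statistical test on the resulting signal) can be sketched roughly as follows. This is an illustrative assumption, not the measures or the scan statistic used in the project: the local measure here is simply the number of distinct neighbours of a vertex in each window, and the test flags a window when its value is far above the mean of the preceding windows. The names `degree_signal` and `flag_changes`, the parameters `history` and `threshold`, and the toy messages are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def degree_signal(messages, vertex, num_windows):
    """Number of distinct neighbours of `vertex` in each time window.

    `messages` is an iterable of (window, sender, recipient) triples.
    """
    neighbours = defaultdict(set)
    for window, sender, recipient in messages:
        if sender == vertex:
            neighbours[window].add(recipient)
        elif recipient == vertex:
            neighbours[window].add(sender)
    return [len(neighbours[w]) for w in range(num_windows)]


def flag_changes(signal, history=3, threshold=2.0):
    """Return the windows whose value is unusually high versus recent history.

    Each value is standardized against the previous `history` windows.
    The standard deviation is floored at 1.0 so a flat history does not
    cause a division by zero.
    """
    flagged = []
    for t in range(history, len(signal)):
        past = signal[t - history:t]
        mu = mean(past)
        sigma = max(pstdev(past), 1.0)
        if (signal[t] - mu) / sigma > threshold:
            flagged.append(t)
    return flagged


# Hypothetical toy data: alice writes to bob in windows 0-3, then to five people.
messages = [(w, "alice", "bob") for w in range(4)]
messages += [(4, "alice", name) for name in ("bob", "carol", "dave", "erin", "frank")]

signal = degree_signal(messages, "alice", num_windows=5)
print(signal)                # [1, 1, 1, 1, 5]
print(flag_changes(signal))  # [4]
```

Different local measures would produce different signals from the same messages, so each may respond to a different type of communication change.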


