Master's research: Link-based search for similar pages on the web
Identifying similar web pages is a crucial task for search engines. Traditionally, similarity is computed from the contents of
web pages. Recent research shows that the link structure can also be exploited for this task; algorithms built on this idea
are called graph-based algorithms. In this project, we present several graph-based algorithms for identifying relevant web
pages. For comparison, we also implemented a previously proposed graph-based algorithm, Dean and Henzinger's Companion
algorithm, and TFIDF (Term Frequency × Inverse Document Frequency), one of the prevalent content-based algorithms.
Our experiments were performed on the .gov data set, a filtered crawl of the .gov domain prepared for the Web track of
TREC. To evaluate the effectiveness of our algorithms, we performed an automatic comparison and a user study. Our study shows that our
algorithms perform either better than or comparably to the Companion method. Furthermore, the result sets from the graph-based
algorithms are fairly different from those produced by TFIDF.
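For readers unfamiliar with the two families of methods compared above, here is a minimal Python sketch. It is not the project's code. It shows TF-IDF cosine similarity as a content-based score and co-citation count (the number of pages linking to both pages) as a simple link-based score. Co-citation is a stand-in chosen for brevity: it is neither Dean and Henzinger's Companion algorithm nor any of the graph-based algorithms developed in this project. The toy pages, the links, the helper names (`rank`, `overlap_at_k`) and the top-k Jaccard overlap used to compare result sets are all hypothetical illustrations.

```python
import math
from collections import Counter

# Hypothetical toy data: page id -> tokens, and linking page -> set of pages it links to.
PAGES = {
    "p1": "tax forms irs".split(),
    "p2": "irs tax refund".split(),
    "p3": "park trails camping".split(),
}
LINKS = {
    "hub1": {"p1", "p2"},
    "hub2": {"p1", "p2", "p3"},
    "hub3": {"p3"},
}


def tfidf_vectors(pages):
    """Map each page to a sparse TF-IDF vector: tf(t, d) * log(N / df(t))."""
    n = len(pages)
    # Document frequency: number of pages containing each term at least once.
    df = Counter(t for tokens in pages.values() for t in set(tokens))
    return {
        pid: {t: c * math.log(n / df[t]) for t, c in Counter(tokens).items()}
        for pid, tokens in pages.items()
    }


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def cocitation(links, a, b):
    """Link-based score: number of pages that link to both a and b."""
    return sum(1 for targets in links.values() if a in targets and b in targets)


def rank(score, query, candidates):
    """Candidates other than the query, sorted by descending score."""
    others = [c for c in candidates if c != query]
    return sorted(others, key=lambda c: score(query, c), reverse=True)


def overlap_at_k(xs, ys, k):
    """Hypothetical result-set comparison: Jaccard overlap of two top-k lists."""
    a, b = set(xs[:k]), set(ys[:k])
    return len(a & b) / len(a | b) if a | b else 1.0


vectors = tfidf_vectors(PAGES)
content_ranking = rank(lambda q, c: cosine(vectors[q], vectors[c]), "p1", list(PAGES))
link_ranking = rank(lambda q, c: cocitation(LINKS, q, c), "p1", list(PAGES))
print(content_ranking, link_ranking, overlap_at_k(content_ranking, link_ranking, 2))
```

On real data, a low `overlap_at_k` between the two rankings would be one way to quantify how different content-based and link-based result sets are. This is only an illustrative metric; the summary does not say which comparison measure the project used.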
Current research: statistical analysis of dynamic graphs
Communications between large numbers of individuals can be modeled as dynamic
graphs. Information about communication patterns, vertex roles, and anomalies is
believed to be hidden in these linkage graphs, but how to extract it has long
puzzled researchers. In this project, we introduce a set of local measures, mostly
defined on the neighbourhoods of vertices, in the hope of capturing this hidden
information. These measures are applied to an email dataset, and the resulting
signals are investigated using scan statistics to identify anomalies or changes
in communication patterns. Results show that these measures have the potential
to identify different types of communication change.
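The pipeline described above (a local measure computed per vertex and per time window, followed by a scan statistic over the resulting signals) can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's measures. Here the local measure is the number of edges in the subgraph induced by a vertex's closed 1-neighbourhood. Each vertex is standardized against its own previous `tau` windows, and the scan statistic is the maximum standardized value over all vertices. This follows a common formulation of graph scan statistics. The parameters `tau`, `eps`, the alarm threshold and the toy email graph are hypothetical.

```python
import statistics


def local_edge_count(edges, v):
    """Edges in the subgraph induced by v and its neighbours (closed 1-neighbourhood).

    `edges` is a set of undirected pairs, each pair listed once.
    """
    nbhd = {v}
    for a, b in edges:
        if a == v:
            nbhd.add(b)
        elif b == v:
            nbhd.add(a)
    return sum(1 for a, b in edges if a in nbhd and b in nbhd)


def scan_statistic(snapshots, vertices, tau=3, eps=1.0):
    """For each time step t >= tau, return (t, max z-score, argmax vertex).

    Each vertex's local count is standardized against its own previous `tau`
    values; the standard deviation is floored at `eps` (hypothetical) so that
    constant histories do not cause division by zero.
    """
    series = {v: [local_edge_count(g, v) for g in snapshots] for v in vertices}
    out = []
    for t in range(tau, len(snapshots)):
        best, arg = float("-inf"), None
        for v in vertices:
            history = series[v][t - tau:t]
            mu = statistics.mean(history)
            sd = max(statistics.pstdev(history), eps)
            z = (series[v][t] - mu) / sd
            if z > best:
                best, arg = z, v
        out.append((t, best, arg))
    return out


# Hypothetical email graph: four quiet windows, then vertex "a" suddenly mails everyone.
quiet = {("a", "b"), ("c", "d")}
burst = quiet | {("a", "c"), ("a", "d"), ("a", "e")}
snapshots = [quiet, quiet, quiet, quiet, burst]
vertices = ["a", "b", "c", "d", "e"]

scores = scan_statistic(snapshots, vertices, tau=3)
threshold = 3.0  # hypothetical alarm level
alarms = [(t, v) for t, z, v in scores if z > threshold]
print(scores, alarms)
```

Standardizing each vertex against its own history means a normally busy vertex is not flagged just for being busy. Only a departure from that vertex's recent behaviour raises its score. In this toy run, the burst at the last window makes "a" the only vertex above the threshold.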
Selected Articles:
- Xiaomeng Wan, Jeannette Janssen, Nauzer Kalyaniwalla and Evangelos Milios, Statistical analysis of dynamic graphs, Proceedings of AISB06: Adaptation in Artificial and Biological Systems, vol. 3, pp. 176-179, 2006.
- Xiaomeng Wan, Nauzer Kalyaniwalla, Capturing causality in communications graphs, DIMACS/DyDAn Workshop on Computational Methods for Dynamic Interaction Networks, 2007.
- Xiaomeng Wan, Evangelos Milios, Nauzer Kalyaniwalla, Jeannette Janssen, Link-based Anomaly Detection in Communication Networks, 2008 IEEE/WIC/ACM International Workshop on Computational Social Networks (IWCSN 2008), Dec 9-12, 2008, Sydney, Australia.
- Xiaomeng Wan, Evangelos Milios, Nauzer Kalyaniwalla, Jeannette Janssen, Link-based Event Detection in Email Communication Networks, 24th Annual ACM Symposium on Applied Computing (SAC 2009), Mar 8-12, 2009, Hawaii, USA.