Topics
- Term distribution, term weighting, feature selection
- term distribution - Zipf's Law
- tf.idf
- Feature Set Selection
- cluster-based measure
- Information Gain Measure
- Unsupervised concept identification
- feature set reduction based on clustering
- Latent Semantic Indexing
- Models
- Boolean Model
- Vector Space Model (VSM)
- Vector Space Model - binary weights
- Vector Space Model - non-binary weights
- cosine similarity measure
- Rocchio's Feedback Method
- Probabilistic Model
- Language Models
- File structures
- Inverted file structures for Boolean and VSM
- Evaluation of Effectiveness
- Recall, Precision and Fallout
- Average Precision and Recall
- F-measure and E-measure
- Normalized Recall
- Clustering
- Cluster Hypothesis
- Retrieval using clusters, More-like-this-one, Scatter-Gather
- Partitioning
- Single Pass Algorithm
- k-means algorithm
- The Davies-Bouldin Index for evaluation of clustering structure
- Hierarchical
- Bottom-up or Agglomerative
- Minimum Spanning Tree
- Prim-Dijkstra Algorithm
- Λ Measure for evaluation of clustering structure
- Index Structures
- TRIEs
- PATRICIA Trees
- Suffix Tries and Suffix Trees
- Word Signatures and Bloom Filters
- String searching (Aho-Corasick)
- Social Network Analysis
- Prestige measure
- Co-citation networks
- PageRank Algorithm
- HITS Algorithm
- Research Talks (see PowerPoint slides under readings)
- Challenges in Information Retrieval
- Genre and Task
- Tacit Knowledge