Topics covered in CSCI 4141
Prerequisites:
Note
Although the prerequisite courses listed above are only at the second-year
level, this course does require that the student have a fair amount of
computing "maturity". Students without this "maturity" often have a great
deal of trouble completing the project, which is worth 30% of the final mark.
This is an overview course in Information Retrieval. Some or all of the
following topics will be covered.
Models
- Boolean Model
- Vector Space Model
- Relational DBMS
- SGML
- Hypertext
  - features etc.
  - Dexter Reference Model
Term Indexing
- Zipf's Law
- term inverse document frequency weight
- term discrimination value
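
As a rough illustration of the inverse document frequency weight listed above, the following minimal sketch assumes the common textbook form idf(t) = log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing term t. The course may use a different variant; the function name idf_weights and the toy documents are hypothetical.

```python
import math
from collections import Counter


def idf_weights(documents):
    """Return {term: log(N / df)} for a list of tokenised documents.

    Assumes the plain log(N / df) form of the weight; other variants
    (smoothing, different log bases) are also in common use.
    """
    n = len(documents)
    # Document frequency: count each term at most once per document.
    df = Counter(term for doc in documents for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


# Hypothetical toy collection of three tokenised documents.
docs = [["a", "b"], ["a", "c"], ["a", "b", "b"]]
weights = idf_weights(docs)
```

A term that occurs in every document (here "a") receives weight 0, while a term found in only one document (here "c") receives the largest weight.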
Searching and Data Structures
- Inverted files to support Boolean and Vector Models
- Nearest Neighbour Searching
- Clustering
  - non-hierarchical
  - hierarchical agglomerative
- String Searching
  - Tries, binary tries, binary digital tries, Patricia trees, suffix trees, etc.
Retrieval Effectiveness Evaluation
- Recall, Precision, Fallout
- Comparing systems using average precision
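
The three set-based measures named above can be sketched as follows, assuming their usual definitions: precision is the fraction of retrieved documents that are relevant, recall is the fraction of relevant documents that are retrieved, and fallout is the fraction of non-relevant documents that are retrieved. The function name evaluate and the document identifiers are hypothetical and serve only as illustration.

```python
def evaluate(retrieved, relevant, collection_size):
    """Return (precision, recall, fallout) for a single query.

    retrieved, relevant: iterables of document identifiers.
    collection_size: total number of documents in the collection.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)            # relevant and retrieved
    non_relevant = collection_size - len(relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    # Non-relevant documents retrieved, over all non-relevant documents.
    fallout = (len(retrieved) - hits) / non_relevant if non_relevant else 0.0
    return precision, recall, fallout


# Hypothetical query: 4 documents retrieved, 3 relevant, 10 in the collection.
precision, recall, fallout = evaluate([1, 2, 3, 4], [2, 4, 6], 10)
```

In this example two of the four retrieved documents are relevant, so precision is 2/4, recall is 2/3, and fallout is 2/7.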