Natural Search Queries - a machine learning approach to a search interface for consumer-oriented databases

Authors: 

Marek Lipczak
Evangelos Milios

Author Addresses: 

Faculty of Computer Science
Dalhousie University
6050 University Ave.
PO Box 15000
Halifax, Nova Scotia, Canada
B3H 4R2

Abstract: 

Search in consumer-oriented databases is becoming increasingly important as the computer becomes a commonly used tool. Such databases are at the heart of e-mail managers, flight booking, and other e-commerce systems. Key problems associated with such searches are the structure and interface of the search query. The traditional solution to these problems uses a separate text field for each element of the query structure. However, this solution does not meet the needs of the ever-increasing number of inexperienced users, who require an efficient and user-friendly interface. We present Natural Search Queries (NSQ), a simple and intuitive approach to searching structured information. Our solution combines the ideas of natural language database interfaces and operator-based search: queries, in simplified and intuitive natural language, are entered into a single text field. It is a front-end search interface oriented towards the common user. Our aim is to allow as much freedom in formulating queries as possible, while interpreting them as accurately as possible, in order to automatically extract the elements of the query structure. In our project, we address the problem of e-mail databases, but the results may be applicable to other consumer-oriented databases. The report introduces the grammar of Natural Search Queries and probabilistic methods for recognizing the query structure (i.e., parsing and a Hidden Markov Model). In addition, we demonstrate a complete implementation of a system for processing NSQs and presenting retrieved messages. Specific subproblems that were addressed are the deterministic recognition of natural date constraints and the training of the models for query structure recognition. Tests show promising results in processing a broad range of Natural Search Queries.

Tech Report Number: 
CS-2007-01
Report Date: 
January 27, 2007
Attachment: 
CS-2007-01.pdf (1.15 MB)