Sanjay Baberwal & Ben Choi

Computer Science, College of Engineering and Science
Louisiana Tech University, Ruston, LA 71272, USA


Abstract: In the current information age, the dominant method for information search is by providing few keywords to a search engine. Keyword search is currently one of the most important operations in search engines and numerous other applications. In this paper we propose a new text indexing technique for improving the performance of keyword search. Our proposed technique not only speeds up searching operations but also the operations for inserting and for deleting keywords, which are particularly important for the ever increasing and dynamic changing databases such as that for search engines. We propose to partition all keywords into search trees based on the first character and the length of the keywords. Our partitioning scheme creates a much more even distribution of keywords and results in a 32% speedup in the worst cases and a 1% speedup in the average cases in comparing to one of the leading text indexing techniques called burst tries. In addition, our proposed technique stores document indexes only at the leaf nodes of the search trees and results in efficient algorithms for searching, insertion, and deletion of keywords. We successfully integrated the technique into our Information Classification and Search Engine system and showed its potential and feasibility.



The 3rd IASTED International Conference on COMMUNICATIONS, INTERNET, AND INFORMATION TECHNOLOGY, pp. 255-260, 2004.