Building Inverted Files Through Efficient Dynamic Hashing

Chuleerat Jaruskulchai and Canasai Kruengkrai

Abstract

We consider the problem of using an inversion algorithm to build inverted files from large document collections. In particular, we focus on applying a hash table to represent the dynamic dictionary data structure. The dictionary is used to store all distinct terms in the document collection during indexing process. It is the challenge to efficiently construct the hash table, since we do not know in advance the table size. In this paper, we introduce an efficient algorithm for building an inverted file based on the combination of the sort-based inversion algorithm and the cuckoo hashing. Our approach is to exploit the cuckoo hashing scheme that guarantees linear space used and worst case constant look up time. The experimental results show that our algorithm performs well on two large datasets.

Download: pdf, ps


Canasai Kruengkrai