Mining frequent items in data stream using time fading model

Article ID	Journal	Published Year	Pages	File Type
391775	Information Sciences	2014	16 Pages	PDF

Abstract

•An algorithm for detecting ε-approximate frequent stream data items is presented.•The algorithm uses much less memory space than other methods.•Its memory requirement is independent of the length of the stream.•The algorithm outperforms other methods in terms of accuracy and speed.•The advantages of the algorithm are proved theoretically and empirically.

We investigate the problem of finding frequent items in a continuous data stream, and present an algorithm named λ-HCount for computing frequency counts of stream data based on a time fading model. The algorithm uses r hash functions to estimate the density values of stream data items. To emphasize the importance of recent data items, a time fading factor is used. For a given error bound, our algorithm can detect approximate frequent items under a certain probability using limited number of memory space. The memory requirement only depends on the number of different data items and the number of hash functions used. Experimental results on synthetic and real data sets show that our algorithm outperforms other methods in terms of accuracy, memory requirement, and processing speed.

Keywords

hash function Stream data mining