Article ID Journal Published Year Pages File Type
391775 Information Sciences 2014 16 Pages PDF
Abstract

•An algorithm for detecting ε-approximate frequent stream data items is presented.•The algorithm uses much less memory space than other methods.•Its memory requirement is independent of the length of the stream.•The algorithm outperforms other methods in terms of accuracy and speed.•The advantages of the algorithm are proved theoretically and empirically.

We investigate the problem of finding frequent items in a continuous data stream, and present an algorithm named λ-HCount for computing frequency counts of stream data based on a time fading model. The algorithm uses r hash functions to estimate the density values of stream data items. To emphasize the importance of recent data items, a time fading factor is used. For a given error bound, our algorithm can detect approximate frequent items under a certain probability using limited number of memory space. The memory requirement only depends on the number of different data items and the number of hash functions used. Experimental results on synthetic and real data sets show that our algorithm outperforms other methods in terms of accuracy, memory requirement, and processing speed.

Related Topics
Physical Sciences and Engineering Computer Science Artificial Intelligence
Authors
, ,