Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
429351 | Journal of Computational Science | 2015 | 11 Pages | |
•We built a system for event detection and trending from tweet clusters discovered using the locality sensitive hashing (LSH) technique.
•Construction of feature vectors in a high-dimensional dataset.
•Leveraging LSH-based cluster discovery to find truly interesting events and record their attributes in a MySQL database.
•Trending event behavior over time, geo-location and cluster size.
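The highlights mention recording cluster attributes in a MySQL database and trending them over time, geo-location and cluster size. The following is a minimal sketch under assumptions, not the authors' schema: it uses Python's built-in `sqlite3` module as a stand-in for MySQL, and the table `event_clusters`, its columns, and the helpers `record_cluster` and `hourly_trend` are hypothetical.

```python
import sqlite3

# Hypothetical schema: one row per discovered cluster (event candidate).
SCHEMA = """
CREATE TABLE IF NOT EXISTS event_clusters (
    cluster_id   INTEGER PRIMARY KEY,
    label        TEXT NOT NULL,     -- event label, e.g. the cluster's top terms
    detected_at  TEXT NOT NULL,     -- ISO-8601 timestamp, e.g. '2015-01-01T10:05:00'
    location     TEXT,              -- coarse geo-location of the cluster's tweets
    cluster_size INTEGER NOT NULL   -- number of tweets in the cluster
)
"""


def record_cluster(conn, label, detected_at, location, size):
    """Store the attributes of one discovered cluster."""
    conn.execute(
        "INSERT INTO event_clusters (label, detected_at, location, cluster_size) "
        "VALUES (?, ?, ?, ?)",
        (label, detected_at, location, size),
    )


def hourly_trend(conn, label):
    """Total tweets per hour and location for one event label, oldest hour first."""
    return conn.execute(
        """
        SELECT substr(detected_at, 1, 13) AS hour, location, SUM(cluster_size)
        FROM event_clusters
        WHERE label = ?
        GROUP BY hour, location
        ORDER BY hour, location
        """,
        (label,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    # Illustrative values only, not data from the paper.
    record_cluster(conn, "event_a", "2015-01-01T10:05:00", "NYC", 40)
    record_cluster(conn, "event_a", "2015-01-01T10:40:00", "NYC", 25)
    record_cluster(conn, "event_a", "2015-01-01T11:10:00", "LA", 10)
    print(hourly_trend(conn, "event_a"))
    # → [('2015-01-01T10', 'NYC', 65), ('2015-01-01T11', 'LA', 10)]
```

With a MySQL driver the same SQL should carry over largely unchanged, although the placeholder syntax differs (for example `%s` in MySQL Connector/Python instead of `?`).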
Social media data contains many hidden occurrences of real-time events. In this paper, a novel methodology is proposed for detecting and trending events from tweet clusters that are discovered using the locality sensitive hashing (LSH) technique. Key challenges include: (1) constructing a dictionary using incremental term frequency–inverse document frequency (TF–IDF) in high-dimensional data to create tweet feature vectors, (2) leveraging LSH to find truly interesting events, (3) trending the behavior of events based on time, geo-location and cluster size, and (4) speeding up the cluster-discovery process while retaining cluster quality. Experiments are conducted for a specific event, and the clusters discovered using LSH and K-means are compared with those produced by the group-average agglomerative clustering technique.
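The first two challenges, an incrementally updated TF–IDF dictionary and LSH-based cluster discovery, can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not say which LSH family is used, so the sketch substitutes random-hyperplane LSH (which approximates cosine similarity) with a single hash table, whitespace tokenization and a smoothed IDF. The classes `IncrementalTfIdf` and `HyperplaneLSH` and the function `lsh_clusters` are hypothetical names.

```python
import math
import random
from collections import Counter, defaultdict


class IncrementalTfIdf:
    """Hypothetical dictionary whose document frequencies grow as tweets arrive."""

    def __init__(self):
        self.vocab = {}            # term -> dimension index (grows incrementally)
        self.doc_freq = Counter()  # term -> number of tweets containing it
        self.num_docs = 0

    def add(self, tokens):
        """Update the vocabulary and document frequencies with one tweet."""
        self.num_docs += 1
        for term in set(tokens):
            if term not in self.vocab:
                self.vocab[term] = len(self.vocab)
            self.doc_freq[term] += 1

    def vector(self, tokens):
        """Sparse TF-IDF vector {dimension: weight} from the statistics seen so far."""
        counts = Counter(tokens)
        vec = {}
        for term, count in counts.items():
            if term in self.vocab:
                # Smoothed IDF (assumption) so that common terms keep a non-zero weight.
                idf = math.log((1 + self.num_docs) / (1 + self.doc_freq[term])) + 1
                vec[self.vocab[term]] = (count / len(tokens)) * idf
        return vec


class HyperplaneLSH:
    """Random-hyperplane LSH: one sign bit per hyperplane approximates cosine similarity."""

    def __init__(self, num_bits=8, seed=42):
        self.num_bits = num_bits
        self.rng = random.Random(seed)
        self.coords = {}  # dimension -> Gaussian coordinates, one per hyperplane

    def _coords(self, dim):
        # Draw hyperplane coordinates lazily, because the vocabulary keeps growing.
        if dim not in self.coords:
            self.coords[dim] = [self.rng.gauss(0.0, 1.0) for _ in range(self.num_bits)]
        return self.coords[dim]

    def signature(self, vec):
        """Bit tuple: 1 where the vector lies on the non-negative side of a hyperplane."""
        dots = [0.0] * self.num_bits
        for dim, weight in vec.items():
            for bit, coord in enumerate(self._coords(dim)):
                dots[bit] += weight * coord
        return tuple(1 if d >= 0 else 0 for d in dots)


def lsh_clusters(tweets, num_bits=8, seed=42):
    """Group tweet indices whose LSH signatures collide in a single hash table."""
    tfidf = IncrementalTfIdf()
    lsh = HyperplaneLSH(num_bits, seed)
    buckets = defaultdict(list)
    for i, text in enumerate(tweets):
        tokens = text.lower().split()
        tfidf.add(tokens)
        buckets[lsh.signature(tfidf.vector(tokens))].append(i)
    return list(buckets.values())


if __name__ == "__main__":
    sample = ["earthquake hits the city", "strong earthquake hits city", "new phone released today"]
    for cluster in lsh_clusters(sample):
        print([sample[i] for i in cluster])
```

Tweets whose signatures match share a bucket and form a candidate cluster. A practical system would typically use several hash tables (or bands of bits) to raise recall and then filter buckets by size to keep only interesting events; because IDF weights change as tweets arrive, near-duplicate tweets seen at different times are not guaranteed to collide in this single-table sketch.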