کد مقاله | کد نشریه | سال انتشار | مقاله انگلیسی | نسخه تمام متن |
---|---|---|---|---|
402277 | 676892 | 2015 | 16 صفحه PDF | دانلود رایگان |

Search engine query logs are recognized as an important information source that contains millions of users’ web search needs. Discovering Geographic Web Search Topics (G-WSTs) from a query log can support a variety of downstream web applications such as finding commonality between locations and profiling search engine users. However, the task of discovering G-WSTs is nontrivial, not only because of the diversity of the information in web search but also due to the sheer size of query log. In this paper, we propose a new framework, Scalable Geographic Web Search Topic Discovery (SG-WSTD), which contains highly scalable functionalities such as search session derivation, geographic information extraction and geographic web search topic discovery to discover G-WSTs from query log. Within SG-WSTD, two probabilistic topic models are proposed to discover G-WSTs from two complementary perspectives. The first one is the Discrete Search Topic Model (DSTM), which discovers G-WSTs that capture the commonalities between discrete locations. The second is the Regional Search Topic Model (RSTM), which focuses on a specific geographic region on the map and discovers G-WSTs that demonstrate geographic locality. Since query log is typically voluminous, we implement the functionalities in SG-WSTD based on the MapReduce paradigm to solve the efficiency bottleneck. We evaluate SG-WSTD against several strong baselines on a real-life query log from AOL. The proposed framework demonstrates significantly improved data interpretability, better prediction performance, higher topic distinctiveness and superior scalability in the experimentation.
Journal: Knowledge-Based Systems - Volume 84, August 2015, Pages 18–33